2023-11-18 01:38:34,542 INFO [train_asr.py:1183] (2/4) Training started
2023-11-18 01:38:34,542 INFO [train_asr.py:1193] (2/4) Device: cuda:2
2023-11-18 01:38:34,544 INFO [train_asr.py:1205] (2/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': '025f11fd-dirty', 'icefall-git-date': 'Fri Nov 17 16:19:07 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1113160712-78bc8d8bd8-pw6cd', 'IP address': '10.177.94.17'}, 'world_size': 4, 'master_port': 13454, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-18 01:38:34,545 INFO [train_asr.py:1207] (2/4) About to create model
2023-11-18 01:38:35,382 INFO [train_asr.py:1211] (2/4) Number of model parameters: 65819362
2023-11-18 01:38:38,408 INFO [train_asr.py:1227] (2/4) Using DDP
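A note on the loss columns that appear below: the logged `loss` is a weighted combination of `simple_loss`, `pruned_loss` and `audio_tagging_loss`. A minimal sketch of that combination, assuming the warm-up ramp used in icefall's pruned-transducer recipes (the 0.1/0.9 ramp constants are an assumption, but with warm_step=2000 they reproduce the batch-0 line: 1.0*2.475 + 0.1*2.433 + 1.0*1.177 ≈ 3.897):

```python
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   batch_idx_train, warm_step=2000,
                   simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Before warm_step, lean on the cheap simple loss and phase in the
    # pruned loss; after warm_step, use the configured scales directly.
    if batch_idx_train >= warm_step:
        simple_scale, pruned_scale = simple_loss_scale, 1.0
    else:
        frac = batch_idx_train / warm_step
        simple_scale = 1.0 - frac * (1.0 - simple_loss_scale)
        pruned_scale = 0.1 + 0.9 * frac
    return (simple_scale * simple_loss
            + pruned_scale * pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)
```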
2023-11-18 01:38:39,392 INFO [train_asr.py:1271] (2/4) Getting audioset cuts
2023-11-18 01:38:39,392 INFO [kd_datamodule.py:796] (2/4) About to get the audioset cuts.
2023-11-18 01:38:39,460 INFO [train_asr.py:1277] (2/4) Using mux to combine Librispeech with audioset
2023-11-18 01:38:39,460 INFO [train_asr.py:1287] (2/4) CutSet(len=2748469) [underlying data type: ]
2023-11-18 01:38:48,644 INFO [kd_datamodule.py:396] (2/4) Enable MUSAN
2023-11-18 01:38:48,644 INFO [kd_datamodule.py:397] (2/4) About to get Musan cuts
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:427] (2/4) Enable SpecAugment
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:428] (2/4) Time warp factor: 80
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:438] (2/4) Num frame mask: 10
2023-11-18 01:38:51,057 INFO [kd_datamodule.py:451] (2/4) About to create train dataset
2023-11-18 01:38:51,058 INFO [kd_datamodule.py:487] (2/4) Using SimpleCutSampler
2023-11-18 01:38:51,058 INFO [kd_datamodule.py:495] (2/4) About to create train dataloader
2023-11-18 01:38:51,085 INFO [kd_datamodule.py:814] (2/4) About to get the audioset eval cuts.
2023-11-18 01:38:51,122 INFO [kd_datamodule.py:529] (2/4) About to create dev dataset
2023-11-18 01:38:51,625 INFO [kd_datamodule.py:550] (2/4) About to create dev dataloader
2023-11-18 01:39:26,885 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 0, loss[loss=3.897, simple_loss=2.475, pruned_loss=2.433, audio_tagging_loss=1.177, over 15554.00 frames. ], tot_loss[loss=3.897, simple_loss=2.475, pruned_loss=2.433, audio_tagging_loss=1.177, over 15554.00 frames. ], batch size: 57, lr: 2.25e-02, grad_scale: 2.0
2023-11-18 01:39:26,885 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 01:40:00,453 INFO [train_asr.py:1147] (2/4) Epoch 1, validation: loss=2.927, simple_loss=1.349, pruned_loss=1.339, audio_tagging_loss=1.444, over 4681554.00 frames.
2023-11-18 01:40:00,453 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 01:40:01,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=121.16 vs. limit=5.0
2023-11-18 01:40:12,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=90.58 vs. limit=7.5
2023-11-18 01:40:18,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=70.21 vs. limit=5.033333333333333
2023-11-18 01:40:21,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=7.55
2023-11-18 01:40:23,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=133.90 vs. limit=7.525
2023-11-18 01:40:24,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=66.66666666666667, ans=0.49166666666666664
2023-11-18 01:40:26,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=233.42 vs. limit=7.525
2023-11-18 01:40:30,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=483.10 vs. limit=7.55
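The "Using mux to combine Librispeech with audioset" step above corresponds to lhotse's CutSet.mux, which stochastically interleaves lazy cut sets instead of concatenating them. A minimal sketch assuming the standard lhotse 1.16 API; the manifest paths and weights here are placeholders, not the recipe's actual values:

```python
from lhotse import CutSet

# Placeholder manifest paths - the recipe reads its own manifests from data/fbank.
libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")
audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

# mux draws from each source in proportion to `weights`, so every batch can
# mix ASR cuts (with transcripts) and AudioSet cuts (with tagging labels).
train_cuts = CutSet.mux(libri_cuts, audioset_cuts, weights=[0.5, 0.5])
```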
2023-11-18 01:40:34,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=173.60 vs. limit=7.55
2023-11-18 01:40:38,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=236.21 vs. limit=5.066666666666666
2023-11-18 01:40:52,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=81.22 vs. limit=7.65
2023-11-18 01:41:04,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=266.6666666666667, ans=0.4875
2023-11-18 01:41:05,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=266.6666666666667, ans=0.19
2023-11-18 01:41:09,568 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 50, loss[loss=0.611, simple_loss=0.5016, pruned_loss=0.5635, audio_tagging_loss=0.04652, over 15120.00 frames. ], tot_loss[loss=1.363, simple_loss=1.004, pruned_loss=0.8643, audio_tagging_loss=0.2704, over 688319.22 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 1.0
2023-11-18 01:41:18,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=333.3333333333333, ans=0.484375
2023-11-18 01:41:19,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=330.71 vs. limit=7.625
2023-11-18 01:41:24,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=511.18 vs. limit=7.8
2023-11-18 01:41:32,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=400.0, ans=0.48125
2023-11-18 01:41:32,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=400.0, ans=0.091
2023-11-18 01:41:49,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=130.08 vs. limit=5.266666666666667
2023-11-18 01:41:53,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=533.3333333333334, ans=0.475
2023-11-18 01:41:55,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=533.3333333333334, ans=0.8813333333333333
2023-11-18 01:41:57,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.88 vs. limit=7.9
2023-11-18 01:42:01,227 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=123.99 vs. limit=7.9
2023-11-18 01:42:06,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.89 vs. limit=3.09
2023-11-18 01:42:17,381 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=165.69 vs. limit=7.75
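The ScheduledFloat lines above print module parameters (skip rates, balancer probabilities, whitening limits) that are scheduled against batch_count rather than fixed; that is also why the whitening limits in the Whitening lines creep upward over the log. A sketch of the idea, a piecewise-linear schedule in the spirit of icefall's scaling.ScheduledFloat (the class below is illustrative, not the actual implementation; the example breakpoints happen to reproduce the pos_emb_skip_rate line above, 0.5 * (1 - 66.67/4000) ≈ 0.4917):

```python
class PiecewiseLinearSchedule:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        # Linear interpolation between the two surrounding breakpoints.
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 4000 batches:
skip_rate = PiecewiseLinearSchedule((0.0, 0.5), (4000.0, 0.0))
print(skip_rate(66.67))  # ~0.4917, matching the pos_emb_skip_rate entry above
```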
2023-11-18 01:42:18,214 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 100, loss[loss=0.2315, simple_loss=0.1679, pruned_loss=0.2024, audio_tagging_loss=0.03843, over 15920.00 frames. ], tot_loss[loss=0.8839, simple_loss=0.6709, pruned_loss=0.6465, audio_tagging_loss=0.1425, over 1216528.68 frames. ], batch size: 62, lr: 2.70e-02, grad_scale: 2.0
2023-11-18 01:42:19,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.039e+01 1.213e+02 5.684e+02 1.606e+03 1.428e+04, threshold=1.137e+03, percent-clipped=0.0
2023-11-18 01:42:22,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=194.02 vs. limit=7.75
2023-11-18 01:42:23,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=24.32 vs. limit=8.0
2023-11-18 01:42:28,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=306.47 vs. limit=7.75
2023-11-18 01:42:32,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=221.96 vs. limit=5.366666666666667
2023-11-18 01:42:33,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=230.26 vs. limit=7.775
2023-11-18 01:42:34,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=733.3333333333334, ans=0.465625
2023-11-18 01:42:36,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=733.3333333333334, ans=0.4083333333333333
2023-11-18 01:42:38,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=192.44 vs. limit=7.775
2023-11-18 01:42:38,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=186.06 vs. limit=7.775
2023-11-18 01:42:44,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=800.0, ans=0.095
2023-11-18 01:42:49,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=800.0, ans=0.4625
2023-11-18 01:42:53,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=72.24 vs. limit=7.8
2023-11-18 01:42:53,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=800.0, ans=0.4625
2023-11-18 01:42:55,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=100.45 vs. limit=7.8
2023-11-18 01:43:05,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=147.24 vs. limit=8.15
2023-11-18 01:43:05,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=188.00 vs. limit=7.825
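In the optim.py lines, the five quartile numbers are the min/25%/median/75%/max of recent gradient norms, and in every such entry in this log the clipping threshold equals clipping_scale times the median (e.g. 2.0 * 5.684e+02 ≈ 1.137e+03 above). A sketch of that rule, assuming this reading; icefall's ScaledAdam differs in its bookkeeping details:

```python
import torch

def clipping_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    # Quartiles as logged: min, 25%, median, 75%, max.
    q = torch.quantile(recent_grad_norms.float(),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    # Gradients with norm above clipping_scale * median get scaled down;
    # "percent-clipped" reports how often that happened recently.
    return clipping_scale * q[2].item()
```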
2023-11-18 01:43:10,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.14 vs. limit=5.233333333333333
2023-11-18 01:43:11,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=131.10 vs. limit=7.85
2023-11-18 01:43:20,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=158.90 vs. limit=7.85
2023-11-18 01:43:24,588 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 150, loss[loss=0.5114, simple_loss=0.4201, pruned_loss=0.5158, audio_tagging_loss=0.02052, over 16076.00 frames. ], tot_loss[loss=0.7068, simple_loss=0.5459, pruned_loss=0.5681, audio_tagging_loss=0.09468, over 1618754.79 frames. ], batch size: 58, lr: 2.93e-02, grad_scale: 2.0
2023-11-18 01:43:34,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=179.63 vs. limit=8.25
2023-11-18 01:43:39,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=1066.6666666666667, ans=0.45
2023-11-18 01:43:42,205 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=167.64 vs. limit=8.3
2023-11-18 01:43:57,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=1133.3333333333333, ans=0.1575
2023-11-18 01:43:57,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=201.93 vs. limit=8.35
2023-11-18 01:43:57,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=65.51 vs. limit=7.925
2023-11-18 01:43:58,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=1133.3333333333333, ans=0.446875
2023-11-18 01:44:04,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=221.76 vs. limit=7.95
2023-11-18 01:44:07,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=178.73 vs. limit=8.4
2023-11-18 01:44:09,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=1200.0, ans=8.4
2023-11-18 01:44:11,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=47.06 vs. limit=7.95
2023-11-18 01:44:19,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1266.6666666666667, ans=0.440625
2023-11-18 01:44:24,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=41.79 vs. limit=5.633333333333334
2023-11-18 01:44:29,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=316.22 vs. limit=8.45
2023-11-18 01:44:32,275 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 200, loss[loss=0.2616, simple_loss=0.2049, pruned_loss=0.2483, audio_tagging_loss=0.01972, over 13996.00 frames. ], tot_loss[loss=0.5954, simple_loss=0.4637, pruned_loss=0.5019, audio_tagging_loss=0.06966, over 1932687.49 frames. ], batch size: 54, lr: 3.15e-02, grad_scale: 4.0
2023-11-18 01:44:33,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=131.62 vs. limit=8.0
2023-11-18 01:44:33,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.394e+01 4.484e+01 5.110e+01 6.274e+01 1.485e+02, threshold=1.022e+02, percent-clipped=0.0
2023-11-18 01:44:46,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=102.58 vs. limit=8.025
2023-11-18 01:44:49,142 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=149.85 vs. limit=8.025
2023-11-18 01:44:51,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.89 vs. limit=8.55
2023-11-18 01:44:54,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=264.36 vs. limit=8.55
2023-11-18 01:45:01,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1466.6666666666667, ans=0.23533333333333334
2023-11-18 01:45:13,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.53 vs. limit=4.613333333333333
2023-11-18 01:45:14,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1533.3333333333333, ans=0.428125
2023-11-18 01:45:18,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=77.66 vs. limit=8.075
2023-11-18 01:45:18,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=51.72 vs. limit=8.65
2023-11-18 01:45:28,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=84.95 vs. limit=8.1
2023-11-18 01:45:31,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=35.53 vs. limit=8.1
2023-11-18 01:45:32,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=1600.0, ans=0.0104
2023-11-18 01:45:37,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=1600.0, ans=0.284
2023-11-18 01:45:41,052 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 250, loss[loss=0.4714, simple_loss=0.3833, pruned_loss=0.4481, audio_tagging_loss=0.01682, over 15861.00 frames. ], tot_loss[loss=0.5309, simple_loss=0.4161, pruned_loss=0.4602, audio_tagging_loss=0.05445, over 2179716.21 frames. ], batch size: 58, lr: 3.38e-02, grad_scale: 4.0
2023-11-18 01:45:53,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=199.25 vs. limit=8.15
2023-11-18 01:45:55,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=119.97 vs. limit=8.8
2023-11-18 01:45:57,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1733.3333333333333, ans=0.41875
2023-11-18 01:45:57,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=1733.3333333333333, ans=0.41875
2023-11-18 01:46:02,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=1733.3333333333333, ans=0.23266666666666666
2023-11-18 01:46:12,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=1800.0, ans=0.0595
2023-11-18 01:46:17,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=1800.0, ans=0.232
2023-11-18 01:46:28,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=50.25 vs. limit=8.2
2023-11-18 01:46:29,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=4.746666666666667
2023-11-18 01:46:30,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.17 vs. limit=4.746666666666667
2023-11-18 01:46:40,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.04 vs. limit=5.483333333333333
2023-11-18 01:46:46,673 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 300, loss[loss=0.3876, simple_loss=0.3122, pruned_loss=0.3616, audio_tagging_loss=0.01383, over 15607.00 frames. ], tot_loss[loss=0.4896, simple_loss=0.3852, pruned_loss=0.4315, audio_tagging_loss=0.04432, over 2377090.19 frames. ], batch size: 58, lr: 3.60e-02, grad_scale: 8.0
2023-11-18 01:46:47,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.566e+01 4.754e+01 5.461e+01 6.771e+01 2.069e+02, threshold=1.092e+02, percent-clipped=3.0
2023-11-18 01:46:48,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=60.76 vs. limit=8.25
2023-11-18 01:46:54,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=116.60 vs. limit=8.25
2023-11-18 01:46:56,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=52.17 vs. limit=8.25
2023-11-18 01:46:56,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=45.35 vs. limit=6.0
2023-11-18 01:47:05,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=82.99 vs. limit=8.275
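The grad_scale column in the batch lines (1.0 at batch 50, then 2.0, 4.0, 8.0, 16.0) is the loss scale of mixed-precision training ('use_fp16': True in the config). A sketch of the usual torch.cuda.amp pattern that produces this behaviour; the growth/backoff settings shown are illustrative, not necessarily the recipe's actual values:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_factor=2.0,
                                   backoff_factor=0.5, growth_interval=100)

def train_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    # update() halves the scale when inf/NaN gradients are found and doubles
    # it after `growth_interval` clean steps - the doubling is what shows up
    # as grad_scale 1.0 -> 2.0 -> 4.0 -> 8.0 -> 16.0 across the batch logs.
    scaler.update()
```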
2023-11-18 01:47:11,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=102.09 vs. limit=8.275
2023-11-18 01:47:19,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.86 vs. limit=9.1
2023-11-18 01:47:27,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=26.04 vs. limit=9.15
2023-11-18 01:47:31,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=2200.0, ans=0.22499999999999998
2023-11-18 01:47:32,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=149.41 vs. limit=8.325
2023-11-18 01:47:37,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=2266.6666666666665, ans=0.2773333333333333
2023-11-18 01:47:40,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=2266.6666666666665, ans=0.5
2023-11-18 01:47:51,172 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 350, loss[loss=0.4119, simple_loss=0.3275, pruned_loss=0.3825, audio_tagging_loss=0.01451, over 15165.00 frames. ], tot_loss[loss=0.4571, simple_loss=0.3596, pruned_loss=0.4063, audio_tagging_loss=0.0375, over 2518854.41 frames. ], batch size: 55, lr: 3.83e-02, grad_scale: 8.0
2023-11-18 01:47:54,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=2333.3333333333335, ans=0.1125
2023-11-18 01:48:02,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=42.83 vs. limit=9.25
2023-11-18 01:48:08,369 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=4.96
2023-11-18 01:48:13,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=39.80 vs. limit=8.4
2023-11-18 01:48:14,887 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.68 vs. limit=9.3
2023-11-18 01:48:15,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=2400.0, ans=0.11
2023-11-18 01:48:18,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=27.71 vs. limit=6.233333333333333
2023-11-18 01:48:20,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=30.88 vs. limit=6.233333333333333
2023-11-18 01:48:25,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2466.6666666666665, ans=6.541666666666666
2023-11-18 01:48:29,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=87.34 vs. limit=8.45
2023-11-18 01:48:36,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=274.29 vs. limit=8.45
2023-11-18 01:48:43,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=27.70 vs. limit=8.475
2023-11-18 01:48:44,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=103.49 vs. limit=8.475
2023-11-18 01:48:57,494 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 400, loss[loss=0.3487, simple_loss=0.266, pruned_loss=0.3099, audio_tagging_loss=0.02246, over 15009.00 frames. ], tot_loss[loss=0.4321, simple_loss=0.3391, pruned_loss=0.3854, audio_tagging_loss=0.03263, over 2637236.62 frames. ], batch size: 57, lr: 4.05e-02, grad_scale: 16.0
2023-11-18 01:48:58,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.943e+01 5.270e+01 6.183e+01 8.354e+01 3.927e+02, threshold=1.237e+02, percent-clipped=8.0
2023-11-18 01:49:00,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=45.56 vs. limit=9.5
2023-11-18 01:49:06,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=52.76 vs. limit=8.5
2023-11-18 01:49:08,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=118.22 vs. limit=8.5
2023-11-18 01:49:11,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=48.17 vs. limit=6.366666666666667
2023-11-18 01:49:11,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=2733.3333333333335, ans=0.44653265991548297
2023-11-18 01:49:11,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=2733.3333333333335, ans=0.371875
2023-11-18 01:49:11,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=45.04 vs. limit=8.525
2023-11-18 01:49:14,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=125.74 vs. limit=9.55
2023-11-18 01:49:16,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=163.99 vs. limit=8.525
2023-11-18 01:49:22,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=2800.0, ans=0.095
2023-11-18 01:49:33,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=2800.0, ans=0.095
2023-11-18 01:49:36,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. limit=5.716666666666667
2023-11-18 01:49:43,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=52.40 vs. limit=8.575
2023-11-18 01:49:46,408 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=87.91 vs. limit=8.575
2023-11-18 01:49:59,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=30.30 vs. limit=8.625
2023-11-18 01:50:00,771 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 450, loss[loss=0.381, simple_loss=0.2899, pruned_loss=0.3355, audio_tagging_loss=0.02211, over 15013.00 frames. ], tot_loss[loss=0.4135, simple_loss=0.3235, pruned_loss=0.3685, audio_tagging_loss=0.02902, over 2728550.47 frames. ], batch size: 54, lr: 4.28e-02, grad_scale: 16.0
2023-11-18 01:50:11,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=3000.0, ans=0.0875
2023-11-18 01:50:12,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=3066.6666666666665, ans=0.031
2023-11-18 01:50:14,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=3066.6666666666665, ans=0.1166666666666667
2023-11-18 01:50:25,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=42.01 vs. limit=8.675
2023-11-18 01:50:37,847 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=9.85
2023-11-18 01:50:44,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=89.43 vs. limit=8.7
2023-11-18 01:50:47,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.80 vs. limit=9.9
2023-11-18 01:50:52,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=3266.6666666666665, ans=0.346875
2023-11-18 01:51:05,765 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 500, loss[loss=0.347, simple_loss=0.2638, pruned_loss=0.2933, audio_tagging_loss=0.02083, over 14162.00 frames. ], tot_loss[loss=0.4044, simple_loss=0.3151, pruned_loss=0.3587, audio_tagging_loss=0.02638, over 2795852.50 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:51:06,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 3.968e+01 4.922e+01 5.274e+01 6.306e+01 1.338e+02, threshold=1.055e+02, percent-clipped=1.0
2023-11-18 01:51:07,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=3333.3333333333335, ans=0.07499999999999998
2023-11-18 01:51:07,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.61 vs. limit=10.0
2023-11-18 01:51:07,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.85 vs. limit=10.0
2023-11-18 01:51:11,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=32.12 vs. limit=8.75
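The lr column warms up linearly (2.25e-02 at batch 0, 4.49e-02 by batch 500) and then decays very slowly, which matches icefall's Eden schedule with base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config. A sketch under the assumption warmup_batches=500; that value is not in the logged config but reproduces the logged lr, e.g. batch 250 gives 0.75 * 0.045 ≈ 3.38e-02:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5,
            warmup_batches=500.0):
    # Linear warm-up from 0.5*base_lr to base_lr over warmup_batches.
    warmup = 1.0 if batch >= warmup_batches else 0.5 + 0.5 * batch / warmup_batches
    # Slow polynomial decay in both batches and (fractional) epochs;
    # epoch is measured from 0 here, so it is ~0 at the start of epoch 1.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * warmup * batch_factor * epoch_factor

print(eden_lr(0.045, batch=250, epoch=0))  # ~0.0337, as logged at batch 250
```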
2023-11-18 01:51:15,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=3333.3333333333335, ans=0.7833333333333333
2023-11-18 01:51:21,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=91.83 vs. limit=8.775
2023-11-18 01:51:25,530 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=8.775
2023-11-18 01:51:26,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=91.66 vs. limit=8.775
2023-11-18 01:51:27,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=2.42 vs. limit=3.51
2023-11-18 01:51:31,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=6.60 vs. limit=5.386666666666667
2023-11-18 01:51:32,726 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=26.73 vs. limit=8.8
2023-11-18 01:51:42,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=3533.3333333333335, ans=0.334375
2023-11-18 01:51:45,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.82 vs. limit=8.825
2023-11-18 01:52:03,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=18.16 vs. limit=8.85
2023-11-18 01:52:05,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=8.85
2023-11-18 01:52:09,209 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 550, loss[loss=0.3946, simple_loss=0.3014, pruned_loss=0.3384, audio_tagging_loss=0.01692, over 15379.00 frames. ], tot_loss[loss=0.3978, simple_loss=0.3087, pruned_loss=0.3506, audio_tagging_loss=0.02416, over 2850108.31 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:52:13,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3666.6666666666665, ans=0.328125
2023-11-18 01:52:15,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-11-18 01:52:27,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=20.73 vs. limit=8.9
2023-11-18 01:52:32,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=23.07 vs. limit=10.35
2023-11-18 01:52:34,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.85 vs. limit=10.35
2023-11-18 01:52:36,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=3800.0, ans=0.057499999999999996
2023-11-18 01:52:37,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=77.26 vs. limit=8.925
2023-11-18 01:52:39,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.16 vs. limit=8.925
2023-11-18 01:52:40,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3800.0, ans=5.95
2023-11-18 01:52:45,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.34 vs. limit=8.925
2023-11-18 01:52:48,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=39.94 vs. limit=8.95
2023-11-18 01:52:50,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.53 vs. limit=5.966666666666667
2023-11-18 01:52:50,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=3866.6666666666665, ans=0.258
2023-11-18 01:52:52,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=38.56 vs. limit=8.95
2023-11-18 01:52:52,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=8.95
2023-11-18 01:53:07,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=3933.3333333333335, ans=0.315625
2023-11-18 01:53:12,528 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 600, loss[loss=0.3075, simple_loss=0.2303, pruned_loss=0.2514, audio_tagging_loss=0.01869, over 16046.00 frames. ], tot_loss[loss=0.3903, simple_loss=0.3015, pruned_loss=0.3406, audio_tagging_loss=0.02254, over 2891540.51 frames. ], batch size: 61, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:53:13,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 4.102e+01 5.824e+01 6.784e+01 8.267e+01 3.333e+02, threshold=1.357e+02, percent-clipped=4.0
2023-11-18 01:53:13,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=4000.0, ans=0.009999999999999995
2023-11-18 01:53:17,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=16.71 vs. limit=9.0
2023-11-18 01:53:20,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=4000.0, ans=0.3125
2023-11-18 01:53:21,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=35.76 vs. limit=9.0
2023-11-18 01:53:31,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=4066.6666666666665, ans=0.7906666666666666
2023-11-18 01:53:34,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=20.79 vs. limit=7.033333333333333
2023-11-18 01:53:43,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=9.05
2023-11-18 01:53:46,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=19.70 vs. limit=9.05
2023-11-18 01:53:53,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=4200.0, ans=0.303125
2023-11-18 01:53:55,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=10.65
2023-11-18 01:53:57,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.77 vs. limit=10.65
2023-11-18 01:54:03,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.11 vs. limit=5.706666666666667
2023-11-18 01:54:16,777 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 650, loss[loss=0.3998, simple_loss=0.3086, pruned_loss=0.3321, audio_tagging_loss=0.01093, over 15281.00 frames. ], tot_loss[loss=0.3826, simple_loss=0.294, pruned_loss=0.3294, audio_tagging_loss=0.02161, over 2923660.25 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:54:19,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=4333.333333333333, ans=0.296875
2023-11-18 01:54:23,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=42.50 vs. limit=9.125
2023-11-18 01:54:24,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=4333.333333333333, ans=0.296875
2023-11-18 01:54:31,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.45 vs. limit=9.15
2023-11-18 01:54:34,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.65 vs. limit=10.8
2023-11-18 01:55:00,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.49 vs. limit=10.9
2023-11-18 01:55:02,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=4533.333333333333, ans=0.2875
2023-11-18 01:55:03,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=4533.333333333333, ans=0.04777777777777778
2023-11-18 01:55:05,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.69 vs. limit=10.9
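A note on reading the batch lines: loss[...] is the current batch, while tot_loss[...] is a frame-weighted average with slow forgetting, which is why it falls smoothly while the per-batch numbers jump around. A sketch of that bookkeeping, assuming a decay tied to 'reset_interval': 200 from the config (illustrative names; icefall keeps these statistics in a MetricsTracker, and with per-batch frame counts around 15k this decay makes the frame total plateau near the ~3.0M seen later in the log):

```python
class RunningLoss:
    """Frame-weighted running average with slow exponential forgetting."""

    def __init__(self, reset_interval=200):
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss, batch_frames):
        # Old statistics fade by the decay factor; new batches enter with
        # their frame counts, so long cuts weigh more than short ones.
        self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
        self.frames = self.decay * self.frames + batch_frames

    @property
    def value(self):
        return self.loss_sum / max(self.frames, 1.0)
```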
2023-11-18 01:55:14,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=4600.0, ans=10.95
2023-11-18 01:55:19,023 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 700, loss[loss=0.3816, simple_loss=0.2912, pruned_loss=0.3027, audio_tagging_loss=0.01564, over 16369.00 frames. ], tot_loss[loss=0.3774, simple_loss=0.2888, pruned_loss=0.3202, audio_tagging_loss=0.02071, over 2949873.19 frames. ], batch size: 60, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:55:19,287 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.136e+01
2023-11-18 01:55:20,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.747e+01 8.200e+01 9.584e+01 1.192e+02 3.813e+02, threshold=1.917e+02, percent-clipped=10.0
2023-11-18 01:55:25,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=47.66 vs. limit=9.25
2023-11-18 01:55:28,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=4666.666666666667, ans=0.009855072463768115
2023-11-18 01:55:35,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.54 vs. limit=5.8933333333333335
2023-11-18 01:55:40,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=4733.333333333333, ans=0.25266666666666665
2023-11-18 01:56:08,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=4933.333333333333, ans=0.26875
2023-11-18 01:56:09,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.76 vs. limit=11.2
2023-11-18 01:56:12,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=28.79 vs. limit=11.2
2023-11-18 01:56:14,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4933.333333333333, ans=0.26875
2023-11-18 01:56:21,540 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 750, loss[loss=0.3473, simple_loss=0.258, pruned_loss=0.2707, audio_tagging_loss=0.01923, over 15090.00 frames. ], tot_loss[loss=0.3776, simple_loss=0.288, pruned_loss=0.3152, audio_tagging_loss=0.01998, over 2974223.67 frames. ], batch size: 59, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:56:22,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=5000.0, ans=0.25
2023-11-18 01:56:37,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=5066.666666666667, ans=0.2625
2023-11-18 01:56:59,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.75 vs. limit=7.6
2023-11-18 01:57:03,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=9.45
2023-11-18 01:57:04,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=5200.0, ans=0.035
2023-11-18 01:57:04,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=5200.0, ans=0.248
2023-11-18 01:57:10,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=5266.666666666667, ans=0.253125
2023-11-18 01:57:13,960 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.92 vs. limit=7.633333333333334
2023-11-18 01:57:14,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=5266.666666666667, ans=0.00972463768115942
2023-11-18 01:57:16,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=5266.666666666667, ans=0.253125
2023-11-18 01:57:20,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=5266.666666666667, ans=0.24733333333333332
2023-11-18 01:57:24,943 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 800, loss[loss=0.502, simple_loss=0.3894, pruned_loss=0.3918, audio_tagging_loss=0.0102, over 15361.00 frames. ], tot_loss[loss=0.373, simple_loss=0.2835, pruned_loss=0.3057, audio_tagging_loss=0.01965, over 2986811.29 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:57:26,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=5333.333333333333, ans=0.25
2023-11-18 01:57:27,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.780e+01 1.132e+02 1.440e+02 3.329e+02, threshold=2.265e+02, percent-clipped=7.0
2023-11-18 01:57:38,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=9.525
2023-11-18 01:57:45,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=11.55
2023-11-18 01:57:47,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.07 vs. limit=6.35
2023-11-18 01:57:57,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=9.55
2023-11-18 01:58:05,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=5533.333333333333, ans=11.65
2023-11-18 01:58:05,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.40 vs. limit=9.575
2023-11-18 01:58:06,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=5533.333333333333, ans=6.383333333333333
2023-11-18 01:58:10,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.79 vs. limit=9.575
2023-11-18 01:58:25,519 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 850, loss[loss=0.3802, simple_loss=0.2901, pruned_loss=0.2814, audio_tagging_loss=0.01588, over 15058.00 frames. ], tot_loss[loss=0.369, simple_loss=0.2801, pruned_loss=0.2964, audio_tagging_loss=0.01913, over 2998258.06 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:58:29,534 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.42 vs. limit=11.75
2023-11-18 01:58:31,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5666.666666666667, ans=0.234375
2023-11-18 01:58:33,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=5666.666666666667, ans=0.043055555555555555
2023-11-18 01:58:36,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=9.65
2023-11-18 01:58:40,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=11.8
2023-11-18 01:58:54,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=28.28 vs. limit=9.675
2023-11-18 01:59:12,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=5866.666666666667, ans=0.07
2023-11-18 01:59:13,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=5933.333333333333, ans=0.04194444444444445
2023-11-18 01:59:17,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=18.15 vs. limit=9.725
2023-11-18 01:59:26,569 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 900, loss[loss=0.3439, simple_loss=0.2636, pruned_loss=0.2368, audio_tagging_loss=0.02001, over 14328.00 frames. ], tot_loss[loss=0.3666, simple_loss=0.2789, pruned_loss=0.2873, audio_tagging_loss=0.01875, over 3004777.50 frames. ], batch size: 54, lr: 4.48e-02, grad_scale: 16.0
2023-11-18 01:59:28,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=17.95 vs. limit=9.75
2023-11-18 01:59:28,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.579e+01 7.921e+01 9.785e+01 1.252e+02 2.736e+02, threshold=1.957e+02, percent-clipped=4.0
2023-11-18 01:59:49,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.63 vs. limit=12.05
2023-11-18 01:59:53,939 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=3.634e-02
2023-11-18 01:59:56,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.75 vs. limit=9.8
2023-11-18 02:00:07,628 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.47 vs. limit=9.825
2023-11-18 02:00:14,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=6266.666666666667, ans=0.20625
2023-11-18 02:00:16,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=6266.666666666667, ans=0.20625
2023-11-18 02:00:22,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=3.94
2023-11-18 02:00:28,092 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 950, loss[loss=0.1942, simple_loss=0.1441, pruned_loss=0.121, audio_tagging_loss=0.0205, over 14888.00 frames. ], tot_loss[loss=0.3564, simple_loss=0.272, pruned_loss=0.2723, audio_tagging_loss=0.01824, over 3014569.73 frames. ], batch size: 57, lr: 4.48e-02, grad_scale: 8.0
2023-11-18 02:00:37,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=6333.333333333333, ans=0.23666666666666666
2023-11-18 02:00:49,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.80 vs. limit=9.9
2023-11-18 02:00:49,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.54 vs. limit=6.6
2023-11-18 02:00:50,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=6466.666666666667, ans=0.03972222222222222
2023-11-18 02:00:51,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=6466.666666666667, ans=9.925
2023-11-18 02:00:56,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=6466.666666666667, ans=0.19687500000000002
2023-11-18 02:01:01,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.06 vs. limit=9.925
2023-11-18 02:01:08,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=6533.333333333333, ans=0.009449275362318842
2023-11-18 02:01:09,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.45 vs. limit=6.613333333333333
2023-11-18 02:01:27,321 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1000, loss[loss=0.3133, simple_loss=0.249, pruned_loss=0.2046, audio_tagging_loss=0.01398, over 15538.00 frames. ], tot_loss[loss=0.3485, simple_loss=0.2678, pruned_loss=0.259, audio_tagging_loss=0.01754, over 3016105.79 frames. ], batch size: 56, lr: 4.48e-02, grad_scale: 8.0
2023-11-18 02:01:29,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.67 vs. limit=12.5
limit=12.5 2023-11-18 02:01:30,669 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.563e+01 9.019e+01 1.486e+02 2.475e+02 7.919e+02, threshold=2.973e+02, percent-clipped=36.0 2023-11-18 02:01:36,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=6666.666666666667, ans=0.03888888888888889 2023-11-18 02:01:39,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=6733.333333333333, ans=0.184375 2023-11-18 02:01:44,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=6.56 vs. limit=6.683333333333334 2023-11-18 02:01:53,958 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:01:55,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.21 vs. limit=6.7 2023-11-18 02:01:56,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=6800.0, ans=0.18125000000000002 2023-11-18 02:02:02,230 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=6.032e-01 2023-11-18 02:02:05,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=6866.666666666667, ans=0.6596666666666666 2023-11-18 02:02:09,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.48 vs. limit=12.65 2023-11-18 02:02:11,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=6866.666666666667, ans=0.303 2023-11-18 02:02:18,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=12.7 2023-11-18 02:02:24,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=6933.333333333333, ans=0.175 2023-11-18 02:02:26,194 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1050, loss[loss=0.2692, simple_loss=0.2129, pruned_loss=0.1662, audio_tagging_loss=0.01708, over 14764.00 frames. ], tot_loss[loss=0.3348, simple_loss=0.2587, pruned_loss=0.2421, audio_tagging_loss=0.01714, over 3016536.11 frames. ], batch size: 55, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:02:26,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=12.75 2023-11-18 02:02:27,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.82 vs. 
limit=10.125 2023-11-18 02:02:42,007 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.18 vs. limit=12.8 2023-11-18 02:02:56,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.38 vs. limit=12.85 2023-11-18 02:02:59,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.01 vs. limit=6.783333333333333 2023-11-18 02:03:20,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=7266.666666666667, ans=0.22733333333333333 2023-11-18 02:03:20,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=7266.666666666667, ans=0.159375 2023-11-18 02:03:25,432 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1100, loss[loss=0.2956, simple_loss=0.2414, pruned_loss=0.1843, audio_tagging_loss=0.01083, over 15786.00 frames. ], tot_loss[loss=0.3222, simple_loss=0.2504, pruned_loss=0.2263, audio_tagging_loss=0.01682, over 3023895.47 frames. ], batch size: 60, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:03:28,770 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 1.098e+02 1.778e+02 2.963e+02 6.822e+02, threshold=3.557e+02, percent-clipped=25.0 2023-11-18 02:03:28,833 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:03:31,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=7333.333333333333, ans=0.036111111111111115 2023-11-18 02:03:32,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=7333.333333333333, ans=0.15625 2023-11-18 02:03:37,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=7400.0, ans=0.641 2023-11-18 02:03:53,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=7466.666666666667, ans=0.312 2023-11-18 02:04:00,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=7533.333333333333, ans=0.05291666666666667 2023-11-18 02:04:11,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=7600.0, ans=0.0 2023-11-18 02:04:22,640 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1150, loss[loss=0.2481, simple_loss=0.2031, pruned_loss=0.1412, audio_tagging_loss=0.01618, over 14560.00 frames. ], tot_loss[loss=0.3085, simple_loss=0.2413, pruned_loss=0.2106, audio_tagging_loss=0.01657, over 3022285.67 frames. 
], batch size: 55, lr: 4.47e-02, grad_scale: 8.0 2023-11-18 02:04:36,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=13.3 2023-11-18 02:04:45,650 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:04:51,406 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.97 vs. limit=6.95 2023-11-18 02:05:14,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.46 vs. limit=8.966666666666667 2023-11-18 02:05:20,093 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1200, loss[loss=0.2303, simple_loss=0.1827, pruned_loss=0.1286, audio_tagging_loss=0.02008, over 16333.00 frames. ], tot_loss[loss=0.2989, simple_loss=0.2355, pruned_loss=0.1984, audio_tagging_loss=0.01639, over 3029239.66 frames. ], batch size: 62, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:05:23,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 1.072e+02 1.842e+02 2.807e+02 8.662e+02, threshold=3.683e+02, percent-clipped=14.0 2023-11-18 02:06:01,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8200.0, ans=0.218 2023-11-18 02:06:05,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=8266.666666666666, ans=0.03222222222222222 2023-11-18 02:06:06,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8266.666666666666, ans=0.21733333333333332 2023-11-18 02:06:13,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=8266.666666666666, ans=0.125 2023-11-18 02:06:14,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=8266.666666666666, ans=0.125 2023-11-18 02:06:14,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=8266.666666666666, ans=0.125 2023-11-18 02:06:17,061 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1250, loss[loss=0.2203, simple_loss=0.1887, pruned_loss=0.1136, audio_tagging_loss=0.0153, over 15400.00 frames. ], tot_loss[loss=0.2903, simple_loss=0.2308, pruned_loss=0.1872, audio_tagging_loss=0.01602, over 3037484.27 frames. ], batch size: 59, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:06:23,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=13.90 vs. limit=13.75 2023-11-18 02:06:30,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.54 vs. limit=13.8 2023-11-18 02:06:38,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=10.675 2023-11-18 02:06:53,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8533.333333333334, ans=0.125 2023-11-18 02:07:13,969 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1300, loss[loss=0.2136, simple_loss=0.1766, pruned_loss=0.1158, audio_tagging_loss=0.01511, over 14621.00 frames. ], tot_loss[loss=0.281, simple_loss=0.225, pruned_loss=0.1766, audio_tagging_loss=0.01582, over 3041946.51 frames. ], batch size: 58, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:07:17,223 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 1.001e+02 1.539e+02 2.707e+02 8.460e+02, threshold=3.079e+02, percent-clipped=10.0 2023-11-18 02:07:17,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=8666.666666666666, ans=0.32999999999999996 2023-11-18 02:07:21,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.47 vs. limit=7.166666666666666 2023-11-18 02:07:24,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=10.775 2023-11-18 02:07:33,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=8733.333333333334, ans=0.21266666666666667 2023-11-18 02:07:35,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=8800.0, ans=0.125 2023-11-18 02:07:36,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.42 vs. limit=10.8 2023-11-18 02:07:49,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=8866.666666666666, ans=0.125 2023-11-18 02:08:07,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.26 vs. limit=9.466666666666667 2023-11-18 02:08:10,217 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1350, loss[loss=0.2246, simple_loss=0.1822, pruned_loss=0.1188, audio_tagging_loss=0.01976, over 15235.00 frames. ], tot_loss[loss=0.2715, simple_loss=0.2189, pruned_loss=0.1663, audio_tagging_loss=0.01573, over 3049185.63 frames. 
], batch size: 55, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:08:10,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=9000.0, ans=0.02916666666666667 2023-11-18 02:08:12,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=9000.0, ans=0.125 2023-11-18 02:08:17,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2023-11-18 02:08:20,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9000.0, ans=0.21000000000000002 2023-11-18 02:08:37,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=9133.333333333334, ans=0.125 2023-11-18 02:08:38,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9133.333333333334, ans=0.20866666666666667 2023-11-18 02:08:52,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=10.95 2023-11-18 02:08:52,803 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:08:56,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=9266.666666666666, ans=0.125 2023-11-18 02:09:04,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=9266.666666666666, ans=0.04949747468305833 2023-11-18 02:09:09,603 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1400, loss[loss=0.1632, simple_loss=0.1349, pruned_loss=0.07796, audio_tagging_loss=0.0186, over 14731.00 frames. ], tot_loss[loss=0.2636, simple_loss=0.214, pruned_loss=0.1576, audio_tagging_loss=0.0156, over 3053062.13 frames. ], batch size: 55, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:09:12,848 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 1.322e+02 1.809e+02 2.689e+02 4.159e+02, threshold=3.617e+02, percent-clipped=14.0 2023-11-18 02:09:30,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.22 vs. limit=9.7 2023-11-18 02:09:42,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=14.65 2023-11-18 02:09:59,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9600.0, ans=0.20400000000000001 2023-11-18 02:09:59,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.51 vs. 
limit=11.1 2023-11-18 02:10:05,814 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1450, loss[loss=0.2411, simple_loss=0.2024, pruned_loss=0.129, audio_tagging_loss=0.01497, over 13618.00 frames. ], tot_loss[loss=0.2565, simple_loss=0.2098, pruned_loss=0.1498, audio_tagging_loss=0.01549, over 3054788.00 frames. ], batch size: 53, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:10:12,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.23 vs. limit=5.0 2023-11-18 02:10:20,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=14.8 2023-11-18 02:10:30,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=9800.0, ans=0.125 2023-11-18 02:10:42,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=9866.666666666666, ans=7.466666666666667 2023-11-18 02:10:42,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.26 vs. limit=11.2 2023-11-18 02:10:45,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=9866.666666666666, ans=0.02555555555555556 2023-11-18 02:10:49,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=11.2 2023-11-18 02:10:53,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=7.973333333333334 2023-11-18 02:11:01,695 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1500, loss[loss=0.232, simple_loss=0.1848, pruned_loss=0.1235, audio_tagging_loss=0.02083, over 14453.00 frames. ], tot_loss[loss=0.2496, simple_loss=0.2051, pruned_loss=0.1429, audio_tagging_loss=0.01555, over 3046736.11 frames. ], batch size: 53, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:11:04,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.893e+01 1.138e+02 1.532e+02 2.102e+02 5.614e+02, threshold=3.064e+02, percent-clipped=6.0 2023-11-18 02:11:09,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.45 vs. limit=10.0 2023-11-18 02:11:20,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10066.666666666666, ans=0.0 2023-11-18 02:11:24,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10133.333333333334, ans=0.0 2023-11-18 02:11:29,092 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:11:31,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.85 vs. 
limit=7.533333333333333 2023-11-18 02:11:36,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10200.0, ans=0.02416666666666667 2023-11-18 02:11:48,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=10266.666666666666, ans=0.125 2023-11-18 02:11:59,189 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1550, loss[loss=0.2627, simple_loss=0.2337, pruned_loss=0.1336, audio_tagging_loss=0.01303, over 14977.00 frames. ], tot_loss[loss=0.2455, simple_loss=0.2031, pruned_loss=0.1377, audio_tagging_loss=0.01565, over 3050732.58 frames. ], batch size: 55, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:12:06,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=4.55 2023-11-18 02:12:31,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=10533.333333333334, ans=0.125 2023-11-18 02:12:34,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=10533.333333333334, ans=0.022777777777777775 2023-11-18 02:12:37,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.04 vs. limit=10.266666666666667 2023-11-18 02:12:38,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10533.333333333334, ans=0.19466666666666665 2023-11-18 02:12:56,142 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1600, loss[loss=0.2366, simple_loss=0.2072, pruned_loss=0.1228, audio_tagging_loss=0.01158, over 15386.00 frames. ], tot_loss[loss=0.2398, simple_loss=0.1995, pruned_loss=0.132, audio_tagging_loss=0.01566, over 3048387.57 frames. ], batch size: 57, lr: 4.45e-02, grad_scale: 32.0 2023-11-18 02:12:59,355 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 1.048e+02 1.443e+02 2.212e+02 4.225e+02, threshold=2.886e+02, percent-clipped=6.0 2023-11-18 02:13:06,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=10733.333333333334, ans=0.008536231884057971 2023-11-18 02:13:17,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.55 2023-11-18 02:13:43,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=11.6 2023-11-18 02:13:48,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=10933.333333333334, ans=0.125 2023-11-18 02:13:51,853 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1650, loss[loss=0.1804, simple_loss=0.1598, pruned_loss=0.08959, audio_tagging_loss=0.01104, over 15858.00 frames. ], tot_loss[loss=0.2347, simple_loss=0.1962, pruned_loss=0.1272, audio_tagging_loss=0.0156, over 3044991.77 frames. 
], batch size: 60, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:13:59,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=11000.0, ans=0.125 2023-11-18 02:14:14,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.83 vs. limit=10.566666666666666 2023-11-18 02:14:14,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=11133.333333333334, ans=0.125 2023-11-18 02:14:39,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=11266.666666666666, ans=0.008420289855072463 2023-11-18 02:14:48,703 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1700, loss[loss=0.2303, simple_loss=0.1963, pruned_loss=0.1181, audio_tagging_loss=0.01522, over 14984.00 frames. ], tot_loss[loss=0.2309, simple_loss=0.1944, pruned_loss=0.1232, audio_tagging_loss=0.0155, over 3051632.42 frames. ], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:14:53,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.231e+02 1.950e+02 2.730e+02 7.528e+02, threshold=3.901e+02, percent-clipped=22.0 2023-11-18 02:14:54,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=11.75 2023-11-18 02:15:11,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=11466.666666666666, ans=0.4986666666666667 2023-11-18 02:15:18,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=11466.666666666666, ans=0.018888888888888893 2023-11-18 02:15:31,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11533.333333333334, ans=0.18466666666666665 2023-11-18 02:15:31,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=11533.333333333334, ans=0.49633333333333335 2023-11-18 02:15:44,875 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1750, loss[loss=0.2151, simple_loss=0.1925, pruned_loss=0.1036, audio_tagging_loss=0.01484, over 16491.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.1921, pruned_loss=0.1195, audio_tagging_loss=0.01522, over 3048809.74 frames. ], batch size: 60, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:15:52,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.89 vs. limit=11.875 2023-11-18 02:16:02,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=11733.333333333334, ans=0.0 2023-11-18 02:16:02,729 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. 
limit=7.933333333333334 2023-11-18 02:16:06,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=11800.0, ans=0.125 2023-11-18 02:16:12,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=11800.0, ans=0.125 2023-11-18 02:16:33,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=11933.333333333334, ans=0.125 2023-11-18 02:16:41,143 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1800, loss[loss=0.2126, simple_loss=0.1828, pruned_loss=0.1076, audio_tagging_loss=0.01419, over 15611.00 frames. ], tot_loss[loss=0.2222, simple_loss=0.1893, pruned_loss=0.1156, audio_tagging_loss=0.01513, over 3046691.32 frames. ], batch size: 60, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:16:44,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=12000.0, ans=0.00826086956521739 2023-11-18 02:16:45,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 1.122e+02 1.379e+02 2.095e+02 9.381e+02, threshold=2.759e+02, percent-clipped=5.0 2023-11-18 02:16:57,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=12066.666666666666, ans=0.01638888888888889 2023-11-18 02:17:07,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12133.333333333334, ans=0.125 2023-11-18 02:17:12,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=12133.333333333334, ans=0.125 2023-11-18 02:17:15,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.67 vs. limit=16.65 2023-11-18 02:17:23,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.95 vs. limit=8.05 2023-11-18 02:17:27,644 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.56 vs. limit=4.84 2023-11-18 02:17:37,665 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1850, loss[loss=0.2565, simple_loss=0.2242, pruned_loss=0.1318, audio_tagging_loss=0.01306, over 15552.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.1871, pruned_loss=0.1124, audio_tagging_loss=0.01498, over 3042206.37 frames. ], batch size: 57, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:18:06,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=12466.666666666666, ans=0.125 2023-11-18 02:18:09,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=12533.333333333334, ans=0.17466666666666666 2023-11-18 02:18:14,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=12.2 2023-11-18 02:18:27,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. 
limit=12.225 2023-11-18 02:18:29,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=12600.0, ans=0.014166666666666668 2023-11-18 02:18:33,425 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1900, loss[loss=0.175, simple_loss=0.1571, pruned_loss=0.08275, audio_tagging_loss=0.01347, over 13961.00 frames. ], tot_loss[loss=0.2142, simple_loss=0.1846, pruned_loss=0.109, audio_tagging_loss=0.0148, over 3041398.71 frames. ], batch size: 54, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:18:37,661 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 1.124e+02 1.503e+02 2.193e+02 6.798e+02, threshold=3.006e+02, percent-clipped=14.0 2023-11-18 02:18:42,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=12666.666666666666, ans=0.125 2023-11-18 02:18:45,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.10 vs. limit=12.275 2023-11-18 02:19:01,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=12800.0, ans=0.125 2023-11-18 02:19:15,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=12866.666666666666, ans=0.4496666666666667 2023-11-18 02:19:29,592 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 1950, loss[loss=0.1805, simple_loss=0.1521, pruned_loss=0.08919, audio_tagging_loss=0.01534, over 14171.00 frames. ], tot_loss[loss=0.2104, simple_loss=0.1825, pruned_loss=0.1058, audio_tagging_loss=0.0148, over 3042284.43 frames. ], batch size: 56, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:19:37,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=13000.0, ans=0.445 2023-11-18 02:19:57,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=13133.333333333334, ans=0.125 2023-11-18 02:20:02,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=13200.0, ans=0.125 2023-11-18 02:20:10,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. limit=12.45 2023-11-18 02:20:15,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=13266.666666666666, ans=0.11733333333333332 2023-11-18 02:20:26,153 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2000, loss[loss=0.2104, simple_loss=0.181, pruned_loss=0.1003, audio_tagging_loss=0.01958, over 14252.00 frames. ], tot_loss[loss=0.2067, simple_loss=0.1797, pruned_loss=0.1032, audio_tagging_loss=0.01482, over 3044028.43 frames. ], batch size: 55, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:20:30,373 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 1.116e+02 1.535e+02 2.034e+02 3.808e+02, threshold=3.071e+02, percent-clipped=5.0 2023-11-18 02:20:37,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.19 vs. 
limit=17.55 2023-11-18 02:20:52,034 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=12.55 2023-11-18 02:20:52,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=13466.666666666666, ans=0.007942028985507247 2023-11-18 02:20:55,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.59 vs. limit=17.6 2023-11-18 02:20:56,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=13466.666666666666, ans=0.4286666666666667 2023-11-18 02:21:00,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=13533.333333333334, ans=0.010277777777777775 2023-11-18 02:21:21,363 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2050, loss[loss=0.132, simple_loss=0.1173, pruned_loss=0.05841, audio_tagging_loss=0.01494, over 15211.00 frames. ], tot_loss[loss=0.205, simple_loss=0.1793, pruned_loss=0.1016, audio_tagging_loss=0.01468, over 3046763.42 frames. ], batch size: 58, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:21:28,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=13666.666666666666, ans=0.42166666666666675 2023-11-18 02:21:33,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=13733.333333333334, ans=0.125 2023-11-18 02:22:01,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=13866.666666666666, ans=0.125 2023-11-18 02:22:07,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=13933.333333333334, ans=0.125 2023-11-18 02:22:17,376 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2100, loss[loss=0.1877, simple_loss=0.1718, pruned_loss=0.08752, audio_tagging_loss=0.01424, over 16043.00 frames. ], tot_loss[loss=0.2033, simple_loss=0.1788, pruned_loss=0.09997, audio_tagging_loss=0.01461, over 3046414.25 frames. ], batch size: 60, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:22:21,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.391e+01 1.118e+02 1.317e+02 1.653e+02 4.106e+02, threshold=2.634e+02, percent-clipped=4.0 2023-11-18 02:22:49,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=8.533333333333333 2023-11-18 02:22:49,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.03 vs. limit=18.15 2023-11-18 02:22:57,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.41 vs. limit=8.55 2023-11-18 02:23:11,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=14266.666666666666, ans=0.025 2023-11-18 02:23:13,716 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2150, loss[loss=0.2174, simple_loss=0.1994, pruned_loss=0.1036, audio_tagging_loss=0.01407, over 15218.00 frames. ], tot_loss[loss=0.2009, simple_loss=0.1777, pruned_loss=0.09806, audio_tagging_loss=0.01455, over 3043837.98 frames. 
], batch size: 56, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:23:15,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=14333.333333333334, ans=0.3983333333333333 2023-11-18 02:23:18,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=14333.333333333334, ans=0.125 2023-11-18 02:23:21,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.98 vs. limit=12.875 2023-11-18 02:23:26,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14400.0, ans=0.156 2023-11-18 02:23:34,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=18.35 2023-11-18 02:23:47,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=18.4 2023-11-18 02:23:47,736 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:23:49,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=12.95 2023-11-18 02:23:53,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14533.333333333334, ans=0.15466666666666667 2023-11-18 02:23:53,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=14533.333333333334, ans=0.125 2023-11-18 02:23:54,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.08 vs. limit=5.18 2023-11-18 02:23:55,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=14533.333333333334, ans=0.125 2023-11-18 02:24:05,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=14600.0, ans=0.125 2023-11-18 02:24:07,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14600.0, ans=0.154 2023-11-18 02:24:10,328 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2200, loss[loss=0.3044, simple_loss=0.277, pruned_loss=0.1545, audio_tagging_loss=0.01142, over 15594.00 frames. ], tot_loss[loss=0.1998, simple_loss=0.1776, pruned_loss=0.0969, audio_tagging_loss=0.01455, over 3044812.71 frames. 
], batch size: 55, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:24:14,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.117e+02 1.377e+02 2.009e+02 5.109e+02, threshold=2.755e+02, percent-clipped=7.0 2023-11-18 02:24:21,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=14733.333333333334, ans=0.125 2023-11-18 02:24:27,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14733.333333333334, ans=0.125 2023-11-18 02:25:04,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=14933.333333333334, ans=0.125 2023-11-18 02:25:07,624 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2250, loss[loss=0.2028, simple_loss=0.19, pruned_loss=0.09401, audio_tagging_loss=0.01375, over 15534.00 frames. ], tot_loss[loss=0.1991, simple_loss=0.1778, pruned_loss=0.09607, audio_tagging_loss=0.01448, over 3048694.88 frames. ], batch size: 56, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:25:11,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.52 vs. limit=13.125 2023-11-18 02:25:30,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.81 vs. limit=13.175 2023-11-18 02:26:04,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=15333.333333333334, ans=0.14666666666666667 2023-11-18 02:26:05,404 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2300, loss[loss=0.2029, simple_loss=0.1847, pruned_loss=0.0974, audio_tagging_loss=0.01317, over 15408.00 frames. ], tot_loss[loss=0.1957, simple_loss=0.1756, pruned_loss=0.09373, audio_tagging_loss=0.01443, over 3047557.20 frames. ], batch size: 58, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:26:09,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 1.107e+02 1.429e+02 1.999e+02 3.636e+02, threshold=2.858e+02, percent-clipped=5.0 2023-11-18 02:26:16,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=15400.0, ans=0.14600000000000002 2023-11-18 02:26:19,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=15400.0, ans=0.125 2023-11-18 02:26:56,035 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:27:01,480 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2350, loss[loss=0.197, simple_loss=0.1675, pruned_loss=0.09443, audio_tagging_loss=0.01889, over 14759.00 frames. ], tot_loss[loss=0.195, simple_loss=0.1757, pruned_loss=0.09275, audio_tagging_loss=0.01453, over 3050782.38 frames. 
], batch size: 58, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:27:06,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=15666.666666666666, ans=19.25 2023-11-18 02:27:18,487 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.24 vs. limit=13.4 2023-11-18 02:27:43,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=15866.666666666666, ans=0.07 2023-11-18 02:27:57,898 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2400, loss[loss=0.1412, simple_loss=0.1284, pruned_loss=0.05922, audio_tagging_loss=0.01779, over 15977.00 frames. ], tot_loss[loss=0.1938, simple_loss=0.1752, pruned_loss=0.09176, audio_tagging_loss=0.01463, over 3053251.76 frames. ], batch size: 61, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:28:02,145 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 1.240e+02 1.395e+02 1.790e+02 3.155e+02, threshold=2.791e+02, percent-clipped=5.0 2023-11-18 02:28:44,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=13.6 2023-11-18 02:28:46,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=16266.666666666666, ans=0.125 2023-11-18 02:28:54,736 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2450, loss[loss=0.215, simple_loss=0.1979, pruned_loss=0.1008, audio_tagging_loss=0.0153, over 14344.00 frames. ], tot_loss[loss=0.1936, simple_loss=0.1758, pruned_loss=0.09118, audio_tagging_loss=0.01468, over 3046789.15 frames. ], batch size: 54, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:29:17,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16466.666666666668, ans=0.1353333333333333 2023-11-18 02:29:25,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. limit=5.470000000000001 2023-11-18 02:29:32,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=16533.333333333332, ans=0.0 2023-11-18 02:29:34,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=16533.333333333332, ans=0.125 2023-11-18 02:29:49,869 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2500, loss[loss=0.1617, simple_loss=0.1542, pruned_loss=0.07047, audio_tagging_loss=0.01414, over 15919.00 frames. ], tot_loss[loss=0.1901, simple_loss=0.1732, pruned_loss=0.08894, audio_tagging_loss=0.01462, over 3050989.55 frames. 
], batch size: 60, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:29:54,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.242e+01 1.096e+02 1.316e+02 1.723e+02 3.236e+02, threshold=2.632e+02, percent-clipped=4.0 2023-11-18 02:29:57,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=16666.666666666668, ans=0.0 2023-11-18 02:30:00,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=16733.333333333332, ans=0.125 2023-11-18 02:30:13,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=16800.0, ans=0.125 2023-11-18 02:30:18,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.92 vs. limit=20.1 2023-11-18 02:30:19,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=16800.0, ans=0.0 2023-11-18 02:30:28,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=16866.666666666668, ans=0.0 2023-11-18 02:30:32,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.60 vs. limit=13.433333333333334 2023-11-18 02:30:45,337 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2550, loss[loss=0.1695, simple_loss=0.1583, pruned_loss=0.07386, audio_tagging_loss=0.01655, over 15124.00 frames. ], tot_loss[loss=0.1893, simple_loss=0.1728, pruned_loss=0.08847, audio_tagging_loss=0.01452, over 3053938.67 frames. ], batch size: 57, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:31:07,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=17133.333333333332, ans=0.025 2023-11-18 02:31:43,161 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2600, loss[loss=0.215, simple_loss=0.2101, pruned_loss=0.09922, audio_tagging_loss=0.01077, over 15591.00 frames. ], tot_loss[loss=0.1863, simple_loss=0.1704, pruned_loss=0.08685, audio_tagging_loss=0.01431, over 3053816.25 frames. 
], batch size: 55, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:31:47,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 1.250e+02 1.620e+02 2.059e+02 4.953e+02, threshold=3.240e+02, percent-clipped=12.0 2023-11-18 02:31:52,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=17333.333333333332, ans=0.125 2023-11-18 02:32:02,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=17400.0, ans=0.125 2023-11-18 02:32:06,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=17466.666666666668, ans=0.9246666666666666 2023-11-18 02:32:11,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=17466.666666666668, ans=0.12533333333333332 2023-11-18 02:32:13,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=17466.666666666668, ans=0.05 2023-11-18 02:32:17,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=17533.333333333332, ans=0.125 2023-11-18 02:32:21,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=17533.333333333332, ans=0.125 2023-11-18 02:32:31,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17600.0, ans=0.0 2023-11-18 02:32:39,189 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2650, loss[loss=0.1481, simple_loss=0.1429, pruned_loss=0.0664, audio_tagging_loss=0.01024, over 16291.00 frames. ], tot_loss[loss=0.186, simple_loss=0.171, pruned_loss=0.08657, audio_tagging_loss=0.01404, over 3054341.45 frames. ], batch size: 62, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:32:40,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=14.125 2023-11-18 02:32:51,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=17733.333333333332, ans=0.007014492753623189 2023-11-18 02:32:55,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17733.333333333332, ans=0.12266666666666667 2023-11-18 02:33:17,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=17866.666666666668, ans=14.2 2023-11-18 02:33:34,665 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2700, loss[loss=0.2251, simple_loss=0.216, pruned_loss=0.1049, audio_tagging_loss=0.0122, over 15540.00 frames. ], tot_loss[loss=0.1843, simple_loss=0.17, pruned_loss=0.08536, audio_tagging_loss=0.01397, over 3054859.01 frames. 
], batch size: 56, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:33:38,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.101e+02 1.289e+02 1.771e+02 2.746e+02, threshold=2.578e+02, percent-clipped=0.0 2023-11-18 02:33:42,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=18000.0, ans=0.125 2023-11-18 02:33:52,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.21 vs. limit=21.05 2023-11-18 02:33:52,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.09 vs. limit=14.275 2023-11-18 02:34:06,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=18133.333333333332, ans=0.0 2023-11-18 02:34:31,185 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2750, loss[loss=0.1353, simple_loss=0.1224, pruned_loss=0.05933, audio_tagging_loss=0.01481, over 14213.00 frames. ], tot_loss[loss=0.1822, simple_loss=0.1688, pruned_loss=0.08401, audio_tagging_loss=0.01385, over 3056191.23 frames. ], batch size: 53, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:34:44,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.09 vs. limit=14.4 2023-11-18 02:34:50,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=18400.0, ans=0.0 2023-11-18 02:34:56,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=18466.666666666668, ans=0.125 2023-11-18 02:34:58,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=18466.666666666668, ans=0.125 2023-11-18 02:35:00,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.95 vs. limit=14.425 2023-11-18 02:35:03,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.03 vs. limit=14.45 2023-11-18 02:35:06,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=18533.333333333332, ans=0.2513333333333334 2023-11-18 02:35:20,309 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:35:27,839 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2800, loss[loss=0.1444, simple_loss=0.1291, pruned_loss=0.06703, audio_tagging_loss=0.01284, over 15588.00 frames. ], tot_loss[loss=0.1813, simple_loss=0.1681, pruned_loss=0.08337, audio_tagging_loss=0.01393, over 3055655.89 frames. 
], batch size: 61, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:35:32,076 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.686e+01 1.129e+02 1.327e+02 1.684e+02 3.032e+02, threshold=2.655e+02, percent-clipped=2.0 2023-11-18 02:35:33,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=18666.666666666668, ans=0.2466666666666667 2023-11-18 02:35:38,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=18733.333333333332, ans=0.2443333333333334 2023-11-18 02:35:56,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=18800.0, ans=0.125 2023-11-18 02:35:57,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=18800.0, ans=0.242 2023-11-18 02:36:02,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=3.05 vs. limit=14.575 2023-11-18 02:36:05,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=18866.666666666668, ans=0.125 2023-11-18 02:36:23,507 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2850, loss[loss=0.1354, simple_loss=0.1311, pruned_loss=0.05469, audio_tagging_loss=0.01514, over 14767.00 frames. ], tot_loss[loss=0.1807, simple_loss=0.168, pruned_loss=0.08282, audio_tagging_loss=0.01392, over 3055953.60 frames. ], batch size: 55, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:36:24,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=19000.0, ans=0.125 2023-11-18 02:36:43,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=19066.666666666668, ans=0.125 2023-11-18 02:36:50,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=19133.333333333332, ans=0.0 2023-11-18 02:36:50,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19133.333333333332, ans=0.10866666666666669 2023-11-18 02:36:52,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=19133.333333333332, ans=0.10866666666666669 2023-11-18 02:36:58,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=19200.0, ans=0.0066956521739130435 2023-11-18 02:37:00,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=19200.0, ans=0.125 2023-11-18 02:37:12,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=20.40 vs. limit=21.95 2023-11-18 02:37:19,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=19333.333333333332, ans=0.006666666666666667 2023-11-18 02:37:21,201 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2900, loss[loss=0.1905, simple_loss=0.1826, pruned_loss=0.08626, audio_tagging_loss=0.01293, over 14612.00 frames. ], tot_loss[loss=0.1806, simple_loss=0.1687, pruned_loss=0.08245, audio_tagging_loss=0.01379, over 3061049.94 frames. 
], batch size: 54, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:37:24,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=19333.333333333332, ans=0.07 2023-11-18 02:37:25,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.20 vs. limit=14.75 2023-11-18 02:37:25,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.019e+02 1.241e+02 1.587e+02 2.643e+02, threshold=2.482e+02, percent-clipped=0.0 2023-11-18 02:37:45,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=19466.666666666668, ans=0.10533333333333333 2023-11-18 02:37:46,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19466.666666666668, ans=0.10533333333333333 2023-11-18 02:37:50,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=19466.666666666668, ans=0.0 2023-11-18 02:37:55,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.38 vs. limit=14.825 2023-11-18 02:38:07,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=19600.0, ans=9.9 2023-11-18 02:38:17,189 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 2950, loss[loss=0.1527, simple_loss=0.1372, pruned_loss=0.06729, audio_tagging_loss=0.01681, over 15273.00 frames. ], tot_loss[loss=0.1805, simple_loss=0.1691, pruned_loss=0.08211, audio_tagging_loss=0.01389, over 3054956.51 frames. ], batch size: 59, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:38:21,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=19666.666666666668, ans=0.0 2023-11-18 02:38:28,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=19733.333333333332, ans=0.10266666666666668 2023-11-18 02:38:37,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=19733.333333333332, ans=0.125 2023-11-18 02:38:39,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=14.925 2023-11-18 02:39:01,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=19933.333333333332, ans=0.125 2023-11-18 02:39:12,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=20000.0, ans=0.125 2023-11-18 02:39:13,991 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3000, loss[loss=0.1337, simple_loss=0.1229, pruned_loss=0.05807, audio_tagging_loss=0.01422, over 15128.00 frames. ], tot_loss[loss=0.1791, simple_loss=0.1676, pruned_loss=0.08128, audio_tagging_loss=0.01399, over 3046006.97 frames. 
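Each "ScheduledFloat: name=..., batch_count=..., ans=..." line prints the current value of a hyper-parameter that is scheduled against the batch counter: dropout probabilities, skip-rates, balancer probabilities and similar knobs decay as training progresses, and "ans" is the value in effect at that batch_count. A minimal stand-in, assuming piecewise-linear interpolation between (batch, value) breakpoints; the real scaling.py class carries more machinery (defaults, stochastic evaluation, etc.), and the breakpoints below are made up for illustration.

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count (simplified sketch)."""
        def __init__(self, *points):           # e.g. ScheduledFloat((0, 0.3), (20000, 0.1))
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout.value(18133.3))   # ~0.1187, decaying toward 0.1 as batch_count grows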
], batch size: 57, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:39:13,992 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 02:39:44,310 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6109, 4.2659, 4.2427, 4.2578], device='cuda:2') 2023-11-18 02:39:46,059 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.2203, 5.2769, 5.0520, 5.2175], device='cuda:2') 2023-11-18 02:39:47,916 INFO [train_asr.py:1147] (2/4) Epoch 1, validation: loss=0.1123, simple_loss=0.08353, pruned_loss=0.02777, audio_tagging_loss=0.04274, over 4681554.00 frames. 2023-11-18 02:39:47,916 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 02:39:52,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 1.112e+02 1.246e+02 1.564e+02 3.954e+02, threshold=2.493e+02, percent-clipped=6.0 2023-11-18 02:39:58,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=20066.666666666668, ans=0.125 2023-11-18 02:40:33,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=20266.666666666668, ans=0.5 2023-11-18 02:40:40,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=20266.666666666668, ans=0.0 2023-11-18 02:40:41,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=20266.666666666668, ans=0.0 2023-11-18 02:40:43,608 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3050, loss[loss=0.1508, simple_loss=0.1422, pruned_loss=0.06293, audio_tagging_loss=0.01677, over 14067.00 frames. ], tot_loss[loss=0.1764, simple_loss=0.1655, pruned_loss=0.07948, audio_tagging_loss=0.01414, over 3038515.28 frames. ], batch size: 54, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:40:43,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20333.333333333332, ans=0.1 2023-11-18 02:40:44,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.19 vs. limit=22.5 2023-11-18 02:41:07,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:12,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20466.666666666668, ans=0.1 2023-11-18 02:41:13,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:13,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. 
limit=15.0 2023-11-18 02:41:14,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:17,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=20533.333333333332, ans=0.125 2023-11-18 02:41:18,273 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:41:30,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20600.0, ans=0.1 2023-11-18 02:41:35,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=20600.0, ans=0.2 2023-11-18 02:41:40,406 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3100, loss[loss=0.1824, simple_loss=0.1676, pruned_loss=0.08194, audio_tagging_loss=0.01662, over 16479.00 frames. ], tot_loss[loss=0.1785, simple_loss=0.1679, pruned_loss=0.08035, audio_tagging_loss=0.01417, over 3042882.97 frames. ], batch size: 59, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:41:44,733 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 1.051e+02 1.308e+02 1.673e+02 2.696e+02, threshold=2.616e+02, percent-clipped=3.0 2023-11-18 02:42:00,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=20733.333333333332, ans=0.125 2023-11-18 02:42:02,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2023-11-18 02:42:02,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=20800.0, ans=0.05 2023-11-18 02:42:04,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20800.0, ans=0.1 2023-11-18 02:42:10,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-18 02:42:12,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=20800.0, ans=0.125 2023-11-18 02:42:22,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=20866.666666666668, ans=0.006333333333333333 2023-11-18 02:42:36,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=21000.0, ans=0.0 2023-11-18 02:42:37,776 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3150, loss[loss=0.1601, simple_loss=0.1578, pruned_loss=0.06933, audio_tagging_loss=0.01188, over 15072.00 frames. ], tot_loss[loss=0.1749, simple_loss=0.1648, pruned_loss=0.07838, audio_tagging_loss=0.01415, over 3044188.14 frames. 
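The "Whitening: ... metric=X vs. limit=Y" lines fire only when a module's whitening metric exceeds its (itself scheduled) limit; the metric summarizes how far the channel covariance of the activations is from white, i.e. isotropic. A hedged sketch of one plausible metric, the mean squared eigenvalue divided by the squared mean eigenvalue of the covariance (exactly 1.0 for perfectly white features); the precise definition in scaling.py may differ.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns >= 1.0; 1.0 means white."""
        assert x.shape[1] % num_groups == 0
        metrics = []
        for g in x.chunk(num_groups, dim=1):
            g = g - g.mean(dim=0)                  # zero-mean per channel
            cov = (g.T @ g) / g.shape[0]           # channel covariance
            eigs = torch.linalg.eigvalsh(cov)
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return sum(metrics) / len(metrics)

    x = torch.randn(1000, 384) * torch.linspace(0.1, 3.0, 384)  # deliberately non-white
    print(whitening_metric(x))   # well above 1.0, like the "metric=... vs. limit=..." lines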
], batch size: 57, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:42:58,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:06,343 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:43:06,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:07,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21133.333333333332, ans=0.2 2023-11-18 02:43:08,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:34,048 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3200, loss[loss=0.1279, simple_loss=0.1217, pruned_loss=0.05349, audio_tagging_loss=0.01354, over 13755.00 frames. ], tot_loss[loss=0.1756, simple_loss=0.1659, pruned_loss=0.07847, audio_tagging_loss=0.0142, over 3048202.56 frames. ], batch size: 53, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:43:38,318 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 1.064e+02 1.244e+02 1.490e+02 2.410e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 02:43:42,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21333.333333333332, ans=0.125 2023-11-18 02:43:55,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=21466.666666666668, ans=0.0 2023-11-18 02:44:13,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=21533.333333333332, ans=0.2 2023-11-18 02:44:17,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=21533.333333333332, ans=0.125 2023-11-18 02:44:18,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=21600.0, ans=0.0 2023-11-18 02:44:22,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.77 vs. limit=15.0 2023-11-18 02:44:30,114 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3250, loss[loss=0.1514, simple_loss=0.1446, pruned_loss=0.06447, audio_tagging_loss=0.01461, over 16911.00 frames. ], tot_loss[loss=0.1744, simple_loss=0.1648, pruned_loss=0.07772, audio_tagging_loss=0.01435, over 3050990.78 frames. 
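During validation the code also prints per-head entropies of the self-attention weights (the attn_weights_entropy tensors above); low entropy flags heads that collapse onto very few positions, and the related "WithLoss: ... loss-sum=..." lines track an auxiliary penalty attached to the same weights. A small sketch of the diagnostic itself, assuming attention weights of shape (num_heads, query_len, key_len) that already sum to 1 over keys.

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        """attn_weights: (num_heads, tgt_len, src_len), rows summing to 1.
        Returns per-head entropy averaged over query positions (in nats)."""
        ent = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)  # (num_heads, tgt_len)
        return ent.mean(dim=-1)                                         # (num_heads,)

    w = torch.softmax(torch.randn(4, 50, 50), dim=-1)
    print(attn_weights_entropy(w))   # one value per head, like the tensors logged above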
], batch size: 63, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:44:44,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=21733.333333333332, ans=0.125 2023-11-18 02:44:50,833 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:44:55,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=21800.0, ans=0.1 2023-11-18 02:45:05,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=21866.666666666668, ans=0.125 2023-11-18 02:45:14,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=21933.333333333332, ans=0.125 2023-11-18 02:45:21,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=21933.333333333332, ans=0.2 2023-11-18 02:45:26,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=22000.0, ans=0.125 2023-11-18 02:45:27,670 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3300, loss[loss=0.1439, simple_loss=0.1341, pruned_loss=0.06096, audio_tagging_loss=0.01594, over 14894.00 frames. ], tot_loss[loss=0.1738, simple_loss=0.1636, pruned_loss=0.0774, audio_tagging_loss=0.01457, over 3044664.30 frames. ], batch size: 56, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:45:27,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=22000.0, ans=0.2 2023-11-18 02:45:32,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.259e+01 1.069e+02 1.225e+02 1.477e+02 2.736e+02, threshold=2.451e+02, percent-clipped=1.0 2023-11-18 02:45:41,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=22066.666666666668, ans=15.0 2023-11-18 02:45:53,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=22133.333333333332, ans=0.05 2023-11-18 02:45:57,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=22133.333333333332, ans=0.125 2023-11-18 02:46:06,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=22200.0, ans=0.125 2023-11-18 02:46:18,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=22266.666666666668, ans=0.006028985507246377 2023-11-18 02:46:22,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.34 vs. limit=10.0 2023-11-18 02:46:24,722 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3350, loss[loss=0.2012, simple_loss=0.193, pruned_loss=0.09196, audio_tagging_loss=0.01269, over 15533.00 frames. ], tot_loss[loss=0.1733, simple_loss=0.1637, pruned_loss=0.07704, audio_tagging_loss=0.01437, over 3045650.32 frames. ], batch size: 57, lr: 4.30e-02, grad_scale: 32.0 2023-11-18 02:46:40,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.25 vs. 
limit=15.0 2023-11-18 02:47:01,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=22533.333333333332, ans=0.1 2023-11-18 02:47:11,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-11-18 02:47:17,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=22600.0, ans=0.005956521739130435 2023-11-18 02:47:19,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=22666.666666666668, ans=0.005942028985507246 2023-11-18 02:47:21,310 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3400, loss[loss=0.2167, simple_loss=0.2212, pruned_loss=0.09635, audio_tagging_loss=0.009745, over 16062.00 frames. ], tot_loss[loss=0.1714, simple_loss=0.1624, pruned_loss=0.07608, audio_tagging_loss=0.0141, over 3043422.41 frames. ], batch size: 57, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:47:21,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=22666.666666666668, ans=0.0 2023-11-18 02:47:22,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=22666.666666666668, ans=0.1 2023-11-18 02:47:23,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=22666.666666666668, ans=0.005942028985507246 2023-11-18 02:47:25,579 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 1.014e+02 1.234e+02 1.515e+02 3.091e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:47:51,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=22800.0, ans=0.125 2023-11-18 02:48:18,067 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3450, loss[loss=0.1701, simple_loss=0.1639, pruned_loss=0.07281, audio_tagging_loss=0.01533, over 14471.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.1629, pruned_loss=0.07589, audio_tagging_loss=0.01392, over 3049124.26 frames. ], batch size: 57, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:48:18,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=23000.0, ans=0.125 2023-11-18 02:48:21,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2023-11-18 02:48:23,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=23000.0, ans=0.125 2023-11-18 02:48:24,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=23000.0, ans=0.2 2023-11-18 02:48:28,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=23000.0, ans=0.2 2023-11-18 02:48:36,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.17 vs. 
limit=22.5 2023-11-18 02:48:46,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=23133.333333333332, ans=0.125 2023-11-18 02:48:52,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=23200.0, ans=0.1 2023-11-18 02:49:01,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=23200.0, ans=0.00582608695652174 2023-11-18 02:49:02,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=15.0 2023-11-18 02:49:08,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2023-11-18 02:49:15,060 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3500, loss[loss=0.1327, simple_loss=0.1324, pruned_loss=0.05237, audio_tagging_loss=0.01412, over 15599.00 frames. ], tot_loss[loss=0.1696, simple_loss=0.1617, pruned_loss=0.07503, audio_tagging_loss=0.01375, over 3052708.95 frames. ], batch size: 58, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:49:19,473 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.153e+01 1.129e+02 1.309e+02 1.633e+02 2.948e+02, threshold=2.617e+02, percent-clipped=2.0 2023-11-18 02:49:28,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.63 vs. limit=15.0 2023-11-18 02:49:35,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=23466.666666666668, ans=0.125 2023-11-18 02:49:44,228 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:49:52,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=23533.333333333332, ans=0.0 2023-11-18 02:49:55,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=23533.333333333332, ans=0.1 2023-11-18 02:49:58,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0 2023-11-18 02:50:05,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.41 vs. limit=15.0 2023-11-18 02:50:10,785 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3550, loss[loss=0.177, simple_loss=0.1664, pruned_loss=0.07563, audio_tagging_loss=0.01819, over 15838.00 frames. ], tot_loss[loss=0.1688, simple_loss=0.1608, pruned_loss=0.07458, audio_tagging_loss=0.01387, over 3046606.23 frames. 
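The WARNING lines above reject AudioSet cuts whose placeholder transcripts are longer than the acoustic sequence: 100 input frames shrink to 23 after the roughly 4x encoder subsampling, fewer than the 24 BPE tokens, and a transducer cannot emit more symbols than it has encoder frames. A minimal filter in the same spirit; the (T - 7) // 4 length rule is an assumption that reproduces the 100 -> 23 numbers in the log, not necessarily the exact formula used.

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed Conv2d subsampling arithmetic; 100 -> 23 matches the log.
        return (num_frames - 7) // 4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Keep a cut only if the encoder output is long enough for its token sequence."""
        return frames_after_subsampling(num_frames) > num_tokens

    print(keep_cut(100, 24))   # False -> "Exclude cut ..." as in the WARNING above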
], batch size: 57, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:50:14,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23666.666666666668, ans=0.1 2023-11-18 02:50:18,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.98 vs. limit=15.0 2023-11-18 02:50:27,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=23733.333333333332, ans=0.0 2023-11-18 02:50:37,358 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-11-18 02:50:50,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=23866.666666666668, ans=0.125 2023-11-18 02:51:03,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.31 vs. limit=15.0 2023-11-18 02:51:08,224 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3600, loss[loss=0.1635, simple_loss=0.1625, pruned_loss=0.06896, audio_tagging_loss=0.01326, over 15570.00 frames. ], tot_loss[loss=0.1674, simple_loss=0.1594, pruned_loss=0.07379, audio_tagging_loss=0.01394, over 3060885.16 frames. ], batch size: 59, lr: 4.27e-02, grad_scale: 32.0 2023-11-18 02:51:13,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 1.015e+02 1.156e+02 1.393e+02 2.534e+02, threshold=2.312e+02, percent-clipped=0.0 2023-11-18 02:51:14,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=24000.0, ans=0.125 2023-11-18 02:51:32,345 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.68 vs. limit=15.0 2023-11-18 02:51:44,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=24200.0, ans=0.125 2023-11-18 02:51:47,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=24200.0, ans=0.2 2023-11-18 02:52:03,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=24266.666666666668, ans=0.0 2023-11-18 02:52:05,641 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3650, loss[loss=0.179, simple_loss=0.1737, pruned_loss=0.07748, audio_tagging_loss=0.01473, over 15786.00 frames. ], tot_loss[loss=0.1672, simple_loss=0.1591, pruned_loss=0.07378, audio_tagging_loss=0.01382, over 3064057.86 frames. ], batch size: 58, lr: 4.27e-02, grad_scale: 64.0 2023-11-18 02:52:24,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=24400.0, ans=0.125 2023-11-18 02:53:01,438 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3700, loss[loss=0.1965, simple_loss=0.1993, pruned_loss=0.08473, audio_tagging_loss=0.01215, over 15614.00 frames. ], tot_loss[loss=0.1686, simple_loss=0.1611, pruned_loss=0.07437, audio_tagging_loss=0.01367, over 3061695.75 frames. 
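Note also the grad_scale field doubling from 32.0 to 64.0 at batch 3650 above. That pattern is characteristic of a dynamic mixed-precision loss scaler, which doubles the scale after a long run of overflow-free fp16 steps and halves it on overflow. A sketch with PyTorch's stock GradScaler (requires a CUDA device), assuming this training loop does something equivalent; the growth_interval below is illustrative.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_factor=2.0,
                                       backoff_factor=0.5, growth_interval=2000)
    model = torch.nn.Linear(10, 10).cuda()
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    for step in range(3):
        opt.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(torch.randn(4, 10, device='cuda')).pow(2).mean()
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(opt)                # unscales grads, skips the step on inf/nan
        scaler.update()                 # grows or backs off the scale
        print(step, scaler.get_scale()) # the "grad_scale" printed in the log lines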
], batch size: 56, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:53:05,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 1.068e+02 1.322e+02 1.624e+02 2.925e+02, threshold=2.645e+02, percent-clipped=5.0 2023-11-18 02:53:07,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=24666.666666666668, ans=0.125 2023-11-18 02:53:10,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=24666.666666666668, ans=0.2 2023-11-18 02:53:10,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=24666.666666666668, ans=0.09899494936611666 2023-11-18 02:53:10,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=24666.666666666668, ans=0.2 2023-11-18 02:53:24,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:28,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:29,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=24800.0, ans=0.0 2023-11-18 02:53:29,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24800.0, ans=0.1 2023-11-18 02:53:36,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=24866.666666666668, ans=0.0054637681159420285 2023-11-18 02:53:51,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=24933.333333333332, ans=0.0 2023-11-18 02:53:58,213 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3750, loss[loss=0.188, simple_loss=0.1803, pruned_loss=0.08539, audio_tagging_loss=0.01248, over 16283.00 frames. ], tot_loss[loss=0.1694, simple_loss=0.1622, pruned_loss=0.07466, audio_tagging_loss=0.01366, over 3061157.60 frames. ], batch size: 58, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:53:59,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=25000.0, ans=0.0 2023-11-18 02:54:00,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=25000.0, ans=0.025 2023-11-18 02:54:11,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=25066.666666666668, ans=0.125 2023-11-18 02:54:22,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=25133.333333333332, ans=0.125 2023-11-18 02:54:37,594 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 02:54:56,676 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3800, loss[loss=0.1612, simple_loss=0.1553, pruned_loss=0.0671, audio_tagging_loss=0.01643, over 15357.00 frames. ], tot_loss[loss=0.1682, simple_loss=0.161, pruned_loss=0.07379, audio_tagging_loss=0.0139, over 3055409.18 frames. ], batch size: 57, lr: 4.25e-02, grad_scale: 64.0 2023-11-18 02:54:59,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.94 vs. limit=15.0 2023-11-18 02:55:01,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.087e+01 1.058e+02 1.234e+02 1.426e+02 2.558e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:55:05,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.26 vs. limit=15.0 2023-11-18 02:55:06,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:12,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=25400.0, ans=0.125 2023-11-18 02:55:20,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.92 vs. limit=15.0 2023-11-18 02:55:25,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.58 vs. limit=15.0 2023-11-18 02:55:25,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=25466.666666666668, ans=0.0 2023-11-18 02:55:29,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=25533.333333333332, ans=0.0 2023-11-18 02:55:38,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=25533.333333333332, ans=0.025 2023-11-18 02:55:48,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=25600.0, ans=15.0 2023-11-18 02:55:53,511 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3850, loss[loss=0.1785, simple_loss=0.165, pruned_loss=0.08361, audio_tagging_loss=0.01233, over 14888.00 frames. ], tot_loss[loss=0.1688, simple_loss=0.1615, pruned_loss=0.07407, audio_tagging_loss=0.01399, over 3062750.44 frames. ], batch size: 56, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:09,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=25733.333333333332, ans=0.125 2023-11-18 02:56:13,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=25733.333333333332, ans=0.025 2023-11-18 02:56:13,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.05 vs. limit=12.0 2023-11-18 02:56:14,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.82 vs. 
limit=22.5 2023-11-18 02:56:27,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=25866.666666666668, ans=0.2 2023-11-18 02:56:32,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=25866.666666666668, ans=0.125 2023-11-18 02:56:42,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.60 vs. limit=22.5 2023-11-18 02:56:49,652 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3900, loss[loss=0.1132, simple_loss=0.1066, pruned_loss=0.04624, audio_tagging_loss=0.01364, over 15155.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.1608, pruned_loss=0.07336, audio_tagging_loss=0.01401, over 3052080.18 frames. ], batch size: 56, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:49,915 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.544e+00 2023-11-18 02:56:54,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 1.063e+02 1.269e+02 1.447e+02 2.279e+02, threshold=2.539e+02, percent-clipped=0.0 2023-11-18 02:57:00,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26000.0, ans=0.1 2023-11-18 02:57:06,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-11-18 02:57:25,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=26200.0, ans=0.0 2023-11-18 02:57:36,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=26266.666666666668, ans=22.5 2023-11-18 02:57:43,362 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:57:47,471 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 3950, loss[loss=0.1818, simple_loss=0.1849, pruned_loss=0.07672, audio_tagging_loss=0.01265, over 15576.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.161, pruned_loss=0.07333, audio_tagging_loss=0.01398, over 3043618.51 frames. ], batch size: 55, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:57:49,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=26333.333333333332, ans=0.125 2023-11-18 02:57:54,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26333.333333333332, ans=0.1 2023-11-18 02:58:06,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=26400.0, ans=0.125 2023-11-18 02:58:08,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=26466.666666666668, ans=0.125 2023-11-18 02:58:11,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.98 vs. 
limit=15.0 2023-11-18 02:58:18,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=26466.666666666668, ans=0.2 2023-11-18 02:58:25,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=26533.333333333332, ans=0.125 2023-11-18 02:58:30,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=26533.333333333332, ans=0.05 2023-11-18 02:58:31,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=26600.0, ans=0.125 2023-11-18 02:58:40,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.51 vs. limit=22.5 2023-11-18 02:58:46,897 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4000, loss[loss=0.1823, simple_loss=0.1817, pruned_loss=0.07833, audio_tagging_loss=0.01312, over 16757.00 frames. ], tot_loss[loss=0.1678, simple_loss=0.161, pruned_loss=0.07325, audio_tagging_loss=0.01409, over 3045812.27 frames. ], batch size: 62, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:58:51,138 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.575e+01 1.092e+02 1.270e+02 1.504e+02 2.237e+02, threshold=2.540e+02, percent-clipped=0.0 2023-11-18 02:58:54,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=26666.666666666668, ans=0.125 2023-11-18 02:58:56,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=26733.333333333332, ans=0.0 2023-11-18 02:59:04,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=26733.333333333332, ans=0.09899494936611666 2023-11-18 02:59:07,619 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:59:20,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=26866.666666666668, ans=0.025 2023-11-18 02:59:22,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=26866.666666666668, ans=0.125 2023-11-18 02:59:36,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=26933.333333333332, ans=0.09899494936611666 2023-11-18 02:59:41,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=26933.333333333332, ans=0.95 2023-11-18 02:59:42,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=27000.0, ans=0.125 2023-11-18 02:59:43,003 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4050, loss[loss=0.1565, simple_loss=0.1505, pruned_loss=0.06882, audio_tagging_loss=0.01248, over 16788.00 frames. ], tot_loss[loss=0.167, simple_loss=0.16, pruned_loss=0.07288, audio_tagging_loss=0.01412, over 3043302.65 frames. 
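The lr column decays smoothly across these batches (4.36e-02 around batch 2750 down to 4.13e-02 by batch 4800). This is consistent with an Eden-style schedule, where the learning rate is a base rate damped by both a batch-count and an epoch-count factor. The sketch below uses the published Eden formula with an assumed base rate of 0.045 and lr_batches=7500, values that reproduce the logged numbers while the epoch factor is still close to 1.

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden schedule (as described for Zipformer recipes); constants assumed."""
        batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
        epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Early in epoch 1 the epoch factor is ~1, and the batch factor alone
    # reproduces the log: ~4.36e-02 at batch 2750, ~4.13e-02 at batch 4800.
    print(eden_lr(0.045, 2750, 0.0), eden_lr(0.045, 4800, 0.0))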
], batch size: 62, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 02:59:43,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=27000.0, ans=0.125 2023-11-18 02:59:46,360 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:59:49,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=27000.0, ans=0.125 2023-11-18 02:59:54,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=27066.666666666668, ans=0.125 2023-11-18 03:00:35,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-18 03:00:40,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=27333.333333333332, ans=0.0049275362318840586 2023-11-18 03:00:41,271 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4100, loss[loss=0.1622, simple_loss=0.167, pruned_loss=0.06581, audio_tagging_loss=0.01287, over 15070.00 frames. ], tot_loss[loss=0.1681, simple_loss=0.1614, pruned_loss=0.07329, audio_tagging_loss=0.01407, over 3042913.37 frames. ], batch size: 57, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 03:00:45,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 1.139e+02 1.299e+02 1.567e+02 2.247e+02, threshold=2.597e+02, percent-clipped=0.0 2023-11-18 03:00:47,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=27333.333333333332, ans=0.125 2023-11-18 03:00:57,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.00 vs. limit=15.0 2023-11-18 03:01:01,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=27400.0, ans=0.125 2023-11-18 03:01:05,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=27466.666666666668, ans=0.004898550724637681 2023-11-18 03:01:06,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=27466.666666666668, ans=0.125 2023-11-18 03:01:08,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=27466.666666666668, ans=0.125 2023-11-18 03:01:20,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.14 vs. 
limit=15.0 2023-11-18 03:01:24,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=27533.333333333332, ans=0.0 2023-11-18 03:01:35,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=27600.0, ans=0.1 2023-11-18 03:01:37,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=27666.666666666668, ans=0.125 2023-11-18 03:01:38,125 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4150, loss[loss=0.1716, simple_loss=0.1653, pruned_loss=0.07761, audio_tagging_loss=0.01134, over 14771.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.1603, pruned_loss=0.07226, audio_tagging_loss=0.01387, over 3037345.22 frames. ], batch size: 56, lr: 4.21e-02, grad_scale: 64.0 2023-11-18 03:01:46,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-11-18 03:02:00,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=27800.0, ans=0.125 2023-11-18 03:02:09,437 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0 2023-11-18 03:02:19,699 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:02:26,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=27933.333333333332, ans=0.125 2023-11-18 03:02:34,608 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4200, loss[loss=0.1201, simple_loss=0.1149, pruned_loss=0.04602, audio_tagging_loss=0.01658, over 14380.00 frames. ], tot_loss[loss=0.1659, simple_loss=0.1604, pruned_loss=0.07202, audio_tagging_loss=0.01369, over 3043627.91 frames. 
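Many of the ScheduledFloat names above belong to "balancer" modules (balancer1.prob, balancer2.min_positive, max_positive and so on), which nudge activations so that the fraction of positive values in each channel stays inside a target range. A rough sketch of the measurement half of such a module; the real one applies the correction as a gradient modification with probability prob rather than as a hard constraint.

    import torch

    def positive_fraction_violations(x: torch.Tensor,
                                     min_positive: float = 0.05,
                                     max_positive: float = 0.95):
        """x: (num_frames, num_channels). Returns masks of channels whose
        fraction of positive activations falls outside [min_positive, max_positive]."""
        frac_pos = (x > 0).float().mean(dim=0)      # per-channel positive fraction
        return frac_pos < min_positive, frac_pos > max_positive

    x = torch.randn(1000, 256)
    too_neg, too_pos = positive_fraction_violations(x)
    print(int(too_neg.sum()), int(too_pos.sum()))   # channels a Balancer would push on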
], batch size: 53, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:02:38,899 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.536e+01 1.064e+02 1.276e+02 1.442e+02 2.964e+02, threshold=2.551e+02, percent-clipped=1.0 2023-11-18 03:02:40,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=28000.0, ans=0.125 2023-11-18 03:02:59,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28133.333333333332, ans=0.0 2023-11-18 03:03:02,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=28133.333333333332, ans=0.125 2023-11-18 03:03:10,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=28200.0, ans=0.025 2023-11-18 03:03:11,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=28200.0, ans=0.125 2023-11-18 03:03:14,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=28200.0, ans=0.125 2023-11-18 03:03:15,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.08 vs. limit=22.5 2023-11-18 03:03:32,448 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4250, loss[loss=0.1887, simple_loss=0.1988, pruned_loss=0.07822, audio_tagging_loss=0.01111, over 15169.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.1611, pruned_loss=0.07216, audio_tagging_loss=0.01356, over 3043468.05 frames. ], batch size: 56, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:03:33,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=28333.333333333332, ans=0.05 2023-11-18 03:03:55,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.77 vs. limit=10.0 2023-11-18 03:04:06,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28533.333333333332, ans=0.0 2023-11-18 03:04:07,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=28533.333333333332, ans=0.125 2023-11-18 03:04:25,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=28600.0, ans=0.1 2023-11-18 03:04:27,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=28666.666666666668, ans=0.1 2023-11-18 03:04:28,446 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4300, loss[loss=0.1443, simple_loss=0.1381, pruned_loss=0.06166, audio_tagging_loss=0.01361, over 14394.00 frames. ], tot_loss[loss=0.1666, simple_loss=0.1616, pruned_loss=0.07224, audio_tagging_loss=0.01352, over 3050983.11 frames. 
], batch size: 55, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:04:32,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 1.089e+02 1.255e+02 1.443e+02 2.387e+02, threshold=2.510e+02, percent-clipped=0.0 2023-11-18 03:04:40,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=28733.333333333332, ans=0.0 2023-11-18 03:04:45,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=28733.333333333332, ans=15.0 2023-11-18 03:04:47,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=28733.333333333332, ans=0.1 2023-11-18 03:04:48,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=28733.333333333332, ans=0.0 2023-11-18 03:04:49,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=28800.0, ans=0.0 2023-11-18 03:04:55,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=28800.0, ans=0.125 2023-11-18 03:04:55,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.03 vs. limit=10.0 2023-11-18 03:05:20,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28933.333333333332, ans=0.125 2023-11-18 03:05:25,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.42 vs. limit=6.0 2023-11-18 03:05:25,325 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4350, loss[loss=0.1843, simple_loss=0.1816, pruned_loss=0.08171, audio_tagging_loss=0.01178, over 16286.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.1614, pruned_loss=0.07224, audio_tagging_loss=0.01352, over 3056702.45 frames. ], batch size: 60, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:05:44,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=29066.666666666668, ans=0.125 2023-11-18 03:05:47,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=29133.333333333332, ans=0.0 2023-11-18 03:05:49,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=15.0 2023-11-18 03:05:57,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=29133.333333333332, ans=0.125 2023-11-18 03:06:03,123 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.57 vs. 
limit=15.0 2023-11-18 03:06:07,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=29200.0, ans=0.0 2023-11-18 03:06:11,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=29266.666666666668, ans=0.125 2023-11-18 03:06:19,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=29266.666666666668, ans=0.004507246376811594 2023-11-18 03:06:22,941 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4400, loss[loss=0.1204, simple_loss=0.1091, pruned_loss=0.04854, audio_tagging_loss=0.01726, over 15234.00 frames. ], tot_loss[loss=0.1646, simple_loss=0.1594, pruned_loss=0.07128, audio_tagging_loss=0.01365, over 3045322.91 frames. ], batch size: 58, lr: 4.18e-02, grad_scale: 64.0 2023-11-18 03:06:27,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.425e+01 1.162e+02 1.302e+02 1.640e+02 3.175e+02, threshold=2.603e+02, percent-clipped=6.0 2023-11-18 03:07:03,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=29533.333333333332, ans=0.125 2023-11-18 03:07:09,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=29600.0, ans=0.1 2023-11-18 03:07:14,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29600.0, ans=0.125 2023-11-18 03:07:19,270 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4450, loss[loss=0.1624, simple_loss=0.1545, pruned_loss=0.06656, audio_tagging_loss=0.01857, over 15226.00 frames. ], tot_loss[loss=0.164, simple_loss=0.1589, pruned_loss=0.07084, audio_tagging_loss=0.01376, over 3049496.25 frames. ], batch size: 58, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:07:19,948 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.63 vs. limit=15.0 2023-11-18 03:07:35,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=12.0 2023-11-18 03:07:38,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=29733.333333333332, ans=0.1 2023-11-18 03:07:48,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=29800.0, ans=0.125 2023-11-18 03:07:59,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=29866.666666666668, ans=0.004376811594202898 2023-11-18 03:08:15,493 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4500, loss[loss=0.1613, simple_loss=0.1616, pruned_loss=0.06738, audio_tagging_loss=0.0131, over 15299.00 frames. ], tot_loss[loss=0.1655, simple_loss=0.1607, pruned_loss=0.07163, audio_tagging_loss=0.01352, over 3050107.13 frames. 
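Each tot_loss[...] is not a single-batch loss but a running aggregate, as the hovering "over ~3.05 million frames" counts show. A small sketch of a frame-weighted, exponentially decayed running average consistent with that readout; the decay constant is a guess chosen so that ~15k frames per batch settles near 3.0e6 accumulated frames, and the actual tracker may aggregate differently.

    class RunningLoss:
        """Frame-weighted running average, like the tot_loss[...] readout (sketch)."""
        def __init__(self, decay: float = 0.995):
            self.decay = decay
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss: float, batch_frames: float):
            self.loss_sum = self.decay * self.loss_sum + batch_loss * batch_frames
            self.frames = self.decay * self.frames + batch_frames

        @property
        def value(self) -> float:
            return self.loss_sum / max(self.frames, 1.0)

    tracker = RunningLoss()   # ~15000 frames/batch -> steady state near 3.0e6 frames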
], batch size: 57, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:08:20,342 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 1.091e+02 1.301e+02 1.544e+02 2.749e+02, threshold=2.602e+02, percent-clipped=1.0 2023-11-18 03:08:27,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=30066.666666666668, ans=0.1 2023-11-18 03:08:27,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=30066.666666666668, ans=0.2 2023-11-18 03:08:33,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=30066.666666666668, ans=0.0 2023-11-18 03:08:41,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=30133.333333333332, ans=0.004318840579710145 2023-11-18 03:08:54,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=30200.0, ans=0.015 2023-11-18 03:09:02,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=30266.666666666668, ans=0.0 2023-11-18 03:09:04,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2023-11-18 03:09:05,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30266.666666666668, ans=0.1 2023-11-18 03:09:07,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=30266.666666666668, ans=0.1 2023-11-18 03:09:13,087 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4550, loss[loss=0.1204, simple_loss=0.1195, pruned_loss=0.049, audio_tagging_loss=0.01165, over 15964.00 frames. ], tot_loss[loss=0.1642, simple_loss=0.1596, pruned_loss=0.07091, audio_tagging_loss=0.01355, over 3045360.27 frames. ], batch size: 59, lr: 4.16e-02, grad_scale: 64.0 2023-11-18 03:09:27,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=30400.0, ans=0.125 2023-11-18 03:09:34,246 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:09:36,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=30466.666666666668, ans=0.0 2023-11-18 03:09:40,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=30466.666666666668, ans=0.0 2023-11-18 03:09:50,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.88 vs. limit=15.0 2023-11-18 03:09:57,979 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 03:10:02,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=30600.0, ans=0.05 2023-11-18 03:10:10,257 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4600, loss[loss=0.1458, simple_loss=0.137, pruned_loss=0.064, audio_tagging_loss=0.01327, over 14300.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.1582, pruned_loss=0.06999, audio_tagging_loss=0.01359, over 3045883.51 frames. ], batch size: 56, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:10:11,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=30666.666666666668, ans=0.004202898550724638 2023-11-18 03:10:14,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 1.069e+02 1.267e+02 1.546e+02 2.795e+02, threshold=2.534e+02, percent-clipped=1.0 2023-11-18 03:10:15,759 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:10:16,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=30666.666666666668, ans=0.2 2023-11-18 03:10:48,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2023-11-18 03:10:52,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=30866.666666666668, ans=0.004159420289855072 2023-11-18 03:11:02,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.27 vs. limit=15.0 2023-11-18 03:11:04,591 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.48 vs. limit=15.0 2023-11-18 03:11:06,016 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4650, loss[loss=0.1295, simple_loss=0.1239, pruned_loss=0.05042, audio_tagging_loss=0.0171, over 14485.00 frames. ], tot_loss[loss=0.1627, simple_loss=0.1579, pruned_loss=0.07003, audio_tagging_loss=0.01375, over 3050877.82 frames. ], batch size: 55, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:11:11,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=31000.0, ans=0.125 2023-11-18 03:11:24,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=31066.666666666668, ans=0.125 2023-11-18 03:11:42,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=31200.0, ans=0.00408695652173913 2023-11-18 03:11:49,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=31266.666666666668, ans=0.004072463768115942 2023-11-18 03:12:02,402 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4700, loss[loss=0.1539, simple_loss=0.1395, pruned_loss=0.06346, audio_tagging_loss=0.02066, over 15443.00 frames. ], tot_loss[loss=0.1629, simple_loss=0.158, pruned_loss=0.06995, audio_tagging_loss=0.01393, over 3054375.95 frames. 
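The WARNING above drops an AudioSet placeholder cut because its 100 input frames shrink to 23 after the convolutional frontend, fewer than its 24 BPE tokens, and a transducer alignment needs at least one encoder frame per output token. A sketch of that check; the ((T - 7) // 2 + 1) // 2 frontend formula is an assumption that happens to match the logged 100 -> 23 reduction, and the real check lives in train_asr.py:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames remaining after a Conv2dSubsampling-style frontend
    # (assumed formula; it reproduces the logged 100 -> 23 mapping).
    t_sub = ((num_frames - 7) // 2 + 1) // 2
    # Need at least one encoder frame per output token.
    return t_sub >= num_tokens

assert keep_cut(100, 24) is False  # 23 frames < 24 tokens -> excluded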
], batch size: 60, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:12:07,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.482e+01 1.109e+02 1.199e+02 1.387e+02 2.796e+02, threshold=2.398e+02, percent-clipped=1.0 2023-11-18 03:12:12,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=31333.333333333332, ans=0.125 2023-11-18 03:12:44,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31533.333333333332, ans=0.1 2023-11-18 03:12:59,574 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4750, loss[loss=0.1284, simple_loss=0.1131, pruned_loss=0.0557, audio_tagging_loss=0.01617, over 13895.00 frames. ], tot_loss[loss=0.1605, simple_loss=0.1553, pruned_loss=0.06875, audio_tagging_loss=0.01409, over 3058560.85 frames. ], batch size: 54, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:13:13,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=31733.333333333332, ans=0.0 2023-11-18 03:13:19,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31733.333333333332, ans=0.1 2023-11-18 03:13:48,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=31933.333333333332, ans=0.125 2023-11-18 03:13:55,722 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4800, loss[loss=0.185, simple_loss=0.1873, pruned_loss=0.07834, audio_tagging_loss=0.01305, over 15625.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.1543, pruned_loss=0.06836, audio_tagging_loss=0.01427, over 3049136.13 frames. ], batch size: 55, lr: 4.13e-02, grad_scale: 64.0 2023-11-18 03:13:59,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 1.059e+02 1.265e+02 1.558e+02 2.176e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:14:28,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32133.333333333332, ans=0.1 2023-11-18 03:14:38,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=32200.0, ans=0.125 2023-11-18 03:14:44,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=32266.666666666668, ans=0.2 2023-11-18 03:14:51,909 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4850, loss[loss=0.1363, simple_loss=0.1348, pruned_loss=0.05361, audio_tagging_loss=0.01524, over 14692.00 frames. ], tot_loss[loss=0.1595, simple_loss=0.1544, pruned_loss=0.06792, audio_tagging_loss=0.01444, over 3042633.26 frames. 
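Each optim.py:476 line reports Clipping_scale plus five grad-norm order statistics (min, 25%, median, 75%, max) over a recent window, and the logged threshold is exactly 2.0 times the median, e.g. 2.0 * 1.197e+02 = 2.394e+02 in the batch-4900 record above; percent-clipped is then the fraction of recent steps whose norm exceeded that threshold. A hedged sketch of the rule these numbers imply (the actual icefall optimizer is more involved than this):

import torch

def clip_threshold(recent_norms: torch.Tensor, clipping_scale: float = 2.0) -> float:
    # Order statistics as printed in the log: min, 25%, median, 75%, max.
    q = torch.quantile(recent_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return clipping_scale * q[2].item()  # threshold = scale * median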
], batch size: 55, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:02,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=32333.333333333332, ans=0.125 2023-11-18 03:15:24,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=32533.333333333332, ans=0.125 2023-11-18 03:15:37,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=32600.0, ans=0.0037826086956521737 2023-11-18 03:15:48,646 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4900, loss[loss=0.1657, simple_loss=0.1689, pruned_loss=0.0703, audio_tagging_loss=0.01099, over 14792.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.155, pruned_loss=0.06821, audio_tagging_loss=0.01418, over 3041404.88 frames. ], batch size: 55, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:52,880 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 1.045e+02 1.197e+02 1.386e+02 2.012e+02, threshold=2.394e+02, percent-clipped=0.0 2023-11-18 03:15:55,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=32666.666666666668, ans=0.2 2023-11-18 03:16:01,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=32733.333333333332, ans=0.04949747468305833 2023-11-18 03:16:04,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=32733.333333333332, ans=0.125 2023-11-18 03:16:12,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=32800.0, ans=0.2 2023-11-18 03:16:20,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=32866.666666666664, ans=0.0 2023-11-18 03:16:27,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.83 vs. limit=6.0 2023-11-18 03:16:35,529 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:16:43,806 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 4950, loss[loss=0.1604, simple_loss=0.1517, pruned_loss=0.07149, audio_tagging_loss=0.01308, over 15089.00 frames. ], tot_loss[loss=0.1608, simple_loss=0.156, pruned_loss=0.06898, audio_tagging_loss=0.01386, over 3047178.37 frames. ], batch size: 58, lr: 4.11e-02, grad_scale: 64.0 2023-11-18 03:16:47,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=33000.0, ans=0.0036956521739130435 2023-11-18 03:16:50,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=33000.0, ans=0.125 2023-11-18 03:16:58,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.58 vs. limit=22.5 2023-11-18 03:17:01,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.39 vs. 
limit=15.0 2023-11-18 03:17:06,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=33133.333333333336, ans=0.125 2023-11-18 03:17:12,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.57 vs. limit=15.0 2023-11-18 03:17:13,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33133.333333333336, ans=0.1 2023-11-18 03:17:36,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-11-18 03:17:39,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.03 vs. limit=10.0 2023-11-18 03:17:40,580 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5000, loss[loss=0.184, simple_loss=0.1954, pruned_loss=0.07688, audio_tagging_loss=0.009376, over 15439.00 frames. ], tot_loss[loss=0.1603, simple_loss=0.1562, pruned_loss=0.06858, audio_tagging_loss=0.01363, over 3052376.54 frames. ], batch size: 53, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:17:45,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 1.071e+02 1.252e+02 1.412e+02 1.907e+02, threshold=2.505e+02, percent-clipped=0.0 2023-11-18 03:17:47,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=33333.333333333336, ans=0.125 2023-11-18 03:17:53,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.38 vs. limit=15.0 2023-11-18 03:18:08,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=33466.666666666664, ans=0.1 2023-11-18 03:18:09,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=6.14 vs. limit=12.0 2023-11-18 03:18:12,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=33466.666666666664, ans=0.125 2023-11-18 03:18:18,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.93 vs. limit=10.0 2023-11-18 03:18:24,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=33600.0, ans=0.125 2023-11-18 03:18:28,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=33600.0, ans=0.0 2023-11-18 03:18:38,043 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5050, loss[loss=0.1836, simple_loss=0.193, pruned_loss=0.0742, audio_tagging_loss=0.01291, over 14285.00 frames. ], tot_loss[loss=0.1593, simple_loss=0.1553, pruned_loss=0.06807, audio_tagging_loss=0.01355, over 3043666.87 frames. 
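The scaling.py:213 lines track ScheduledFloat values: regularization knobs such as dropout_p, skip rates, and balancer probs that are annealed as a function of batch_count instead of being fixed. The logged (batch_count, ans) pairs behave like piecewise-linear interpolation between schedule points; a small illustrative version follows (not icefall's actual class):

import bisect

class PiecewiseLinearSchedule:
    """Value that interpolates linearly between (batch_count, value) points."""
    def __init__(self, *points):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]
        if i == len(self.xs):
            return self.ys[-1]
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate that decays from 0.5 to 0.0 over the first 50k batches:
conv_skip_rate = PiecewiseLinearSchedule((0, 0.5), (50000, 0.0))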
], batch size: 52, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:18:44,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=33666.666666666664, ans=0.0 2023-11-18 03:18:46,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.58 vs. limit=15.0 2023-11-18 03:18:50,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=33733.333333333336, ans=0.125 2023-11-18 03:18:59,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=33800.0, ans=0.0 2023-11-18 03:19:08,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.81 vs. limit=10.0 2023-11-18 03:19:09,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=33800.0, ans=0.125 2023-11-18 03:19:11,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=33866.666666666664, ans=0.125 2023-11-18 03:19:21,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=33866.666666666664, ans=0.125 2023-11-18 03:19:21,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=33933.333333333336, ans=0.125 2023-11-18 03:19:32,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=34000.0, ans=0.1 2023-11-18 03:19:33,397 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5100, loss[loss=0.1415, simple_loss=0.1289, pruned_loss=0.05857, audio_tagging_loss=0.01844, over 15604.00 frames. ], tot_loss[loss=0.1607, simple_loss=0.1567, pruned_loss=0.06885, audio_tagging_loss=0.01347, over 3048762.99 frames. ], batch size: 61, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:19:37,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 1.065e+02 1.271e+02 1.460e+02 2.434e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 03:19:42,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=34000.0, ans=0.0 2023-11-18 03:19:56,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-11-18 03:20:06,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=34200.0, ans=0.0 2023-11-18 03:20:07,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=34200.0, ans=10.0 2023-11-18 03:20:19,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=34266.666666666664, ans=0.003420289855072465 2023-11-18 03:20:29,377 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5150, loss[loss=0.1498, simple_loss=0.1456, pruned_loss=0.06439, audio_tagging_loss=0.01259, over 14289.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.1558, pruned_loss=0.06844, audio_tagging_loss=0.01357, over 3048970.69 frames. 
], batch size: 54, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:20:33,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.84 vs. limit=15.0 2023-11-18 03:20:43,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=34400.0, ans=0.125 2023-11-18 03:20:45,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-11-18 03:20:49,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=34400.0, ans=0.125 2023-11-18 03:21:11,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=34533.333333333336, ans=0.0 2023-11-18 03:21:23,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.15 vs. limit=6.0 2023-11-18 03:21:26,263 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5200, loss[loss=0.1917, simple_loss=0.1924, pruned_loss=0.08404, audio_tagging_loss=0.01141, over 15275.00 frames. ], tot_loss[loss=0.1596, simple_loss=0.1554, pruned_loss=0.06829, audio_tagging_loss=0.01364, over 3043311.39 frames. ], batch size: 57, lr: 4.08e-02, grad_scale: 64.0 2023-11-18 03:21:29,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.27 vs. limit=22.5 2023-11-18 03:21:30,528 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.892e+01 1.044e+02 1.171e+02 1.375e+02 2.529e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 03:21:48,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=34800.0, ans=0.1 2023-11-18 03:21:57,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-18 03:22:10,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.56 vs. limit=15.0 2023-11-18 03:22:13,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-18 03:22:21,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=35000.0, ans=0.125 2023-11-18 03:22:22,079 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5250, loss[loss=0.1157, simple_loss=0.1036, pruned_loss=0.04689, audio_tagging_loss=0.01703, over 13863.00 frames. ], tot_loss[loss=0.1599, simple_loss=0.1562, pruned_loss=0.06826, audio_tagging_loss=0.01354, over 3038740.34 frames. ], batch size: 53, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:22:24,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.43 vs. limit=22.5 2023-11-18 03:22:52,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.45 vs. 
limit=22.5 2023-11-18 03:22:54,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=35200.0, ans=0.0 2023-11-18 03:23:04,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=35200.0, ans=0.125 2023-11-18 03:23:18,038 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5300, loss[loss=0.1891, simple_loss=0.1914, pruned_loss=0.08298, audio_tagging_loss=0.01044, over 16092.00 frames. ], tot_loss[loss=0.1592, simple_loss=0.1557, pruned_loss=0.06784, audio_tagging_loss=0.01351, over 3046103.76 frames. ], batch size: 59, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:23:22,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.054e+02 1.180e+02 1.432e+02 2.621e+02, threshold=2.360e+02, percent-clipped=2.0 2023-11-18 03:23:28,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=35400.0, ans=0.125 2023-11-18 03:23:46,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-18 03:24:10,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35600.0, ans=0.1 2023-11-18 03:24:11,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.18 vs. limit=15.0 2023-11-18 03:24:14,622 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5350, loss[loss=0.09835, simple_loss=0.0879, pruned_loss=0.03653, audio_tagging_loss=0.01787, over 15191.00 frames. ], tot_loss[loss=0.1585, simple_loss=0.1552, pruned_loss=0.06736, audio_tagging_loss=0.01357, over 3048287.17 frames. ], batch size: 59, lr: 4.06e-02, grad_scale: 64.0 2023-11-18 03:24:38,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=35800.0, ans=0.025 2023-11-18 03:24:51,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=35866.666666666664, ans=0.125 2023-11-18 03:24:52,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=26.26 vs. limit=22.5 2023-11-18 03:25:05,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=35933.333333333336, ans=0.0 2023-11-18 03:25:10,836 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5400, loss[loss=0.1333, simple_loss=0.1308, pruned_loss=0.05239, audio_tagging_loss=0.01549, over 15358.00 frames. ], tot_loss[loss=0.158, simple_loss=0.1545, pruned_loss=0.06702, audio_tagging_loss=0.01366, over 3046609.80 frames. 
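The learning rate in these records decays smoothly (4.18e-02 at batch 4400 down to 4.06e-02 by batch 5350) even though no epoch boundary has passed. That matches an Eden-style schedule, which multiplies the base LR by inverse-quarter-power factors in both the batch and the epoch count; a sketch, with hyperparameter values that are illustrative assumptions:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Eden-style decay (hyperparameter values here are illustrative only).
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor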
], batch size: 57, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:25:14,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36000.0, ans=0.125 2023-11-18 03:25:15,058 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.377e+01 1.086e+02 1.314e+02 1.571e+02 2.162e+02, threshold=2.627e+02, percent-clipped=0.0 2023-11-18 03:25:22,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=36066.666666666664, ans=0.04949747468305833 2023-11-18 03:25:25,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.69 vs. limit=15.0 2023-11-18 03:25:26,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=36066.666666666664, ans=0.0 2023-11-18 03:25:27,589 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:25:41,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=15.0 2023-11-18 03:25:51,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=36200.0, ans=0.1 2023-11-18 03:26:02,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=36266.666666666664, ans=0.125 2023-11-18 03:26:06,809 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5450, loss[loss=0.1674, simple_loss=0.1605, pruned_loss=0.07286, audio_tagging_loss=0.01431, over 14678.00 frames. ], tot_loss[loss=0.1568, simple_loss=0.1531, pruned_loss=0.06646, audio_tagging_loss=0.01376, over 3042755.85 frames. ], batch size: 53, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:26:11,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-11-18 03:26:32,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=36466.666666666664, ans=0.0 2023-11-18 03:26:34,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=36466.666666666664, ans=0.125 2023-11-18 03:26:39,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=36533.333333333336, ans=0.0 2023-11-18 03:26:45,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=36533.333333333336, ans=0.0 2023-11-18 03:26:45,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=36533.333333333336, ans=0.0 2023-11-18 03:26:48,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=36533.333333333336, ans=0.2 2023-11-18 03:27:00,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. limit=15.0 2023-11-18 03:27:03,292 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5500, loss[loss=0.126, simple_loss=0.1244, pruned_loss=0.04926, audio_tagging_loss=0.01453, over 15857.00 frames. 
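The scaling.py:1022 lines are whitening diagnostics: for each named module, the covariance of its activations is summarized by a scalar metric that is 1.0 for perfectly white (isotropic) features and grows as the eigenvalue spectrum becomes uneven; whenever the metric exceeds the module's limit (e.g. metric=16.16 vs. limit=15.0 above), a corrective penalty engages. One standard whiteness proxy with exactly those properties, offered as an assumption about the spirit rather than the exact formula in scaling.py:

import torch

def whiteness_metric(x: torch.Tensor) -> float:
    """x: (N, C) activations; returns 1.0 iff the covariance is isotropic."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]           # (C, C) sample covariance
    eigs = torch.linalg.eigvalsh(cov)      # real eigenvalues, ascending
    c = eigs.numel()
    return (c * (eigs ** 2).sum() / eigs.sum() ** 2).item()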
], tot_loss[loss=0.1556, simple_loss=0.1519, pruned_loss=0.06577, audio_tagging_loss=0.01384, over 3040880.25 frames. ], batch size: 60, lr: 4.04e-02, grad_scale: 64.0 2023-11-18 03:27:07,492 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.157e+01 1.024e+02 1.184e+02 1.343e+02 1.900e+02, threshold=2.368e+02, percent-clipped=0.0 2023-11-18 03:27:12,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=36733.333333333336, ans=0.125 2023-11-18 03:27:20,364 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:27:23,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.13 vs. limit=22.5 2023-11-18 03:27:40,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36866.666666666664, ans=0.1 2023-11-18 03:27:58,586 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5550, loss[loss=0.1206, simple_loss=0.1179, pruned_loss=0.0483, audio_tagging_loss=0.01332, over 15121.00 frames. ], tot_loss[loss=0.1558, simple_loss=0.1517, pruned_loss=0.06583, audio_tagging_loss=0.01406, over 3036793.15 frames. ], batch size: 58, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:00,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=37000.0, ans=0.125 2023-11-18 03:28:08,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-18 03:28:09,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=37066.666666666664, ans=0.07 2023-11-18 03:28:11,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-18 03:28:16,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=37066.666666666664, ans=0.002811594202898552 2023-11-18 03:28:33,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.67 vs. limit=22.5 2023-11-18 03:28:40,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.30 vs. limit=22.5 2023-11-18 03:28:52,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=37266.666666666664, ans=0.0 2023-11-18 03:28:54,745 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5600, loss[loss=0.1904, simple_loss=0.1888, pruned_loss=0.08287, audio_tagging_loss=0.01309, over 16375.00 frames. ], tot_loss[loss=0.156, simple_loss=0.1522, pruned_loss=0.06574, audio_tagging_loss=0.01419, over 3042684.52 frames. ], batch size: 61, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:57,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.43 vs. 
limit=15.0 2023-11-18 03:28:59,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 1.034e+02 1.195e+02 1.444e+02 2.133e+02, threshold=2.390e+02, percent-clipped=0.0 2023-11-18 03:29:29,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=37533.333333333336, ans=0.125 2023-11-18 03:29:31,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=37533.333333333336, ans=0.125 2023-11-18 03:29:35,253 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:29:51,786 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5650, loss[loss=0.189, simple_loss=0.1906, pruned_loss=0.0797, audio_tagging_loss=0.01404, over 16622.00 frames. ], tot_loss[loss=0.1569, simple_loss=0.1536, pruned_loss=0.06599, audio_tagging_loss=0.01415, over 3049110.31 frames. ], batch size: 61, lr: 4.02e-02, grad_scale: 128.0 2023-11-18 03:30:24,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=37866.666666666664, ans=0.125 2023-11-18 03:30:40,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=37933.333333333336, ans=0.125 2023-11-18 03:30:44,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37933.333333333336, ans=0.125 2023-11-18 03:30:47,184 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5700, loss[loss=0.1341, simple_loss=0.1164, pruned_loss=0.05975, audio_tagging_loss=0.01619, over 15388.00 frames. ], tot_loss[loss=0.1563, simple_loss=0.1533, pruned_loss=0.06554, audio_tagging_loss=0.01413, over 3045873.83 frames. ], batch size: 59, lr: 4.02e-02, grad_scale: 64.0 2023-11-18 03:30:52,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.432e+01 1.093e+02 1.259e+02 1.491e+02 2.385e+02, threshold=2.519e+02, percent-clipped=0.0 2023-11-18 03:30:57,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=38066.666666666664, ans=0.2 2023-11-18 03:30:57,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=38066.666666666664, ans=0.1 2023-11-18 03:31:01,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.43 vs. limit=22.5 2023-11-18 03:31:03,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.14 vs. 
limit=12.0 2023-11-18 03:31:04,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=38066.666666666664, ans=0.125 2023-11-18 03:31:29,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=38200.0, ans=0.0 2023-11-18 03:31:42,340 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5750, loss[loss=0.1345, simple_loss=0.1414, pruned_loss=0.05204, audio_tagging_loss=0.01169, over 15830.00 frames. ], tot_loss[loss=0.157, simple_loss=0.154, pruned_loss=0.06605, audio_tagging_loss=0.01395, over 3045420.33 frames. ], batch size: 59, lr: 4.01e-02, grad_scale: 32.0 2023-11-18 03:31:46,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=38333.333333333336, ans=0.125 2023-11-18 03:31:56,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.86 vs. limit=15.0 2023-11-18 03:32:04,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.47 vs. limit=15.0 2023-11-18 03:32:12,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=38466.666666666664, ans=0.125 2023-11-18 03:32:21,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=38533.333333333336, ans=0.0 2023-11-18 03:32:37,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=38600.0, ans=0.125 2023-11-18 03:32:39,958 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5800, loss[loss=0.1938, simple_loss=0.188, pruned_loss=0.08545, audio_tagging_loss=0.01437, over 15902.00 frames. ], tot_loss[loss=0.1573, simple_loss=0.1544, pruned_loss=0.06632, audio_tagging_loss=0.01373, over 3041718.15 frames. ], batch size: 57, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:32:46,931 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 1.065e+02 1.200e+02 1.362e+02 2.023e+02, threshold=2.399e+02, percent-clipped=0.0 2023-11-18 03:33:06,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=38800.0, ans=0.125 2023-11-18 03:33:11,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=38800.0, ans=0.125 2023-11-18 03:33:18,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.90 vs. limit=15.0 2023-11-18 03:33:30,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=38933.333333333336, ans=0.0024057971014492746 2023-11-18 03:33:36,004 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5850, loss[loss=0.176, simple_loss=0.1683, pruned_loss=0.07615, audio_tagging_loss=0.01576, over 14509.00 frames. ], tot_loss[loss=0.1582, simple_loss=0.1557, pruned_loss=0.06693, audio_tagging_loss=0.01347, over 3038806.19 frames. 
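grad_scale in the tot_loss records is the mixed-precision loss scale, and its movement here (64 at batch 5600, 128 at 5650, back to 64 at 5700 and 32 at 5750) is the signature of dynamic loss scaling: the scale grows after a stretch of finite gradients and halves whenever an overflow is detected. Generic PyTorch AMP usage showing that mechanism; this is a sketch, not the actual train_asr.py loop, and init_scale/growth_interval are assumptions:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skipped internally if grads contain inf/nan
    scaler.update()         # grows the scale on schedule, halves it on overflow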
], batch size: 53, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:33:43,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=39000.0, ans=0.2 2023-11-18 03:33:46,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=39066.666666666664, ans=0.035 2023-11-18 03:33:50,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=39066.666666666664, ans=0.05 2023-11-18 03:33:50,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=39066.666666666664, ans=0.125 2023-11-18 03:34:03,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=39133.333333333336, ans=0.125 2023-11-18 03:34:04,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=39133.333333333336, ans=0.125 2023-11-18 03:34:09,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=39200.0, ans=0.125 2023-11-18 03:34:12,246 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2023-11-18 03:34:16,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-11-18 03:34:21,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-11-18 03:34:31,908 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5900, loss[loss=0.1256, simple_loss=0.1332, pruned_loss=0.0469, audio_tagging_loss=0.01207, over 15196.00 frames. ], tot_loss[loss=0.1573, simple_loss=0.155, pruned_loss=0.06649, audio_tagging_loss=0.01331, over 3039617.73 frames. ], batch size: 57, lr: 3.99e-02, grad_scale: 32.0 2023-11-18 03:34:38,797 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 1.114e+02 1.332e+02 1.512e+02 2.705e+02, threshold=2.665e+02, percent-clipped=2.0 2023-11-18 03:34:52,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=39400.0, ans=0.04949747468305833 2023-11-18 03:34:58,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=39466.666666666664, ans=0.125 2023-11-18 03:35:06,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-18 03:35:11,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.63 vs. limit=22.5 2023-11-18 03:35:28,899 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 5950, loss[loss=0.1789, simple_loss=0.1716, pruned_loss=0.07479, audio_tagging_loss=0.01826, over 15348.00 frames. ], tot_loss[loss=0.1568, simple_loss=0.1547, pruned_loss=0.06614, audio_tagging_loss=0.01332, over 3042221.22 frames. 
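Just below, training pauses at batch 6000 for a full validation pass ("Computing validation loss", then a validation record over 4681554.00 frames and a peak-memory report), consistent with validation being triggered every fixed number of training batches. A minimal sketch of that trigger; the interval value is an assumption, chosen so that 6000 is a multiple of it:

VALID_INTERVAL = 3000  # assumed; the log shows a validation pass at batch 6000

def should_validate(batch_idx: int) -> bool:
    return batch_idx > 0 and batch_idx % VALID_INTERVAL == 0

assert should_validate(6000) and not should_validate(5999)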
], batch size: 57, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:35:38,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=39666.666666666664, ans=0.0022463768115942037 2023-11-18 03:35:41,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-11-18 03:35:45,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.30 vs. limit=15.0 2023-11-18 03:35:50,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-18 03:35:51,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.69 vs. limit=10.0 2023-11-18 03:36:04,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=39866.666666666664, ans=0.5 2023-11-18 03:36:08,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=39866.666666666664, ans=0.1 2023-11-18 03:36:22,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-18 03:36:24,713 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6000, loss[loss=0.1366, simple_loss=0.1383, pruned_loss=0.05561, audio_tagging_loss=0.01187, over 15806.00 frames. ], tot_loss[loss=0.1571, simple_loss=0.1548, pruned_loss=0.06627, audio_tagging_loss=0.01339, over 3048639.81 frames. ], batch size: 59, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:36:24,714 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 03:36:49,274 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.5309, 4.4200, 3.6946, 4.3016], device='cuda:2') 2023-11-18 03:36:58,785 INFO [train_asr.py:1147] (2/4) Epoch 1, validation: loss=0.1009, simple_loss=0.07718, pruned_loss=0.02169, audio_tagging_loss=0.04066, over 4681554.00 frames. 2023-11-18 03:36:58,786 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 03:37:05,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 1.087e+02 1.275e+02 1.499e+02 2.354e+02, threshold=2.549e+02, percent-clipped=0.0 2023-11-18 03:37:06,585 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:37:14,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=40066.666666666664, ans=0.125 2023-11-18 03:37:32,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2023-11-18 03:37:35,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.27 vs. limit=22.5 2023-11-18 03:37:40,584 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:37:52,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=40266.666666666664, ans=0.0021159420289855076 2023-11-18 03:37:55,890 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6050, loss[loss=0.149, simple_loss=0.1438, pruned_loss=0.06026, audio_tagging_loss=0.01683, over 15462.00 frames. ], tot_loss[loss=0.156, simple_loss=0.1538, pruned_loss=0.06573, audio_tagging_loss=0.0134, over 3045378.51 frames. ], batch size: 59, lr: 3.97e-02, grad_scale: 32.0 2023-11-18 03:38:00,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=40333.333333333336, ans=0.125 2023-11-18 03:38:07,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40400.0, ans=0.125 2023-11-18 03:38:16,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=40400.0, ans=0.125 2023-11-18 03:38:17,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=40466.666666666664, ans=0.125 2023-11-18 03:38:34,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.09 vs. limit=15.0 2023-11-18 03:38:52,475 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6100, loss[loss=0.2076, simple_loss=0.2168, pruned_loss=0.09109, audio_tagging_loss=0.008158, over 15343.00 frames. ], tot_loss[loss=0.1566, simple_loss=0.1545, pruned_loss=0.06608, audio_tagging_loss=0.01326, over 3037376.59 frames. ], batch size: 57, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:38:58,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.303e+01 1.093e+02 1.234e+02 1.511e+02 2.648e+02, threshold=2.468e+02, percent-clipped=3.0 2023-11-18 03:38:59,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=40666.666666666664, ans=0.1 2023-11-18 03:39:07,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=40733.333333333336, ans=0.125 2023-11-18 03:39:34,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=40866.666666666664, ans=0.125 2023-11-18 03:39:48,157 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6150, loss[loss=0.2016, simple_loss=0.191, pruned_loss=0.09174, audio_tagging_loss=0.01433, over 16920.00 frames. ], tot_loss[loss=0.1567, simple_loss=0.1545, pruned_loss=0.06612, audio_tagging_loss=0.01333, over 3046641.97 frames. ], batch size: 62, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:40:13,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.49 vs. 
limit=6.0 2023-11-18 03:40:27,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=41200.0, ans=0.2 2023-11-18 03:40:35,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=41266.666666666664, ans=0.2 2023-11-18 03:40:43,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=41266.666666666664, ans=0.125 2023-11-18 03:40:44,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=41333.333333333336, ans=0.0 2023-11-18 03:40:45,694 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6200, loss[loss=0.1011, simple_loss=0.08627, pruned_loss=0.04119, audio_tagging_loss=0.01673, over 13924.00 frames. ], tot_loss[loss=0.1561, simple_loss=0.1536, pruned_loss=0.06582, audio_tagging_loss=0.01344, over 3045319.90 frames. ], batch size: 56, lr: 3.95e-02, grad_scale: 32.0 2023-11-18 03:40:53,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 1.077e+02 1.264e+02 1.430e+02 2.412e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:40:56,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.69 vs. limit=15.0 2023-11-18 03:41:34,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=41600.0, ans=0.0 2023-11-18 03:41:41,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=41600.0, ans=0.0 2023-11-18 03:41:43,086 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6250, loss[loss=0.1444, simple_loss=0.1347, pruned_loss=0.05981, audio_tagging_loss=0.01721, over 16691.00 frames. ], tot_loss[loss=0.1568, simple_loss=0.154, pruned_loss=0.06617, audio_tagging_loss=0.01363, over 3052142.20 frames. ], batch size: 63, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:41:44,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=41666.666666666664, ans=0.125 2023-11-18 03:41:45,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=41666.666666666664, ans=0.0018115942028985518 2023-11-18 03:41:55,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.18 vs. limit=15.0 2023-11-18 03:41:56,928 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.39 vs. limit=8.0 2023-11-18 03:42:16,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41866.666666666664, ans=0.1 2023-11-18 03:42:23,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=41866.666666666664, ans=0.0 2023-11-18 03:42:28,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=41933.333333333336, ans=0.125 2023-11-18 03:42:28,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.52 vs. 
limit=15.0 2023-11-18 03:42:30,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=41933.333333333336, ans=0.2 2023-11-18 03:42:39,082 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6300, loss[loss=0.1669, simple_loss=0.1782, pruned_loss=0.06668, audio_tagging_loss=0.01109, over 15476.00 frames. ], tot_loss[loss=0.1562, simple_loss=0.1535, pruned_loss=0.06573, audio_tagging_loss=0.01375, over 3052382.00 frames. ], batch size: 54, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:42:46,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 1.061e+02 1.176e+02 1.388e+02 2.867e+02, threshold=2.352e+02, percent-clipped=1.0 2023-11-18 03:42:50,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42066.666666666664, ans=0.1 2023-11-18 03:42:55,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=42066.666666666664, ans=0.0 2023-11-18 03:42:58,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=42066.666666666664, ans=0.2 2023-11-18 03:43:05,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.60 vs. limit=10.0 2023-11-18 03:43:05,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-18 03:43:36,587 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6350, loss[loss=0.1482, simple_loss=0.1447, pruned_loss=0.06144, audio_tagging_loss=0.01442, over 15306.00 frames. ], tot_loss[loss=0.1565, simple_loss=0.1537, pruned_loss=0.06586, audio_tagging_loss=0.01385, over 3051606.96 frames. ], batch size: 57, lr: 3.93e-02, grad_scale: 32.0 2023-11-18 03:43:36,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=42333.333333333336, ans=0.125 2023-11-18 03:43:41,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=42333.333333333336, ans=0.04949747468305833 2023-11-18 03:44:34,100 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6400, loss[loss=0.152, simple_loss=0.1547, pruned_loss=0.06003, audio_tagging_loss=0.01457, over 15672.00 frames. ], tot_loss[loss=0.1571, simple_loss=0.1541, pruned_loss=0.06607, audio_tagging_loss=0.01401, over 3048020.68 frames. ], batch size: 57, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:44:40,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.367e+01 1.120e+02 1.287e+02 1.674e+02 2.598e+02, threshold=2.575e+02, percent-clipped=2.0 2023-11-18 03:44:42,917 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:44:44,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.29 vs. limit=12.0 2023-11-18 03:44:54,704 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.74 vs. 
limit=10.0 2023-11-18 03:45:03,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=42800.0, ans=0.125 2023-11-18 03:45:30,272 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6450, loss[loss=0.1809, simple_loss=0.1799, pruned_loss=0.07742, audio_tagging_loss=0.01353, over 15112.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.153, pruned_loss=0.06518, audio_tagging_loss=0.01404, over 3037557.77 frames. ], batch size: 56, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:45:31,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=43000.0, ans=0.125 2023-11-18 03:45:56,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=43133.333333333336, ans=0.125 2023-11-18 03:46:16,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=15.0 2023-11-18 03:46:26,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=43333.333333333336, ans=0.125 2023-11-18 03:46:27,412 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6500, loss[loss=0.1544, simple_loss=0.1541, pruned_loss=0.0676, audio_tagging_loss=0.009805, over 14229.00 frames. ], tot_loss[loss=0.1543, simple_loss=0.1518, pruned_loss=0.06451, audio_tagging_loss=0.01392, over 3037418.56 frames. ], batch size: 53, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:46:33,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=43333.333333333336, ans=0.0 2023-11-18 03:46:34,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.242e+01 1.070e+02 1.244e+02 1.503e+02 2.306e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 03:46:36,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=43333.333333333336, ans=0.125 2023-11-18 03:46:37,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=43400.0, ans=0.125 2023-11-18 03:46:45,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=43400.0, ans=0.125 2023-11-18 03:47:11,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=43600.0, ans=0.125 2023-11-18 03:47:24,060 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6550, loss[loss=0.1677, simple_loss=0.1702, pruned_loss=0.07008, audio_tagging_loss=0.01252, over 16581.00 frames. ], tot_loss[loss=0.1544, simple_loss=0.1519, pruned_loss=0.06466, audio_tagging_loss=0.01374, over 3040130.55 frames. 
], batch size: 60, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:47:27,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=43666.666666666664, ans=0.125 2023-11-18 03:47:32,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=43666.666666666664, ans=0.0 2023-11-18 03:47:36,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=43733.333333333336, ans=0.125 2023-11-18 03:48:15,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=43933.333333333336, ans=0.125 2023-11-18 03:48:19,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=44000.0, ans=0.125 2023-11-18 03:48:20,979 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6600, loss[loss=0.1635, simple_loss=0.1697, pruned_loss=0.0682, audio_tagging_loss=0.01039, over 15020.00 frames. ], tot_loss[loss=0.1546, simple_loss=0.1525, pruned_loss=0.06483, audio_tagging_loss=0.01351, over 3042366.72 frames. ], batch size: 56, lr: 3.90e-02, grad_scale: 32.0 2023-11-18 03:48:25,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.68 vs. limit=15.0 2023-11-18 03:48:28,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.463e+01 1.068e+02 1.215e+02 1.424e+02 2.055e+02, threshold=2.430e+02, percent-clipped=0.0 2023-11-18 03:49:10,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=44266.666666666664, ans=0.5 2023-11-18 03:49:12,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=44266.666666666664, ans=0.125 2023-11-18 03:49:13,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5 2023-11-18 03:49:17,877 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6650, loss[loss=0.2005, simple_loss=0.1978, pruned_loss=0.0938, audio_tagging_loss=0.00785, over 15855.00 frames. ], tot_loss[loss=0.1536, simple_loss=0.1516, pruned_loss=0.06442, audio_tagging_loss=0.01337, over 3038269.49 frames. ], batch size: 57, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:49:38,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=44400.0, ans=0.0012173913043478264 2023-11-18 03:49:54,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=44533.333333333336, ans=0.125 2023-11-18 03:50:15,178 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6700, loss[loss=0.2099, simple_loss=0.2126, pruned_loss=0.09009, audio_tagging_loss=0.01346, over 14620.00 frames. ], tot_loss[loss=0.1531, simple_loss=0.1511, pruned_loss=0.06414, audio_tagging_loss=0.01339, over 3038423.93 frames. 
], batch size: 54, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:50:21,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 1.020e+02 1.157e+02 1.284e+02 2.181e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 03:50:25,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.00 vs. limit=15.0 2023-11-18 03:50:38,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=44800.0, ans=0.1 2023-11-18 03:50:51,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.71 vs. limit=15.0 2023-11-18 03:51:03,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=44933.333333333336, ans=0.125 2023-11-18 03:51:11,203 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6750, loss[loss=0.1703, simple_loss=0.1757, pruned_loss=0.06988, audio_tagging_loss=0.0126, over 14534.00 frames. ], tot_loss[loss=0.1518, simple_loss=0.1496, pruned_loss=0.06345, audio_tagging_loss=0.01351, over 3035992.88 frames. ], batch size: 55, lr: 3.88e-02, grad_scale: 32.0 2023-11-18 03:51:23,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=45066.666666666664, ans=0.2 2023-11-18 03:51:26,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-11-18 03:51:45,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=45200.0, ans=0.125 2023-11-18 03:52:00,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=45266.666666666664, ans=0.0010289855072463782 2023-11-18 03:52:08,434 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6800, loss[loss=0.1083, simple_loss=0.09214, pruned_loss=0.04648, audio_tagging_loss=0.01577, over 14477.00 frames. ], tot_loss[loss=0.1511, simple_loss=0.1489, pruned_loss=0.06319, audio_tagging_loss=0.01346, over 3037960.66 frames. ], batch size: 57, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:52:10,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=45333.333333333336, ans=0.125 2023-11-18 03:52:15,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.237e+01 1.104e+02 1.256e+02 1.386e+02 2.512e+02, threshold=2.511e+02, percent-clipped=1.0 2023-11-18 03:52:36,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45466.666666666664, ans=0.1 2023-11-18 03:52:40,500 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=15.0 2023-11-18 03:52:56,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=45600.0, ans=0.0009565217391304358 2023-11-18 03:53:05,768 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6850, loss[loss=0.1555, simple_loss=0.1554, pruned_loss=0.06167, audio_tagging_loss=0.01618, over 15739.00 frames. 
], tot_loss[loss=0.1521, simple_loss=0.15, pruned_loss=0.06358, audio_tagging_loss=0.01351, over 3041045.26 frames. ], batch size: 62, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:53:22,100 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.755e+00 2023-11-18 03:53:39,386 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.06 vs. limit=22.5 2023-11-18 03:54:01,998 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6900, loss[loss=0.1722, simple_loss=0.1664, pruned_loss=0.07472, audio_tagging_loss=0.01434, over 14471.00 frames. ], tot_loss[loss=0.1518, simple_loss=0.15, pruned_loss=0.06338, audio_tagging_loss=0.01346, over 3043513.50 frames. ], batch size: 55, lr: 3.86e-02, grad_scale: 32.0 2023-11-18 03:54:04,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.72 vs. limit=6.0 2023-11-18 03:54:08,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 1.075e+02 1.233e+02 1.509e+02 2.353e+02, threshold=2.467e+02, percent-clipped=0.0 2023-11-18 03:54:21,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-11-18 03:54:45,372 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:54:50,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=46266.666666666664, ans=0.0 2023-11-18 03:54:58,485 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 6950, loss[loss=0.1002, simple_loss=0.09377, pruned_loss=0.03811, audio_tagging_loss=0.01516, over 14985.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1495, pruned_loss=0.06317, audio_tagging_loss=0.01349, over 3041092.41 frames. ], batch size: 57, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:55:21,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46466.666666666664, ans=0.125 2023-11-18 03:55:33,300 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-11-18 03:55:36,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=46533.333333333336, ans=0.025 2023-11-18 03:55:55,780 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7000, loss[loss=0.1428, simple_loss=0.1388, pruned_loss=0.05634, audio_tagging_loss=0.01701, over 15832.00 frames. ], tot_loss[loss=0.1526, simple_loss=0.1504, pruned_loss=0.06378, audio_tagging_loss=0.0136, over 3049104.23 frames. 
], batch size: 62, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:55:59,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=46666.666666666664, ans=0.125 2023-11-18 03:56:02,185 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.771e+01 1.138e+02 1.312e+02 1.485e+02 2.708e+02, threshold=2.623e+02, percent-clipped=2.0 2023-11-18 03:56:03,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.48 vs. limit=10.0 2023-11-18 03:56:06,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=46733.333333333336, ans=0.0 2023-11-18 03:56:15,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=46733.333333333336, ans=0.125 2023-11-18 03:56:25,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=46800.0, ans=0.125 2023-11-18 03:56:51,712 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7050, loss[loss=0.147, simple_loss=0.1527, pruned_loss=0.05685, audio_tagging_loss=0.0138, over 15415.00 frames. ], tot_loss[loss=0.1509, simple_loss=0.1485, pruned_loss=0.06293, audio_tagging_loss=0.01375, over 3040246.95 frames. ], batch size: 57, lr: 3.84e-02, grad_scale: 32.0 2023-11-18 03:56:51,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47000.0, ans=0.1 2023-11-18 03:56:57,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47000.0, ans=0.0 2023-11-18 03:57:05,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=47066.666666666664, ans=0.125 2023-11-18 03:57:13,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=47133.333333333336, ans=0.000623188405797101 2023-11-18 03:57:16,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47133.333333333336, ans=0.1 2023-11-18 03:57:38,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=47266.666666666664, ans=0.0 2023-11-18 03:57:42,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=47266.666666666664, ans=0.04949747468305833 2023-11-18 03:57:47,900 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7100, loss[loss=0.143, simple_loss=0.1393, pruned_loss=0.06023, audio_tagging_loss=0.01317, over 14050.00 frames. ], tot_loss[loss=0.1513, simple_loss=0.1492, pruned_loss=0.06282, audio_tagging_loss=0.01389, over 3032720.24 frames. 
], batch size: 56, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:57:55,370 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.676e+01 1.058e+02 1.182e+02 1.391e+02 1.929e+02, threshold=2.364e+02, percent-clipped=0.0 2023-11-18 03:58:09,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=47400.0, ans=0.2 2023-11-18 03:58:10,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=22.5 2023-11-18 03:58:15,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=47466.666666666664, ans=0.0 2023-11-18 03:58:20,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=47466.666666666664, ans=6.0 2023-11-18 03:58:30,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=47533.333333333336, ans=0.0005362318840579708 2023-11-18 03:58:38,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=47600.0, ans=0.0 2023-11-18 03:58:45,124 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7150, loss[loss=0.1482, simple_loss=0.1478, pruned_loss=0.06279, audio_tagging_loss=0.01155, over 15558.00 frames. ], tot_loss[loss=0.1518, simple_loss=0.15, pruned_loss=0.06298, audio_tagging_loss=0.01383, over 3043299.39 frames. ], batch size: 56, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:58:46,434 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.652e+00 2023-11-18 03:58:50,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=47666.666666666664, ans=0.125 2023-11-18 03:58:50,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=47666.666666666664, ans=0.125 2023-11-18 03:58:52,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=47666.666666666664, ans=0.1 2023-11-18 03:59:11,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=47800.0, ans=0.0004782608695652179 2023-11-18 03:59:16,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=47866.666666666664, ans=0.125 2023-11-18 03:59:30,742 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. limit=6.0 2023-11-18 03:59:32,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.78 vs. limit=22.5 2023-11-18 03:59:37,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=47933.333333333336, ans=0.125 2023-11-18 03:59:40,936 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7200, loss[loss=0.15, simple_loss=0.1408, pruned_loss=0.06157, audio_tagging_loss=0.01803, over 16375.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1505, pruned_loss=0.06306, audio_tagging_loss=0.01391, over 3034318.69 frames. 
], batch size: 61, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 03:59:41,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.57 vs. limit=15.0 2023-11-18 03:59:47,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 1.038e+02 1.215e+02 1.416e+02 1.908e+02, threshold=2.429e+02, percent-clipped=0.0 2023-11-18 03:59:50,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.30 vs. limit=15.0 2023-11-18 03:59:54,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=48066.666666666664, ans=0.125 2023-11-18 03:59:55,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=48066.666666666664, ans=0.000420289855072465 2023-11-18 04:00:10,425 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.33 vs. limit=12.0 2023-11-18 04:00:17,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=48200.0, ans=0.125 2023-11-18 04:00:36,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=48333.333333333336, ans=0.0 2023-11-18 04:00:37,476 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7250, loss[loss=0.1678, simple_loss=0.1757, pruned_loss=0.06902, audio_tagging_loss=0.0109, over 15900.00 frames. ], tot_loss[loss=0.1532, simple_loss=0.1519, pruned_loss=0.06352, audio_tagging_loss=0.01376, over 3031246.03 frames. ], batch size: 55, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 04:00:38,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=48333.333333333336, ans=0.125 2023-11-18 04:00:52,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=48400.0, ans=0.04949747468305833 2023-11-18 04:01:21,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-18 04:01:29,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.17 vs. limit=22.5 2023-11-18 04:01:34,447 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7300, loss[loss=0.1308, simple_loss=0.1255, pruned_loss=0.05354, audio_tagging_loss=0.01452, over 15441.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.1503, pruned_loss=0.06265, audio_tagging_loss=0.01371, over 3029951.53 frames. 
], batch size: 57, lr: 3.81e-02, grad_scale: 32.0 2023-11-18 04:01:40,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.111e+01 1.121e+02 1.282e+02 1.467e+02 2.763e+02, threshold=2.564e+02, percent-clipped=2.0 2023-11-18 04:01:45,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=48733.333333333336, ans=0.2 2023-11-18 04:01:48,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=48733.333333333336, ans=0.125 2023-11-18 04:01:55,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=48800.0, ans=0.0 2023-11-18 04:01:56,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=48800.0, ans=0.1 2023-11-18 04:02:30,019 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7350, loss[loss=0.1452, simple_loss=0.1458, pruned_loss=0.05951, audio_tagging_loss=0.01284, over 14496.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.1525, pruned_loss=0.06375, audio_tagging_loss=0.0134, over 3031365.20 frames. ], batch size: 55, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:02:45,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=49066.666666666664, ans=0.125 2023-11-18 04:02:46,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49066.666666666664, ans=0.1 2023-11-18 04:03:09,279 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-18 04:03:26,843 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7400, loss[loss=0.1242, simple_loss=0.1288, pruned_loss=0.04872, audio_tagging_loss=0.01106, over 16741.00 frames. ], tot_loss[loss=0.152, simple_loss=0.1515, pruned_loss=0.06301, audio_tagging_loss=0.01321, over 3036667.86 frames. ], batch size: 62, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:03:33,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.201e+01 1.102e+02 1.229e+02 1.424e+02 2.293e+02, threshold=2.457e+02, percent-clipped=0.0 2023-11-18 04:03:48,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49400.0, ans=0.1 2023-11-18 04:03:52,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=49466.666666666664, ans=0.0 2023-11-18 04:04:14,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=49600.0, ans=0.125 2023-11-18 04:04:15,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.10 vs. limit=12.0 2023-11-18 04:04:23,578 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7450, loss[loss=0.1425, simple_loss=0.1525, pruned_loss=0.0576, audio_tagging_loss=0.008684, over 15690.00 frames. ], tot_loss[loss=0.1519, simple_loss=0.1518, pruned_loss=0.06297, audio_tagging_loss=0.01301, over 3035509.49 frames. 
], batch size: 57, lr: 3.79e-02, grad_scale: 32.0 2023-11-18 04:04:46,118 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-18 04:04:47,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=49800.0, ans=0.0 2023-11-18 04:04:56,427 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-18 04:05:08,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=49933.333333333336, ans=0.2 2023-11-18 04:05:15,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=49933.333333333336, ans=0.125 2023-11-18 04:05:20,162 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7500, loss[loss=0.1907, simple_loss=0.192, pruned_loss=0.08385, audio_tagging_loss=0.01085, over 13897.00 frames. ], tot_loss[loss=0.151, simple_loss=0.151, pruned_loss=0.06256, audio_tagging_loss=0.01297, over 3035930.76 frames. ], batch size: 53, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:05:20,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=50000.0, ans=0.0 2023-11-18 04:05:26,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0 2023-11-18 04:05:26,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 1.063e+02 1.222e+02 1.436e+02 2.018e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:05:41,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=50133.333333333336, ans=0.125 2023-11-18 04:05:41,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=50133.333333333336, ans=0.07 2023-11-18 04:06:00,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.48 vs. limit=15.0 2023-11-18 04:06:12,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=50266.666666666664, ans=10.0 2023-11-18 04:06:15,879 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7550, loss[loss=0.1965, simple_loss=0.1934, pruned_loss=0.08776, audio_tagging_loss=0.01206, over 15936.00 frames. ], tot_loss[loss=0.1514, simple_loss=0.1513, pruned_loss=0.06285, audio_tagging_loss=0.01289, over 3040883.61 frames. ], batch size: 59, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:06:28,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=50400.0, ans=0.0 2023-11-18 04:06:32,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=50400.0, ans=0.125 2023-11-18 04:06:41,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.87 vs. 
limit=15.0 2023-11-18 04:06:44,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-11-18 04:07:11,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=50666.666666666664, ans=10.0 2023-11-18 04:07:12,587 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7600, loss[loss=0.1783, simple_loss=0.1795, pruned_loss=0.07626, audio_tagging_loss=0.01228, over 14843.00 frames. ], tot_loss[loss=0.1493, simple_loss=0.1486, pruned_loss=0.06189, audio_tagging_loss=0.01311, over 3040107.26 frames. ], batch size: 54, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:07:19,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.452e+01 1.053e+02 1.216e+02 1.364e+02 2.093e+02, threshold=2.431e+02, percent-clipped=0.0 2023-11-18 04:07:35,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=50800.0, ans=0.125 2023-11-18 04:07:48,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.16 vs. limit=15.0 2023-11-18 04:07:53,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2023-11-18 04:07:54,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=50866.666666666664, ans=0.0 2023-11-18 04:07:59,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=50933.333333333336, ans=0.1 2023-11-18 04:08:01,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=50933.333333333336, ans=0.125 2023-11-18 04:08:04,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. limit=15.0 2023-11-18 04:08:09,175 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7650, loss[loss=0.1747, simple_loss=0.1811, pruned_loss=0.07065, audio_tagging_loss=0.01353, over 17220.00 frames. ], tot_loss[loss=0.1496, simple_loss=0.1491, pruned_loss=0.06199, audio_tagging_loss=0.01304, over 3042307.51 frames. 
], batch size: 62, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:08:16,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51000.0, ans=0.1 2023-11-18 04:08:16,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=51000.0, ans=0.0 2023-11-18 04:08:20,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=51066.666666666664, ans=0.2 2023-11-18 04:08:25,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=51066.666666666664, ans=0.1 2023-11-18 04:08:28,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=51066.666666666664, ans=0.125 2023-11-18 04:08:28,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2023-11-18 04:08:36,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51133.333333333336, ans=0.1 2023-11-18 04:08:41,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=51133.333333333336, ans=0.125 2023-11-18 04:08:52,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=51200.0, ans=0.125 2023-11-18 04:08:54,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=51266.666666666664, ans=0.125 2023-11-18 04:08:55,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=51266.666666666664, ans=0.0 2023-11-18 04:08:57,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-18 04:09:05,121 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7700, loss[loss=0.1858, simple_loss=0.1814, pruned_loss=0.07941, audio_tagging_loss=0.0157, over 14013.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.1491, pruned_loss=0.06175, audio_tagging_loss=0.01308, over 3043787.62 frames. ], batch size: 54, lr: 3.76e-02, grad_scale: 32.0 2023-11-18 04:09:09,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=51333.333333333336, ans=0.2 2023-11-18 04:09:10,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. 
limit=8.0 2023-11-18 04:09:12,122 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.072e+02 1.285e+02 1.536e+02 2.038e+02, threshold=2.570e+02, percent-clipped=0.0 2023-11-18 04:09:19,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=51400.0, ans=10.0 2023-11-18 04:09:19,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=51400.0, ans=0.2 2023-11-18 04:09:30,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=51466.666666666664, ans=0.0 2023-11-18 04:09:55,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51600.0, ans=0.1 2023-11-18 04:10:01,773 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7750, loss[loss=0.1452, simple_loss=0.1436, pruned_loss=0.06279, audio_tagging_loss=0.01059, over 13766.00 frames. ], tot_loss[loss=0.15, simple_loss=0.1498, pruned_loss=0.06194, audio_tagging_loss=0.01315, over 3036950.47 frames. ], batch size: 55, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:10:06,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=51666.666666666664, ans=0.125 2023-11-18 04:10:10,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.56 vs. limit=22.5 2023-11-18 04:10:41,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=51866.666666666664, ans=0.125 2023-11-18 04:10:43,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=51866.666666666664, ans=0.035 2023-11-18 04:10:58,483 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7800, loss[loss=0.1653, simple_loss=0.1643, pruned_loss=0.06916, audio_tagging_loss=0.01401, over 14443.00 frames. ], tot_loss[loss=0.1499, simple_loss=0.1497, pruned_loss=0.06192, audio_tagging_loss=0.01316, over 3036238.51 frames. ], batch size: 56, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:11:04,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.657e+01 1.122e+02 1.272e+02 1.519e+02 2.538e+02, threshold=2.545e+02, percent-clipped=0.0 2023-11-18 04:11:12,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=52066.666666666664, ans=0.125 2023-11-18 04:11:20,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.82 vs. limit=10.0 2023-11-18 04:11:23,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=52133.333333333336, ans=0.1 2023-11-18 04:11:34,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=52200.0, ans=0.125 2023-11-18 04:11:39,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52200.0, ans=0.1 2023-11-18 04:11:40,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.30 vs. 
limit=10.0 2023-11-18 04:11:42,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.29 vs. limit=12.0 2023-11-18 04:11:51,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52266.666666666664, ans=0.1 2023-11-18 04:11:54,966 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7850, loss[loss=0.1888, simple_loss=0.1934, pruned_loss=0.08343, audio_tagging_loss=0.008693, over 15494.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.1509, pruned_loss=0.06226, audio_tagging_loss=0.01313, over 3037639.07 frames. ], batch size: 57, lr: 3.74e-02, grad_scale: 64.0 2023-11-18 04:11:55,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.28 vs. limit=15.0 2023-11-18 04:11:57,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=52333.333333333336, ans=0.2 2023-11-18 04:12:24,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=52466.666666666664, ans=0.125 2023-11-18 04:12:31,975 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:12:51,438 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7900, loss[loss=0.1142, simple_loss=0.1064, pruned_loss=0.04535, audio_tagging_loss=0.01562, over 15795.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1522, pruned_loss=0.06286, audio_tagging_loss=0.01327, over 3046074.50 frames. ], batch size: 59, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:12:57,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=52666.666666666664, ans=0.125 2023-11-18 04:12:58,419 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.654e+01 1.083e+02 1.346e+02 1.574e+02 2.605e+02, threshold=2.691e+02, percent-clipped=2.0 2023-11-18 04:13:17,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=15.0 2023-11-18 04:13:42,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=52933.333333333336, ans=0.125 2023-11-18 04:13:43,503 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:13:43,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=52933.333333333336, ans=0.0 2023-11-18 04:13:45,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52933.333333333336, ans=0.1 2023-11-18 04:13:47,578 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 7950, loss[loss=0.1439, simple_loss=0.1404, pruned_loss=0.05767, audio_tagging_loss=0.01603, over 14610.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1518, pruned_loss=0.06282, audio_tagging_loss=0.01343, over 3045389.49 frames. ], batch size: 55, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:13:53,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. 
limit=15.0 2023-11-18 04:14:01,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-18 04:14:01,651 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:14:01,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=53066.666666666664, ans=0.2 2023-11-18 04:14:01,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=53066.666666666664, ans=0.125 2023-11-18 04:14:03,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-11-18 04:14:05,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=53066.666666666664, ans=0.125 2023-11-18 04:14:24,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=53200.0, ans=0.2 2023-11-18 04:14:28,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.93 vs. limit=15.0 2023-11-18 04:14:37,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=53266.666666666664, ans=0.2 2023-11-18 04:14:45,285 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8000, loss[loss=0.1501, simple_loss=0.1528, pruned_loss=0.05991, audio_tagging_loss=0.01384, over 15077.00 frames. ], tot_loss[loss=0.1511, simple_loss=0.1502, pruned_loss=0.06235, audio_tagging_loss=0.01365, over 3042114.92 frames. ], batch size: 57, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:14:47,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0 2023-11-18 04:14:50,735 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0 2023-11-18 04:14:52,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.064e+02 1.192e+02 1.330e+02 2.518e+02, threshold=2.384e+02, percent-clipped=0.0 2023-11-18 04:15:13,150 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.17 vs. 
limit=22.5 2023-11-18 04:15:21,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=53533.333333333336, ans=0.125 2023-11-18 04:15:24,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=53533.333333333336, ans=0.035 2023-11-18 04:15:28,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=53600.0, ans=0.025 2023-11-18 04:15:41,084 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8050, loss[loss=0.1241, simple_loss=0.1207, pruned_loss=0.04646, audio_tagging_loss=0.01729, over 14844.00 frames. ], tot_loss[loss=0.1495, simple_loss=0.1485, pruned_loss=0.06149, audio_tagging_loss=0.01381, over 3049505.13 frames. ], batch size: 56, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:15:54,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=53733.333333333336, ans=0.0 2023-11-18 04:16:02,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=53800.0, ans=0.1 2023-11-18 04:16:08,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=53800.0, ans=0.2 2023-11-18 04:16:10,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=53800.0, ans=0.0 2023-11-18 04:16:25,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=53933.333333333336, ans=0.125 2023-11-18 04:16:27,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=53933.333333333336, ans=0.125 2023-11-18 04:16:30,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=53933.333333333336, ans=0.0 2023-11-18 04:16:37,486 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8100, loss[loss=0.08177, simple_loss=0.07707, pruned_loss=0.02701, audio_tagging_loss=0.01623, over 14130.00 frames. ], tot_loss[loss=0.1498, simple_loss=0.1489, pruned_loss=0.06172, audio_tagging_loss=0.01363, over 3050390.19 frames. ], batch size: 55, lr: 3.71e-02, grad_scale: 64.0 2023-11-18 04:16:40,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.94 vs. limit=12.0 2023-11-18 04:16:43,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 1.055e+02 1.175e+02 1.442e+02 1.996e+02, threshold=2.349e+02, percent-clipped=0.0 2023-11-18 04:16:45,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=54000.0, ans=0.2 2023-11-18 04:17:04,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=54133.333333333336, ans=0.2 2023-11-18 04:17:05,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.85 vs. limit=22.5 2023-11-18 04:17:07,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.85 vs. 
limit=6.0 2023-11-18 04:17:14,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=54200.0, ans=0.025 2023-11-18 04:17:32,894 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8150, loss[loss=0.1501, simple_loss=0.1564, pruned_loss=0.05851, audio_tagging_loss=0.01344, over 15593.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.1479, pruned_loss=0.06117, audio_tagging_loss=0.0135, over 3048468.73 frames. ], batch size: 58, lr: 3.70e-02, grad_scale: 64.0 2023-11-18 04:18:04,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=54466.666666666664, ans=10.0 2023-11-18 04:18:08,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=54533.333333333336, ans=0.0 2023-11-18 04:18:23,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=54600.0, ans=0.125 2023-11-18 04:18:29,068 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8200, loss[loss=0.1201, simple_loss=0.1212, pruned_loss=0.04701, audio_tagging_loss=0.0125, over 14590.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.1481, pruned_loss=0.06114, audio_tagging_loss=0.01329, over 3049372.28 frames. ], batch size: 58, lr: 3.70e-02, grad_scale: 32.0 2023-11-18 04:18:30,771 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:18:36,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54666.666666666664, ans=0.125 2023-11-18 04:18:37,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 1.076e+02 1.233e+02 1.443e+02 5.591e+02, threshold=2.467e+02, percent-clipped=1.0 2023-11-18 04:18:57,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=54800.0, ans=0.07 2023-11-18 04:19:07,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54866.666666666664, ans=0.1 2023-11-18 04:19:21,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=54933.333333333336, ans=0.0 2023-11-18 04:19:21,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=54933.333333333336, ans=22.5 2023-11-18 04:19:25,764 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8250, loss[loss=0.1888, simple_loss=0.1859, pruned_loss=0.08246, audio_tagging_loss=0.01343, over 14985.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1478, pruned_loss=0.06111, audio_tagging_loss=0.01321, over 3047219.15 frames. 
], batch size: 54, lr: 3.69e-02, grad_scale: 32.0 2023-11-18 04:19:28,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=55000.0, ans=0.125 2023-11-18 04:19:28,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55000.0, ans=0.125 2023-11-18 04:19:30,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=55000.0, ans=0.125 2023-11-18 04:19:33,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=55000.0, ans=0.95 2023-11-18 04:19:34,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=55000.0, ans=0.1 2023-11-18 04:20:10,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=55266.666666666664, ans=15.0 2023-11-18 04:20:21,383 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8300, loss[loss=0.1943, simple_loss=0.1939, pruned_loss=0.0881, audio_tagging_loss=0.009248, over 15557.00 frames. ], tot_loss[loss=0.15, simple_loss=0.1497, pruned_loss=0.06202, audio_tagging_loss=0.01309, over 3039161.76 frames. ], batch size: 56, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:20:22,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=55333.333333333336, ans=0.125 2023-11-18 04:20:28,704 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.605e+01 1.079e+02 1.222e+02 1.465e+02 2.413e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:20:31,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=55400.0, ans=10.0 2023-11-18 04:20:52,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=55466.666666666664, ans=0.125 2023-11-18 04:21:17,279 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8350, loss[loss=0.1552, simple_loss=0.1614, pruned_loss=0.06641, audio_tagging_loss=0.008127, over 15883.00 frames. ], tot_loss[loss=0.1505, simple_loss=0.1506, pruned_loss=0.06214, audio_tagging_loss=0.01302, over 3049937.54 frames. ], batch size: 57, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:21:19,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=55666.666666666664, ans=0.125 2023-11-18 04:21:22,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=55666.666666666664, ans=0.125 2023-11-18 04:22:14,449 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8400, loss[loss=0.1016, simple_loss=0.1008, pruned_loss=0.03777, audio_tagging_loss=0.01341, over 15058.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.1506, pruned_loss=0.06201, audio_tagging_loss=0.01303, over 3043323.29 frames. 
], batch size: 60, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:22:21,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.072e+02 1.183e+02 1.364e+02 2.045e+02, threshold=2.367e+02, percent-clipped=0.0 2023-11-18 04:22:35,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=56133.333333333336, ans=0.125 2023-11-18 04:22:39,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=56133.333333333336, ans=0.05 2023-11-18 04:23:09,888 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8450, loss[loss=0.2021, simple_loss=0.2008, pruned_loss=0.08974, audio_tagging_loss=0.012, over 15355.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.1483, pruned_loss=0.06118, audio_tagging_loss=0.01325, over 3047456.03 frames. ], batch size: 54, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:23:31,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=56466.666666666664, ans=0.125 2023-11-18 04:23:43,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56533.333333333336, ans=0.1 2023-11-18 04:23:47,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=56533.333333333336, ans=0.125 2023-11-18 04:23:52,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=56533.333333333336, ans=0.125 2023-11-18 04:24:05,594 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8500, loss[loss=0.07848, simple_loss=0.06965, pruned_loss=0.02922, audio_tagging_loss=0.01443, over 14246.00 frames. ], tot_loss[loss=0.149, simple_loss=0.1486, pruned_loss=0.06151, audio_tagging_loss=0.01322, over 3052958.66 frames. ], batch size: 55, lr: 3.66e-02, grad_scale: 32.0 2023-11-18 04:24:13,491 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 1.081e+02 1.253e+02 1.521e+02 2.592e+02, threshold=2.506e+02, percent-clipped=2.0 2023-11-18 04:24:15,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=56733.333333333336, ans=0.0 2023-11-18 04:24:34,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=56800.0, ans=0.125 2023-11-18 04:24:54,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.77 vs. limit=15.0 2023-11-18 04:25:00,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=57000.0, ans=0.125 2023-11-18 04:25:02,316 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8550, loss[loss=0.124, simple_loss=0.1249, pruned_loss=0.04967, audio_tagging_loss=0.01188, over 14456.00 frames. ], tot_loss[loss=0.1478, simple_loss=0.1476, pruned_loss=0.06076, audio_tagging_loss=0.01329, over 3045780.50 frames. ], batch size: 56, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:25:17,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.03 vs. 
limit=15.0 2023-11-18 04:25:28,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=12.0 2023-11-18 04:25:40,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.71 vs. limit=15.0 2023-11-18 04:25:41,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=57200.0, ans=0.1 2023-11-18 04:25:58,794 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8600, loss[loss=0.1625, simple_loss=0.1624, pruned_loss=0.07028, audio_tagging_loss=0.01097, over 15216.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.1477, pruned_loss=0.06085, audio_tagging_loss=0.01345, over 3046232.44 frames. ], batch size: 54, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:25:59,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=57333.333333333336, ans=15.0 2023-11-18 04:26:06,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.067e+01 1.036e+02 1.166e+02 1.373e+02 2.331e+02, threshold=2.332e+02, percent-clipped=0.0 2023-11-18 04:26:07,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.08 vs. limit=15.0 2023-11-18 04:26:37,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=57533.333333333336, ans=0.0 2023-11-18 04:26:45,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=57600.0, ans=0.125 2023-11-18 04:26:48,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=57600.0, ans=0.0 2023-11-18 04:26:55,061 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8650, loss[loss=0.1165, simple_loss=0.1024, pruned_loss=0.04818, audio_tagging_loss=0.01708, over 15166.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.1467, pruned_loss=0.06008, audio_tagging_loss=0.01352, over 3045250.63 frames. ], batch size: 57, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:26:55,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.47 vs. limit=15.0 2023-11-18 04:27:16,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=57800.0, ans=0.2 2023-11-18 04:27:51,161 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8700, loss[loss=0.1239, simple_loss=0.125, pruned_loss=0.04761, audio_tagging_loss=0.01378, over 15690.00 frames. ], tot_loss[loss=0.148, simple_loss=0.1474, pruned_loss=0.0607, audio_tagging_loss=0.01355, over 3045718.67 frames. 
], batch size: 59, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:27:56,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=58000.0, ans=0.125 2023-11-18 04:27:58,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=58000.0, ans=6.0 2023-11-18 04:27:59,164 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 1.148e+02 1.309e+02 1.555e+02 2.620e+02, threshold=2.618e+02, percent-clipped=1.0 2023-11-18 04:28:02,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=58066.666666666664, ans=0.125 2023-11-18 04:28:20,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=58133.333333333336, ans=0.125 2023-11-18 04:28:34,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=58200.0, ans=0.125 2023-11-18 04:28:36,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=58266.666666666664, ans=0.2 2023-11-18 04:28:47,666 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8750, loss[loss=0.1532, simple_loss=0.1494, pruned_loss=0.06224, audio_tagging_loss=0.01627, over 15083.00 frames. ], tot_loss[loss=0.151, simple_loss=0.1509, pruned_loss=0.062, audio_tagging_loss=0.01357, over 3043390.35 frames. ], batch size: 57, lr: 3.63e-02, grad_scale: 32.0 2023-11-18 04:28:50,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=58333.333333333336, ans=0.125 2023-11-18 04:28:57,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=58400.0, ans=0.0 2023-11-18 04:28:58,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=58400.0, ans=0.2 2023-11-18 04:29:08,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=58466.666666666664, ans=0.2 2023-11-18 04:29:14,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=58466.666666666664, ans=0.0 2023-11-18 04:29:42,977 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8800, loss[loss=0.1498, simple_loss=0.152, pruned_loss=0.05895, audio_tagging_loss=0.01481, over 13827.00 frames. ], tot_loss[loss=0.1498, simple_loss=0.1494, pruned_loss=0.06139, audio_tagging_loss=0.01371, over 3043616.66 frames. ], batch size: 52, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:29:50,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.076e+01 1.175e+02 1.354e+02 1.562e+02 2.721e+02, threshold=2.708e+02, percent-clipped=1.0 2023-11-18 04:30:21,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=58866.666666666664, ans=0.07 2023-11-18 04:30:39,313 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8850, loss[loss=0.1415, simple_loss=0.1385, pruned_loss=0.05904, audio_tagging_loss=0.01316, over 14554.00 frames. ], tot_loss[loss=0.1483, simple_loss=0.148, pruned_loss=0.06057, audio_tagging_loss=0.0137, over 3041483.98 frames. 
], batch size: 55, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:30:47,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=59000.0, ans=0.125 2023-11-18 04:30:51,661 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:31:25,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=59266.666666666664, ans=0.2 2023-11-18 04:31:25,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.96 vs. limit=22.5 2023-11-18 04:31:35,259 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8900, loss[loss=0.1381, simple_loss=0.1501, pruned_loss=0.05279, audio_tagging_loss=0.01021, over 14409.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.1482, pruned_loss=0.06056, audio_tagging_loss=0.01341, over 3043290.76 frames. ], batch size: 55, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:31:39,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=59333.333333333336, ans=0.125 2023-11-18 04:31:43,298 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 1.040e+02 1.138e+02 1.318e+02 1.926e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 04:32:16,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=59533.333333333336, ans=0.125 2023-11-18 04:32:18,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-18 04:32:24,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=59600.0, ans=0.125 2023-11-18 04:32:30,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.41 vs. limit=15.0 2023-11-18 04:32:30,877 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 8950, loss[loss=0.1346, simple_loss=0.139, pruned_loss=0.05234, audio_tagging_loss=0.01279, over 14821.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1485, pruned_loss=0.06048, audio_tagging_loss=0.01324, over 3044560.87 frames. ], batch size: 56, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:32:46,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=59733.333333333336, ans=0.1 2023-11-18 04:32:49,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=59733.333333333336, ans=0.125 2023-11-18 04:32:49,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.60 vs. 
limit=15.0 2023-11-18 04:32:53,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=59800.0, ans=0.125 2023-11-18 04:33:10,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=12.0 2023-11-18 04:33:12,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=59866.666666666664, ans=0.0 2023-11-18 04:33:12,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.16 vs. limit=15.0 2023-11-18 04:33:21,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=59933.333333333336, ans=0.07 2023-11-18 04:33:26,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=60000.0, ans=0.0 2023-11-18 04:33:27,436 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9000, loss[loss=0.1495, simple_loss=0.1574, pruned_loss=0.06022, audio_tagging_loss=0.01059, over 14141.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.1481, pruned_loss=0.06036, audio_tagging_loss=0.01313, over 3055324.69 frames. ], batch size: 53, lr: 3.60e-02, grad_scale: 32.0 2023-11-18 04:33:27,437 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 04:33:48,822 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8301, 5.8537, 5.9321, 5.9766], device='cuda:2') 2023-11-18 04:34:01,196 INFO [train_asr.py:1147] (2/4) Epoch 1, validation: loss=0.0967, simple_loss=0.07481, pruned_loss=0.01931, audio_tagging_loss=0.03999, over 4681554.00 frames. 2023-11-18 04:34:01,196 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 04:34:09,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 1.047e+02 1.193e+02 1.407e+02 2.407e+02, threshold=2.385e+02, percent-clipped=1.0 2023-11-18 04:34:12,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=60066.666666666664, ans=0.125 2023-11-18 04:34:16,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60066.666666666664, ans=0.1 2023-11-18 04:34:21,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=60066.666666666664, ans=0.125 2023-11-18 04:34:31,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=60133.333333333336, ans=0.125 2023-11-18 04:34:46,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.55 vs. limit=12.0 2023-11-18 04:34:57,626 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9050, loss[loss=0.1314, simple_loss=0.1364, pruned_loss=0.05261, audio_tagging_loss=0.01058, over 15384.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1483, pruned_loss=0.06059, audio_tagging_loss=0.01311, over 3061431.43 frames. 
], batch size: 58, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:35:04,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=60333.333333333336, ans=0.125 2023-11-18 04:35:14,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=60400.0, ans=0.125 2023-11-18 04:35:21,481 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.99 vs. limit=15.0 2023-11-18 04:35:24,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=60466.666666666664, ans=0.125 2023-11-18 04:35:28,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60466.666666666664, ans=0.1 2023-11-18 04:35:28,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.44 vs. limit=15.0 2023-11-18 04:35:30,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=60533.333333333336, ans=0.125 2023-11-18 04:35:47,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=60600.0, ans=0.2 2023-11-18 04:35:52,805 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9100, loss[loss=0.159, simple_loss=0.1742, pruned_loss=0.06339, audio_tagging_loss=0.008475, over 14710.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1472, pruned_loss=0.06028, audio_tagging_loss=0.0131, over 3062917.87 frames. ], batch size: 57, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:35:56,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=60666.666666666664, ans=0.1 2023-11-18 04:36:00,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.380e+01 1.098e+02 1.291e+02 1.456e+02 2.208e+02, threshold=2.583e+02, percent-clipped=0.0 2023-11-18 04:36:02,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=60666.666666666664, ans=0.0 2023-11-18 04:36:08,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.62 vs. limit=22.5 2023-11-18 04:36:11,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=60733.333333333336, ans=0.125 2023-11-18 04:36:30,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=15.0 2023-11-18 04:36:31,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=12.0 2023-11-18 04:36:35,851 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=6.769e+00 2023-11-18 04:36:43,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=60933.333333333336, ans=0.1 2023-11-18 04:36:48,965 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9150, loss[loss=0.1472, simple_loss=0.1549, pruned_loss=0.05834, audio_tagging_loss=0.01145, over 14999.00 frames. ], tot_loss[loss=0.1462, simple_loss=0.1466, pruned_loss=0.05985, audio_tagging_loss=0.01302, over 3054209.75 frames. ], batch size: 57, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:36:50,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.91 vs. limit=22.5 2023-11-18 04:37:08,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=61066.666666666664, ans=0.125 2023-11-18 04:37:13,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-11-18 04:37:26,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2023-11-18 04:37:32,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=61266.666666666664, ans=0.125 2023-11-18 04:37:41,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=61266.666666666664, ans=0.125 2023-11-18 04:37:44,926 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9200, loss[loss=0.132, simple_loss=0.1292, pruned_loss=0.05244, audio_tagging_loss=0.0149, over 15393.00 frames. ], tot_loss[loss=0.1474, simple_loss=0.1479, pruned_loss=0.06049, audio_tagging_loss=0.013, over 3051945.46 frames. 
], batch size: 57, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:37:45,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=61333.333333333336, ans=0.125 2023-11-18 04:37:47,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=61333.333333333336, ans=0.1 2023-11-18 04:37:48,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=61333.333333333336, ans=0.125 2023-11-18 04:37:52,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.523e+01 1.127e+02 1.318e+02 1.536e+02 2.303e+02, threshold=2.636e+02, percent-clipped=0.0 2023-11-18 04:38:02,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=61400.0, ans=0.125 2023-11-18 04:38:08,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=61466.666666666664, ans=0.07 2023-11-18 04:38:11,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=61466.666666666664, ans=0.025 2023-11-18 04:38:19,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=61533.333333333336, ans=0.02 2023-11-18 04:38:29,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=61600.0, ans=0.2 2023-11-18 04:38:41,845 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9250, loss[loss=0.136, simple_loss=0.1322, pruned_loss=0.05383, audio_tagging_loss=0.01607, over 15156.00 frames. ], tot_loss[loss=0.1461, simple_loss=0.1466, pruned_loss=0.05969, audio_tagging_loss=0.01306, over 3047562.61 frames. ], batch size: 55, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:38:46,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=61666.666666666664, ans=0.0 2023-11-18 04:38:47,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=61666.666666666664, ans=0.125 2023-11-18 04:39:09,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61800.0, ans=0.125 2023-11-18 04:39:10,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2023-11-18 04:39:11,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-18 04:39:18,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=61866.666666666664, ans=0.09899494936611666 2023-11-18 04:39:27,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=61933.333333333336, ans=0.125 2023-11-18 04:39:37,694 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9300, loss[loss=0.1161, simple_loss=0.105, pruned_loss=0.0473, audio_tagging_loss=0.01626, over 14684.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.144, pruned_loss=0.0586, audio_tagging_loss=0.01322, over 3045102.01 frames. 
], batch size: 57, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:39:45,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.224e+01 1.082e+02 1.160e+02 1.352e+02 1.912e+02, threshold=2.319e+02, percent-clipped=0.0 2023-11-18 04:39:45,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=62000.0, ans=0.0 2023-11-18 04:39:49,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=12.0 2023-11-18 04:39:57,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.77 vs. limit=22.5 2023-11-18 04:40:01,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=62133.333333333336, ans=0.125 2023-11-18 04:40:05,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.24 vs. limit=12.0 2023-11-18 04:40:18,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=62200.0, ans=0.0 2023-11-18 04:40:22,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=62266.666666666664, ans=0.125 2023-11-18 04:40:25,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=62266.666666666664, ans=0.0 2023-11-18 04:40:28,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.69 vs. limit=15.0 2023-11-18 04:40:32,984 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9350, loss[loss=0.1589, simple_loss=0.1673, pruned_loss=0.0632, audio_tagging_loss=0.01212, over 14883.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1454, pruned_loss=0.05892, audio_tagging_loss=0.01314, over 3047366.36 frames. ], batch size: 55, lr: 3.56e-02, grad_scale: 32.0 2023-11-18 04:40:36,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=62333.333333333336, ans=0.0 2023-11-18 04:40:38,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-18 04:40:44,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=62400.0, ans=0.2 2023-11-18 04:40:52,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=62400.0, ans=0.0 2023-11-18 04:41:03,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=62466.666666666664, ans=0.125 2023-11-18 04:41:08,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=62533.333333333336, ans=0.09899494936611666 2023-11-18 04:41:29,989 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9400, loss[loss=0.1109, simple_loss=0.1055, pruned_loss=0.0429, audio_tagging_loss=0.0152, over 14957.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1459, pruned_loss=0.05936, audio_tagging_loss=0.01331, over 3040034.78 frames. 
], batch size: 58, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:41:33,496 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:41:37,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.990e+01 1.022e+02 1.168e+02 1.353e+02 2.252e+02, threshold=2.336e+02, percent-clipped=0.0 2023-11-18 04:41:43,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=62733.333333333336, ans=0.0 2023-11-18 04:41:43,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=62733.333333333336, ans=0.125 2023-11-18 04:41:48,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=62733.333333333336, ans=0.5 2023-11-18 04:41:55,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.29 vs. limit=15.0 2023-11-18 04:42:01,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.85 vs. limit=6.0 2023-11-18 04:42:07,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=62866.666666666664, ans=0.2 2023-11-18 04:42:17,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=62933.333333333336, ans=0.0 2023-11-18 04:42:19,738 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=9.315e+00 2023-11-18 04:42:25,905 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9450, loss[loss=0.1183, simple_loss=0.1132, pruned_loss=0.0451, audio_tagging_loss=0.01659, over 14172.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.1474, pruned_loss=0.05989, audio_tagging_loss=0.01335, over 3040771.08 frames. ], batch size: 56, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:42:25,917 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:42:32,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=63000.0, ans=0.0 2023-11-18 04:42:45,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.30 vs. 
limit=22.5 2023-11-18 04:42:52,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=63133.333333333336, ans=0.125 2023-11-18 04:43:05,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=63200.0, ans=0.125 2023-11-18 04:43:06,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=63200.0, ans=0.0 2023-11-18 04:43:21,285 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9500, loss[loss=0.1566, simple_loss=0.1505, pruned_loss=0.06531, audio_tagging_loss=0.01602, over 15921.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.1478, pruned_loss=0.06015, audio_tagging_loss=0.01332, over 3041836.62 frames. ], batch size: 60, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:43:29,296 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 1.117e+02 1.291e+02 1.412e+02 2.358e+02, threshold=2.583e+02, percent-clipped=1.0 2023-11-18 04:43:37,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=63400.0, ans=0.1 2023-11-18 04:43:38,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.12 vs. limit=6.0 2023-11-18 04:43:41,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=12.0 2023-11-18 04:43:43,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.85 vs. limit=6.0 2023-11-18 04:43:51,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=63466.666666666664, ans=0.025 2023-11-18 04:43:54,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=63533.333333333336, ans=0.05 2023-11-18 04:43:57,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=63533.333333333336, ans=0.1 2023-11-18 04:43:58,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=63533.333333333336, ans=0.125 2023-11-18 04:44:10,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=63600.0, ans=0.125 2023-11-18 04:44:15,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=63600.0, ans=0.125 2023-11-18 04:44:17,716 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9550, loss[loss=0.1463, simple_loss=0.1412, pruned_loss=0.0609, audio_tagging_loss=0.01481, over 14488.00 frames. ], tot_loss[loss=0.1475, simple_loss=0.148, pruned_loss=0.06007, audio_tagging_loss=0.01345, over 3040976.38 frames. 
], batch size: 57, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:44:23,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=63666.666666666664, ans=0.07 2023-11-18 04:44:31,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=63733.333333333336, ans=0.125 2023-11-18 04:44:44,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-11-18 04:44:47,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.76 vs. limit=22.5 2023-11-18 04:45:11,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=63933.333333333336, ans=0.0 2023-11-18 04:45:14,619 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9600, loss[loss=0.1848, simple_loss=0.184, pruned_loss=0.08344, audio_tagging_loss=0.009342, over 15946.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.1467, pruned_loss=0.05949, audio_tagging_loss=0.0135, over 3034907.31 frames. ], batch size: 60, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:45:22,001 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 1.076e+02 1.212e+02 1.383e+02 1.987e+02, threshold=2.424e+02, percent-clipped=0.0 2023-11-18 04:45:27,631 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:45:35,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0 2023-11-18 04:45:48,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=64200.0, ans=0.125 2023-11-18 04:45:48,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=64200.0, ans=0.0 2023-11-18 04:45:49,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=64200.0, ans=0.0 2023-11-18 04:46:03,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=64266.666666666664, ans=0.2 2023-11-18 04:46:09,778 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9650, loss[loss=0.1326, simple_loss=0.1291, pruned_loss=0.05179, audio_tagging_loss=0.01626, over 14654.00 frames. ], tot_loss[loss=0.1461, simple_loss=0.1465, pruned_loss=0.05933, audio_tagging_loss=0.01347, over 3040977.37 frames. 
], batch size: 56, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:46:19,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=64333.333333333336, ans=0.0 2023-11-18 04:46:19,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=64333.333333333336, ans=0.1 2023-11-18 04:46:29,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=64400.0, ans=0.04949747468305833 2023-11-18 04:46:31,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64466.666666666664, ans=0.1 2023-11-18 04:46:33,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=64466.666666666664, ans=0.125 2023-11-18 04:46:36,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.20 vs. limit=15.0 2023-11-18 04:46:46,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=64533.333333333336, ans=0.125 2023-11-18 04:46:49,346 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:47:06,182 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9700, loss[loss=0.1237, simple_loss=0.1293, pruned_loss=0.04816, audio_tagging_loss=0.01091, over 15733.00 frames. ], tot_loss[loss=0.1455, simple_loss=0.1464, pruned_loss=0.05907, audio_tagging_loss=0.01326, over 3042212.57 frames. ], batch size: 61, lr: 3.52e-02, grad_scale: 32.0 2023-11-18 04:47:13,617 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 1.077e+02 1.252e+02 1.398e+02 2.198e+02, threshold=2.504e+02, percent-clipped=0.0 2023-11-18 04:47:21,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=64733.333333333336, ans=0.125 2023-11-18 04:47:34,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=64800.0, ans=0.2 2023-11-18 04:47:43,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64866.666666666664, ans=0.1 2023-11-18 04:48:01,905 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9750, loss[loss=0.1488, simple_loss=0.1557, pruned_loss=0.05726, audio_tagging_loss=0.01365, over 14787.00 frames. ], tot_loss[loss=0.1439, simple_loss=0.1448, pruned_loss=0.05832, audio_tagging_loss=0.01323, over 3034228.12 frames. ], batch size: 57, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:48:02,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=65000.0, ans=0.0 2023-11-18 04:48:10,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65000.0, ans=0.1 2023-11-18 04:48:21,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.42 vs. 
limit=15.0 2023-11-18 04:48:27,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=65133.333333333336, ans=0.05 2023-11-18 04:48:39,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=65200.0, ans=0.125 2023-11-18 04:48:57,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=65333.333333333336, ans=0.2 2023-11-18 04:48:58,258 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9800, loss[loss=0.1323, simple_loss=0.129, pruned_loss=0.05556, audio_tagging_loss=0.01231, over 14531.00 frames. ], tot_loss[loss=0.1445, simple_loss=0.1455, pruned_loss=0.05859, audio_tagging_loss=0.01315, over 3032527.16 frames. ], batch size: 56, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:49:01,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=65333.333333333336, ans=0.95 2023-11-18 04:49:02,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=65333.333333333336, ans=0.125 2023-11-18 04:49:03,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.21 vs. limit=22.5 2023-11-18 04:49:06,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 1.068e+02 1.214e+02 1.427e+02 2.483e+02, threshold=2.428e+02, percent-clipped=0.0 2023-11-18 04:49:06,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-11-18 04:49:07,439 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:49:16,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=65400.0, ans=0.2 2023-11-18 04:49:48,837 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:49:54,135 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9850, loss[loss=0.1585, simple_loss=0.1542, pruned_loss=0.07002, audio_tagging_loss=0.01138, over 15403.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1459, pruned_loss=0.05878, audio_tagging_loss=0.01307, over 3037636.33 frames. ], batch size: 57, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:49:58,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.20 vs. 
limit=15.0 2023-11-18 04:49:59,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=65666.66666666667, ans=0.1 2023-11-18 04:50:01,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=65666.66666666667, ans=0.2 2023-11-18 04:50:03,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=65666.66666666667, ans=0.125 2023-11-18 04:50:07,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=65733.33333333333, ans=0.125 2023-11-18 04:50:08,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=65733.33333333333, ans=0.1 2023-11-18 04:50:13,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=65733.33333333333, ans=0.0 2023-11-18 04:50:13,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=65733.33333333333, ans=0.125 2023-11-18 04:50:23,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=65800.0, ans=0.125 2023-11-18 04:50:50,827 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9900, loss[loss=0.109, simple_loss=0.1034, pruned_loss=0.04195, audio_tagging_loss=0.0153, over 15113.00 frames. ], tot_loss[loss=0.1445, simple_loss=0.1458, pruned_loss=0.05856, audio_tagging_loss=0.01302, over 3038113.89 frames. ], batch size: 58, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:58,263 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.989e+01 1.084e+02 1.192e+02 1.374e+02 2.032e+02, threshold=2.383e+02, percent-clipped=0.0 2023-11-18 04:50:58,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.20 vs. limit=15.0 2023-11-18 04:51:18,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-18 04:51:46,592 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 9950, loss[loss=0.135, simple_loss=0.1272, pruned_loss=0.05668, audio_tagging_loss=0.0147, over 14722.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1452, pruned_loss=0.05837, audio_tagging_loss=0.01308, over 3041802.87 frames. ], batch size: 56, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:51:52,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=66333.33333333333, ans=0.125 2023-11-18 04:52:18,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=66466.66666666667, ans=0.2 2023-11-18 04:52:19,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=66533.33333333333, ans=0.125 2023-11-18 04:52:25,794 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:52:43,101 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10000, loss[loss=0.1113, simple_loss=0.113, pruned_loss=0.04261, audio_tagging_loss=0.01224, over 15036.00 frames. 
], tot_loss[loss=0.1445, simple_loss=0.1458, pruned_loss=0.05851, audio_tagging_loss=0.01306, over 3043182.03 frames. ], batch size: 55, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:52:46,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=66666.66666666667, ans=0.2 2023-11-18 04:52:46,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.50 vs. limit=22.5 2023-11-18 04:52:50,996 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.963e+01 1.074e+02 1.249e+02 1.429e+02 2.064e+02, threshold=2.499e+02, percent-clipped=0.0 2023-11-18 04:53:02,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-18 04:53:17,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=66866.66666666667, ans=0.0 2023-11-18 04:53:20,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=66866.66666666667, ans=0.125 2023-11-18 04:53:25,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.66 vs. limit=15.0 2023-11-18 04:53:36,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=4.78 vs. limit=15.0 2023-11-18 04:53:39,077 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10050, loss[loss=0.1274, simple_loss=0.1311, pruned_loss=0.05121, audio_tagging_loss=0.01063, over 15597.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1451, pruned_loss=0.05826, audio_tagging_loss=0.01316, over 3041305.66 frames. ], batch size: 57, lr: 3.48e-02, grad_scale: 32.0 2023-11-18 04:53:54,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=67066.66666666667, ans=0.125 2023-11-18 04:53:59,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=67066.66666666667, ans=6.0 2023-11-18 04:54:06,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0 2023-11-18 04:54:15,748 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.72 vs. 
limit=22.5 2023-11-18 04:54:23,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67266.66666666667, ans=0.1 2023-11-18 04:54:24,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:30,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:30,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67266.66666666667, ans=0.125 2023-11-18 04:54:34,772 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10100, loss[loss=0.176, simple_loss=0.1806, pruned_loss=0.07318, audio_tagging_loss=0.01253, over 14760.00 frames. ], tot_loss[loss=0.1432, simple_loss=0.1443, pruned_loss=0.05774, audio_tagging_loss=0.01335, over 3056768.22 frames. ], batch size: 55, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:54:42,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.070e+02 1.242e+02 1.409e+02 2.518e+02, threshold=2.485e+02, percent-clipped=1.0 2023-11-18 04:54:51,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.93 vs. limit=5.0 2023-11-18 04:54:53,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5 2023-11-18 04:55:14,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67533.33333333333, ans=0.1 2023-11-18 04:55:20,747 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:55:27,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=67600.0, ans=0.125 2023-11-18 04:55:30,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=67666.66666666667, ans=0.125 2023-11-18 04:55:31,490 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10150, loss[loss=0.1257, simple_loss=0.128, pruned_loss=0.04669, audio_tagging_loss=0.015, over 13760.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.145, pruned_loss=0.05797, audio_tagging_loss=0.01328, over 3055128.96 frames. ], batch size: 54, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:55:47,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=67733.33333333333, ans=0.125 2023-11-18 04:55:48,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=67733.33333333333, ans=0.125 2023-11-18 04:55:49,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.02 vs. 
limit=10.0 2023-11-18 04:55:51,472 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2023-11-18 04:55:59,258 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:16,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=67933.33333333333, ans=0.125 2023-11-18 04:56:17,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=67933.33333333333, ans=0.04949747468305833 2023-11-18 04:56:18,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=67933.33333333333, ans=22.5 2023-11-18 04:56:26,258 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.74 vs. limit=15.0 2023-11-18 04:56:27,844 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10200, loss[loss=0.1523, simple_loss=0.1467, pruned_loss=0.06081, audio_tagging_loss=0.01818, over 14928.00 frames. ], tot_loss[loss=0.1433, simple_loss=0.1442, pruned_loss=0.05772, audio_tagging_loss=0.01348, over 3054751.08 frames. ], batch size: 56, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:56:32,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=68000.0, ans=0.125 2023-11-18 04:56:32,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=68000.0, ans=0.125 2023-11-18 04:56:35,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 1.095e+02 1.241e+02 1.478e+02 2.822e+02, threshold=2.482e+02, percent-clipped=1.0 2023-11-18 04:56:48,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=68133.33333333333, ans=0.0 2023-11-18 04:56:49,786 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:58,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=68133.33333333333, ans=0.0 2023-11-18 04:57:07,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=68200.0, ans=0.0 2023-11-18 04:57:23,707 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10250, loss[loss=0.1526, simple_loss=0.1618, pruned_loss=0.05914, audio_tagging_loss=0.01257, over 15449.00 frames. ], tot_loss[loss=0.1451, simple_loss=0.1461, pruned_loss=0.05857, audio_tagging_loss=0.01353, over 3059060.02 frames. 
], batch size: 54, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:57:25,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=68333.33333333333, ans=0.125 2023-11-18 04:57:35,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=68400.0, ans=0.2 2023-11-18 04:57:47,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=68466.66666666667, ans=0.0 2023-11-18 04:58:19,294 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10300, loss[loss=0.1292, simple_loss=0.1287, pruned_loss=0.05069, audio_tagging_loss=0.01415, over 15590.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1458, pruned_loss=0.05827, audio_tagging_loss=0.01364, over 3054053.35 frames. ], batch size: 60, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:58:27,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.697e+01 1.063e+02 1.210e+02 1.437e+02 2.016e+02, threshold=2.421e+02, percent-clipped=0.0 2023-11-18 04:58:31,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=68733.33333333333, ans=0.0 2023-11-18 04:58:37,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-18 04:58:42,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=68800.0, ans=0.07 2023-11-18 04:58:48,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2023-11-18 04:58:52,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=68866.66666666667, ans=0.0 2023-11-18 04:58:55,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.02 vs. limit=15.0 2023-11-18 04:59:15,525 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10350, loss[loss=0.1069, simple_loss=0.1167, pruned_loss=0.03563, audio_tagging_loss=0.01291, over 14882.00 frames. ], tot_loss[loss=0.1454, simple_loss=0.1461, pruned_loss=0.05846, audio_tagging_loss=0.01386, over 3056421.23 frames. ], batch size: 57, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:59:25,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=69066.66666666667, ans=0.1 2023-11-18 04:59:28,931 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:59:30,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=69066.66666666667, ans=0.125 2023-11-18 04:59:32,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=69066.66666666667, ans=0.2 2023-11-18 04:59:37,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=69133.33333333333, ans=0.05 2023-11-18 04:59:54,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.79 vs. 
limit=15.0 2023-11-18 05:00:11,310 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10400, loss[loss=0.1672, simple_loss=0.1633, pruned_loss=0.07337, audio_tagging_loss=0.01222, over 15402.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.1473, pruned_loss=0.05912, audio_tagging_loss=0.0138, over 3059689.41 frames. ], batch size: 59, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:00:16,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=69333.33333333333, ans=0.125 2023-11-18 05:00:18,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 1.054e+02 1.220e+02 1.352e+02 2.408e+02, threshold=2.441e+02, percent-clipped=0.0 2023-11-18 05:00:44,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=69533.33333333333, ans=0.0 2023-11-18 05:00:50,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=69533.33333333333, ans=0.0 2023-11-18 05:00:54,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=69533.33333333333, ans=0.0 2023-11-18 05:00:58,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=69600.0, ans=0.0 2023-11-18 05:01:06,931 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10450, loss[loss=0.1512, simple_loss=0.1513, pruned_loss=0.06498, audio_tagging_loss=0.01059, over 15763.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.1473, pruned_loss=0.05895, audio_tagging_loss=0.01365, over 3049494.18 frames. ], batch size: 62, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:01:07,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=69666.66666666667, ans=0.0 2023-11-18 05:01:08,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-11-18 05:01:09,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=69666.66666666667, ans=0.07 2023-11-18 05:01:11,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=69666.66666666667, ans=0.0 2023-11-18 05:01:23,409 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=15.0 2023-11-18 05:01:24,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=69733.33333333333, ans=0.0 2023-11-18 05:01:31,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=69800.0, ans=0.125 2023-11-18 05:01:37,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.27 vs. 
limit=15.0 2023-11-18 05:01:38,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=69800.0, ans=0.0 2023-11-18 05:01:43,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=69866.66666666667, ans=0.0 2023-11-18 05:01:46,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=69866.66666666667, ans=0.125 2023-11-18 05:02:03,062 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10500, loss[loss=0.1714, simple_loss=0.1798, pruned_loss=0.07055, audio_tagging_loss=0.01094, over 16486.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1462, pruned_loss=0.0585, audio_tagging_loss=0.01344, over 3049260.76 frames. ], batch size: 56, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:02:10,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.33 vs. limit=15.0 2023-11-18 05:02:10,949 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 1.096e+02 1.231e+02 1.432e+02 2.125e+02, threshold=2.461e+02, percent-clipped=0.0 2023-11-18 05:02:11,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70000.0, ans=0.1 2023-11-18 05:02:12,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=70000.0, ans=0.0 2023-11-18 05:02:13,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-11-18 05:02:15,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.98 vs. limit=22.5 2023-11-18 05:02:15,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.73 vs. limit=22.5 2023-11-18 05:02:16,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=70066.66666666667, ans=0.125 2023-11-18 05:02:38,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=70200.0, ans=0.125 2023-11-18 05:02:46,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=70266.66666666667, ans=0.2 2023-11-18 05:02:50,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=16.29 vs. limit=15.0 2023-11-18 05:02:53,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70266.66666666667, ans=0.1 2023-11-18 05:02:58,454 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10550, loss[loss=0.1364, simple_loss=0.1311, pruned_loss=0.05637, audio_tagging_loss=0.01443, over 15436.00 frames. ], tot_loss[loss=0.1439, simple_loss=0.1454, pruned_loss=0.05793, audio_tagging_loss=0.01329, over 3043106.53 frames. 
2023-11-18 05:03:17,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=70400.0, ans=0.125
2023-11-18 05:03:17,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=70400.0, ans=0.2
2023-11-18 05:03:19,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70466.66666666667, ans=0.125
2023-11-18 05:03:43,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=70600.0, ans=0.125
2023-11-18 05:03:52,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.50 vs. limit=15.0
2023-11-18 05:03:53,327 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10600, loss[loss=0.1331, simple_loss=0.1432, pruned_loss=0.04976, audio_tagging_loss=0.01177, over 16274.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1459, pruned_loss=0.05826, audio_tagging_loss=0.01319, over 3046144.27 frames. ], batch size: 59, lr: 3.42e-02, grad_scale: 64.0
2023-11-18 05:04:01,190 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.606e+01 1.084e+02 1.194e+02 1.358e+02 2.173e+02, threshold=2.389e+02, percent-clipped=0.0
2023-11-18 05:04:13,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.26 vs. limit=22.5
2023-11-18 05:04:25,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0
2023-11-18 05:04:49,533 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10650, loss[loss=0.1574, simple_loss=0.1579, pruned_loss=0.06479, audio_tagging_loss=0.01365, over 15343.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1455, pruned_loss=0.05806, audio_tagging_loss=0.0132, over 3037972.83 frames. ], batch size: 57, lr: 3.41e-02, grad_scale: 64.0
2023-11-18 05:04:55,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=71000.0, ans=0.95
2023-11-18 05:04:57,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=71000.0, ans=0.125
2023-11-18 05:05:01,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=71066.66666666667, ans=0.125
2023-11-18 05:05:04,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=71066.66666666667, ans=0.09899494936611666
2023-11-18 05:05:33,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=71266.66666666667, ans=0.1
2023-11-18 05:05:43,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=71266.66666666667, ans=0.0
2023-11-18 05:05:45,719 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10700, loss[loss=0.107, simple_loss=0.1047, pruned_loss=0.04387, audio_tagging_loss=0.01081, over 15417.00 frames. ], tot_loss[loss=0.1441, simple_loss=0.1456, pruned_loss=0.05817, audio_tagging_loss=0.01315, over 3042356.16 frames. ], batch size: 60, lr: 3.41e-02, grad_scale: 64.0
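In the optim.py records, the five numbers after "grad-norm quartiles" read as (min, 25%, median, 75%, max) of recent gradient norms, and the printed threshold tracks Clipping_scale times the median (for batch 10600 above: 2.0 x 1.194e+02 = 2.389e+02). A sketch of that bookkeeping; the window size and the exact clipping mechanics are assumptions:

```python
import torch
from collections import deque

# Sketch: quartile-based gradient clipping consistent with the optim.py
# records. Keeps a window of recent total grad norms, reports their
# quartiles, and clips at clipping_scale * median.
class QuartileClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # window size is an assumption

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack(
            [p.grad.detach().norm() for p in params])).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2].item()  # 2.0 * median
        if norm > threshold:  # counted in "percent-clipped"
            for p in params:
                p.grad.mul_(threshold / norm)
        return norm
```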
2023-11-18 05:05:50,337 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=6.05 vs. limit=15.0
2023-11-18 05:05:52,956 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.108e+01 1.048e+02 1.189e+02 1.344e+02 2.146e+02, threshold=2.378e+02, percent-clipped=0.0
2023-11-18 05:06:06,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=71466.66666666667, ans=0.125
2023-11-18 05:06:17,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0
2023-11-18 05:06:27,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=71533.33333333333, ans=0.125
2023-11-18 05:06:38,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.72 vs. limit=15.0
2023-11-18 05:06:39,913 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10750, loss[loss=0.1463, simple_loss=0.1438, pruned_loss=0.06069, audio_tagging_loss=0.01377, over 15504.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.1452, pruned_loss=0.05799, audio_tagging_loss=0.01318, over 3045743.40 frames. ], batch size: 60, lr: 3.40e-02, grad_scale: 64.0
2023-11-18 05:07:35,443 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10800, loss[loss=0.1507, simple_loss=0.1537, pruned_loss=0.06023, audio_tagging_loss=0.01358, over 15662.00 frames. ], tot_loss[loss=0.1426, simple_loss=0.1442, pruned_loss=0.05721, audio_tagging_loss=0.01328, over 3044296.02 frames. ], batch size: 56, lr: 3.40e-02, grad_scale: 64.0
2023-11-18 05:07:43,300 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.252e+01 1.082e+02 1.179e+02 1.367e+02 2.142e+02, threshold=2.358e+02, percent-clipped=0.0
2023-11-18 05:08:27,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=72266.66666666667, ans=0.1
2023-11-18 05:08:31,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.93 vs. limit=15.0
2023-11-18 05:08:32,017 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10850, loss[loss=0.1602, simple_loss=0.1668, pruned_loss=0.06686, audio_tagging_loss=0.009956, over 17161.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.1431, pruned_loss=0.05664, audio_tagging_loss=0.01324, over 3044935.90 frames. ], batch size: 64, lr: 3.39e-02, grad_scale: 64.0
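The Whitening records compare a statistic of each module's output covariance against a limit: the metric equals 1.0 when the covariance eigenvalues are all equal (a perfectly "white" output) and grows as the output collapses toward low rank, so an entry like "metric=16.29 vs. limit=15.0" flags an activation that has drifted past its allowance. A trace-based sketch of such a metric; the recipe's exact estimator, and its per-group variant for num_groups > 1, may differ:

```python
import torch

# Sketch: a whitening diagnostic. Returns 1.0 for white activations and
# larger values as channels become correlated / low-rank.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    dim = cov.shape[0]
    # ratio of mean squared eigenvalue to squared mean eigenvalue,
    # computed via trace(C @ C) and trace(C)
    return ((cov @ cov).diagonal().sum() / dim
            / (cov.diagonal().sum() / dim) ** 2).item()

x = torch.randn(1000, 512) @ torch.randn(512, 512)  # correlated channels
print(whitening_metric(x))  # >> 1.0, the kind of value flagged above
```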
2023-11-18 05:08:46,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=72400.0, ans=0.125
2023-11-18 05:09:03,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=72533.33333333333, ans=0.0
2023-11-18 05:09:06,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=72533.33333333333, ans=0.125
2023-11-18 05:09:06,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=72533.33333333333, ans=0.2
2023-11-18 05:09:07,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0
2023-11-18 05:09:20,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=72600.0, ans=0.1
2023-11-18 05:09:21,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.41 vs. limit=15.0
2023-11-18 05:09:24,809 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 05:09:26,868 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10900, loss[loss=0.1552, simple_loss=0.1543, pruned_loss=0.06323, audio_tagging_loss=0.01482, over 15323.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.1431, pruned_loss=0.05659, audio_tagging_loss=0.01328, over 3049531.10 frames. ], batch size: 57, lr: 3.39e-02, grad_scale: 64.0
2023-11-18 05:09:34,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 1.097e+02 1.219e+02 1.380e+02 2.178e+02, threshold=2.437e+02, percent-clipped=0.0
2023-11-18 05:09:39,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.17 vs. limit=10.0
2023-11-18 05:09:52,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=72800.0, ans=0.0
2023-11-18 05:09:56,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=72800.0, ans=0.125
2023-11-18 05:10:22,384 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 10950, loss[loss=0.1609, simple_loss=0.1589, pruned_loss=0.0699, audio_tagging_loss=0.01161, over 15227.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1428, pruned_loss=0.05634, audio_tagging_loss=0.0133, over 3050149.13 frames. ], batch size: 57, lr: 3.38e-02, grad_scale: 64.0
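The WARNING above shows the length filter at work: AudioSet cuts carry a placeholder transcript ("Dummy text ...", 24 BPE tokens), and a one-second cut keeps only 23 frames after 4x subsampling, so a transducer could not emit every token; the cut is dropped. The minimal form of that predicate (the check at train_asr.py:1319 may add extra margins):

```python
# Sketch of the exclusion rule implied by the WARNING records: keep a cut
# only if the subsampled encoder output is long enough to align all tokens.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))  # False -> "Exclude cut ... from training."
```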
2023-11-18 05:10:24,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=73000.0, ans=0.2
2023-11-18 05:10:31,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=73000.0, ans=0.95
2023-11-18 05:10:49,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=73133.33333333333, ans=0.125
2023-11-18 05:10:54,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=73200.0, ans=0.1
2023-11-18 05:11:18,770 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11000, loss[loss=0.1488, simple_loss=0.1495, pruned_loss=0.06084, audio_tagging_loss=0.01324, over 16473.00 frames. ], tot_loss[loss=0.1416, simple_loss=0.1433, pruned_loss=0.05656, audio_tagging_loss=0.01337, over 3051845.60 frames. ], batch size: 60, lr: 3.38e-02, grad_scale: 64.0
2023-11-18 05:11:26,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.286e+01 1.063e+02 1.239e+02 1.487e+02 2.361e+02, threshold=2.479e+02, percent-clipped=0.0
2023-11-18 05:11:28,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.35 vs. limit=15.0
2023-11-18 05:11:28,857 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 05:11:33,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.92 vs. limit=15.0
2023-11-18 05:11:36,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=73400.0, ans=0.125
2023-11-18 05:11:52,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73533.33333333333, ans=0.125
2023-11-18 05:11:55,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=73533.33333333333, ans=0.2
2023-11-18 05:12:13,947 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11050, loss[loss=0.1384, simple_loss=0.1384, pruned_loss=0.05694, audio_tagging_loss=0.01232, over 14752.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1422, pruned_loss=0.05623, audio_tagging_loss=0.01363, over 3045213.89 frames. ], batch size: 57, lr: 3.37e-02, grad_scale: 64.0
2023-11-18 05:12:21,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73666.66666666667, ans=0.125
2023-11-18 05:12:26,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=73733.33333333333, ans=0.0
2023-11-18 05:12:35,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=73800.0, ans=0.125
2023-11-18 05:12:39,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=73800.0, ans=0.0
2023-11-18 05:12:41,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=73800.0, ans=0.125
2023-11-18 05:13:00,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=73933.33333333333, ans=0.0
2023-11-18 05:13:02,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=73933.33333333333, ans=0.125
2023-11-18 05:13:09,734 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11100, loss[loss=0.1452, simple_loss=0.1451, pruned_loss=0.05664, audio_tagging_loss=0.01598, over 15354.00 frames. ], tot_loss[loss=0.1404, simple_loss=0.1418, pruned_loss=0.05592, audio_tagging_loss=0.01365, over 3043601.79 frames. ], batch size: 60, lr: 3.37e-02, grad_scale: 64.0
2023-11-18 05:13:17,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 1.115e+02 1.316e+02 1.523e+02 2.373e+02, threshold=2.632e+02, percent-clipped=0.0
2023-11-18 05:13:18,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=74000.0, ans=0.125
2023-11-18 05:13:19,106 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:13:36,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=74133.33333333333, ans=0.125
2023-11-18 05:14:02,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. limit=15.0
2023-11-18 05:14:04,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=74333.33333333333, ans=0.125
2023-11-18 05:14:06,317 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11150, loss[loss=0.1247, simple_loss=0.1209, pruned_loss=0.04915, audio_tagging_loss=0.01513, over 15087.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1421, pruned_loss=0.05611, audio_tagging_loss=0.01378, over 3042682.96 frames. ], batch size: 58, lr: 3.36e-02, grad_scale: 64.0
2023-11-18 05:14:14,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.24 vs. limit=15.0
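The scaling.py WithLoss records track an auxiliary penalty attached to the attention-weight tensors; "loss-sum=0.000e+00" means no penalty accrued since the last report. A very rough sketch of a pass-through module with that logging behaviour (the actual mechanism in scaling.py, which injects the penalty via autograd, is more involved than this):

```python
import torch

# Sketch only: an identity module that measures a penalty on the tensor
# passing through it and accumulates the total for periodic logging.
# `penalty_fn` is a placeholder for whatever constraint is being enforced.
class WithLossLogger(torch.nn.Module):
    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn
        self.loss_sum = 0.0  # reset whenever it is logged

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            self.loss_sum += float(self.penalty_fn(x).detach())
        return x
```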
2023-11-18 05:14:21,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74400.0, ans=0.1
2023-11-18 05:14:39,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=74533.33333333333, ans=15.0
2023-11-18 05:14:41,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=74533.33333333333, ans=0.125
2023-11-18 05:14:57,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=74600.0, ans=10.0
2023-11-18 05:15:01,601 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11200, loss[loss=0.1213, simple_loss=0.1248, pruned_loss=0.04481, audio_tagging_loss=0.01412, over 14342.00 frames. ], tot_loss[loss=0.1404, simple_loss=0.1414, pruned_loss=0.0558, audio_tagging_loss=0.01386, over 3038848.14 frames. ], batch size: 57, lr: 3.36e-02, grad_scale: 64.0
2023-11-18 05:15:03,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=74666.66666666667, ans=0.0
2023-11-18 05:15:09,622 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.084e+02 1.213e+02 1.367e+02 1.851e+02, threshold=2.426e+02, percent-clipped=0.0
2023-11-18 05:15:18,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=74733.33333333333, ans=0.1
2023-11-18 05:15:44,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=74866.66666666667, ans=0.125
2023-11-18 05:15:51,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.25 vs. limit=15.0
2023-11-18 05:15:53,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=74933.33333333333, ans=0.125
2023-11-18 05:15:57,786 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11250, loss[loss=0.1027, simple_loss=0.1, pruned_loss=0.04217, audio_tagging_loss=0.01057, over 14838.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.1425, pruned_loss=0.05642, audio_tagging_loss=0.01376, over 3032983.93 frames. ], batch size: 57, lr: 3.35e-02, grad_scale: 64.0
2023-11-18 05:16:00,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=75000.0, ans=0.2
2023-11-18 05:16:31,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=75200.0, ans=0.125
2023-11-18 05:16:32,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=75200.0, ans=0.0
2023-11-18 05:16:53,040 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11300, loss[loss=0.1557, simple_loss=0.1464, pruned_loss=0.06552, audio_tagging_loss=0.01698, over 16184.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1424, pruned_loss=0.05616, audio_tagging_loss=0.01354, over 3038648.11 frames. ], batch size: 60, lr: 3.35e-02, grad_scale: 64.0
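The lr column decays smoothly with both batch index and epoch: 3.44e-02 at batch 10400, 3.28e-02 near batch 12000, and a step to 3.21e-02 when epoch 2 begins further below. This is the shape of an Eden-style schedule as used in icefall's Zipformer recipes; a sketch with assumed constants (base_lr, lr_batches and lr_epochs are not read from this run):

```python
# Sketch: Eden-style learning-rate schedule. All constants are assumptions.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    return (base_lr
            * ((batch / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

print(eden_lr(0.045, 10400, 1.0))  # ~3.4e-02, near the logged 3.44e-02
```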
2023-11-18 05:17:00,940 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.188e+01 1.067e+02 1.239e+02 1.530e+02 2.211e+02, threshold=2.479e+02, percent-clipped=0.0
2023-11-18 05:17:18,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.40 vs. limit=22.5
2023-11-18 05:17:35,996 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.69 vs. limit=12.0
2023-11-18 05:17:48,717 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11350, loss[loss=0.1082, simple_loss=0.1162, pruned_loss=0.03848, audio_tagging_loss=0.0116, over 15059.00 frames. ], tot_loss[loss=0.1405, simple_loss=0.1426, pruned_loss=0.0559, audio_tagging_loss=0.0133, over 3040735.75 frames. ], batch size: 56, lr: 3.34e-02, grad_scale: 64.0
2023-11-18 05:17:53,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=75666.66666666667, ans=0.125
2023-11-18 05:17:56,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0
2023-11-18 05:18:05,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=75733.33333333333, ans=0.125
2023-11-18 05:18:09,071 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:18:25,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=75866.66666666667, ans=0.07
2023-11-18 05:18:26,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5
2023-11-18 05:18:29,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=75866.66666666667, ans=0.2
2023-11-18 05:18:45,300 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11400, loss[loss=0.1556, simple_loss=0.1575, pruned_loss=0.06585, audio_tagging_loss=0.011, over 15153.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1431, pruned_loss=0.05617, audio_tagging_loss=0.01315, over 3047380.53 frames. ], batch size: 55, lr: 3.34e-02, grad_scale: 64.0
2023-11-18 05:18:46,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=76000.0, ans=0.125
2023-11-18 05:18:52,635 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.156e+01 1.039e+02 1.156e+02 1.287e+02 1.628e+02, threshold=2.311e+02, percent-clipped=0.0
2023-11-18 05:18:56,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76066.66666666667, ans=0.1
2023-11-18 05:19:04,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=76066.66666666667, ans=0.05
2023-11-18 05:19:06,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.56 vs. limit=22.5
2023-11-18 05:19:22,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=76200.0, ans=0.0
2023-11-18 05:19:26,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76200.0, ans=0.125
2023-11-18 05:19:40,948 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11450, loss[loss=0.15, simple_loss=0.1599, pruned_loss=0.05783, audio_tagging_loss=0.01222, over 14084.00 frames. ], tot_loss[loss=0.1423, simple_loss=0.1445, pruned_loss=0.05698, audio_tagging_loss=0.01305, over 3046890.19 frames. ], batch size: 51, lr: 3.33e-02, grad_scale: 64.0
2023-11-18 05:20:02,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=76466.66666666667, ans=0.125
2023-11-18 05:20:33,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.06 vs. limit=15.0
2023-11-18 05:20:36,077 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11500, loss[loss=0.1813, simple_loss=0.1893, pruned_loss=0.07476, audio_tagging_loss=0.01183, over 15437.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1434, pruned_loss=0.0564, audio_tagging_loss=0.01308, over 3053797.46 frames. ], batch size: 56, lr: 3.33e-02, grad_scale: 64.0
2023-11-18 05:20:37,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=76666.66666666667, ans=0.0
2023-11-18 05:20:37,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=76666.66666666667, ans=0.2
2023-11-18 05:20:43,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 1.030e+02 1.194e+02 1.379e+02 2.068e+02, threshold=2.389e+02, percent-clipped=0.0
2023-11-18 05:21:07,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76800.0, ans=0.125
2023-11-18 05:21:14,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=76866.66666666667, ans=0.0
2023-11-18 05:21:26,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=19.38 vs. limit=15.0
2023-11-18 05:21:27,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=76933.33333333333, ans=0.0
2023-11-18 05:21:29,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=76933.33333333333, ans=0.0
2023-11-18 05:21:31,742 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11550, loss[loss=0.1267, simple_loss=0.1273, pruned_loss=0.05135, audio_tagging_loss=0.01174, over 14478.00 frames. ], tot_loss[loss=0.1404, simple_loss=0.1429, pruned_loss=0.056, audio_tagging_loss=0.01299, over 3054798.76 frames. ], batch size: 57, lr: 3.32e-02, grad_scale: 64.0
2023-11-18 05:21:52,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=77066.66666666667, ans=0.125
2023-11-18 05:22:06,033 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 05:22:13,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=77200.0, ans=0.0
2023-11-18 05:22:28,059 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11600, loss[loss=0.1552, simple_loss=0.1513, pruned_loss=0.06415, audio_tagging_loss=0.01536, over 15253.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1428, pruned_loss=0.05621, audio_tagging_loss=0.01313, over 3050956.91 frames. ], batch size: 57, lr: 3.32e-02, grad_scale: 64.0
2023-11-18 05:22:35,972 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.030e+02 1.201e+02 1.372e+02 2.300e+02, threshold=2.402e+02, percent-clipped=0.0
2023-11-18 05:22:45,700 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.175e+01
2023-11-18 05:22:51,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-11-18 05:22:57,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=77466.66666666667, ans=0.125
2023-11-18 05:23:06,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=77533.33333333333, ans=10.0
2023-11-18 05:23:07,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5
2023-11-18 05:23:11,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=77600.0, ans=0.125
2023-11-18 05:23:13,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=77600.0, ans=0.125
2023-11-18 05:23:23,711 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11650, loss[loss=0.118, simple_loss=0.1289, pruned_loss=0.03986, audio_tagging_loss=0.01369, over 14211.00 frames. ], tot_loss[loss=0.1406, simple_loss=0.1424, pruned_loss=0.05612, audio_tagging_loss=0.01329, over 3059024.30 frames. ], batch size: 55, lr: 3.31e-02, grad_scale: 64.0
2023-11-18 05:23:43,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.69 vs. limit=6.0
2023-11-18 05:23:44,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=77800.0, ans=0.0
2023-11-18 05:23:47,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=77800.0, ans=0.125
2023-11-18 05:23:52,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=77800.0, ans=0.125
2023-11-18 05:24:16,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.90 vs. limit=8.0
2023-11-18 05:24:18,900 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11700, loss[loss=0.1469, simple_loss=0.1474, pruned_loss=0.05816, audio_tagging_loss=0.01507, over 16523.00 frames. ], tot_loss[loss=0.1397, simple_loss=0.1416, pruned_loss=0.0556, audio_tagging_loss=0.01335, over 3058308.10 frames. ], batch size: 63, lr: 3.31e-02, grad_scale: 64.0
2023-11-18 05:24:26,789 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 1.130e+02 1.304e+02 1.460e+02 2.076e+02, threshold=2.607e+02, percent-clipped=0.0
2023-11-18 05:24:28,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=15.0
2023-11-18 05:24:40,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5
2023-11-18 05:24:52,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=78200.0, ans=0.0
2023-11-18 05:24:55,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78200.0, ans=0.1
2023-11-18 05:25:02,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.14 vs. limit=6.0
2023-11-18 05:25:07,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=78266.66666666667, ans=0.0
2023-11-18 05:25:09,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=78266.66666666667, ans=0.125
2023-11-18 05:25:14,894 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11750, loss[loss=0.1483, simple_loss=0.1434, pruned_loss=0.06199, audio_tagging_loss=0.01459, over 14910.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.141, pruned_loss=0.05531, audio_tagging_loss=0.01338, over 3047279.55 frames. ], batch size: 55, lr: 3.30e-02, grad_scale: 64.0
2023-11-18 05:25:27,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=78400.0, ans=0.125
2023-11-18 05:25:42,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=78466.66666666667, ans=0.1
2023-11-18 05:25:52,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=9.14 vs. limit=15.0
2023-11-18 05:26:01,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=78600.0, ans=0.0
2023-11-18 05:26:11,069 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11800, loss[loss=0.1415, simple_loss=0.1454, pruned_loss=0.05583, audio_tagging_loss=0.01293, over 15587.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.1409, pruned_loss=0.05535, audio_tagging_loss=0.01343, over 3050305.78 frames. ], batch size: 60, lr: 3.30e-02, grad_scale: 32.0
2023-11-18 05:26:19,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 1.101e+02 1.270e+02 1.502e+02 2.355e+02, threshold=2.541e+02, percent-clipped=0.0
2023-11-18 05:26:20,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=78733.33333333333, ans=0.2
2023-11-18 05:26:51,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78866.66666666667, ans=0.1
2023-11-18 05:26:52,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78866.66666666667, ans=0.1
2023-11-18 05:26:57,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.50 vs. limit=22.5
2023-11-18 05:26:59,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=78933.33333333333, ans=0.0
2023-11-18 05:27:06,417 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11850, loss[loss=0.1456, simple_loss=0.1591, pruned_loss=0.05492, audio_tagging_loss=0.0112, over 15259.00 frames. ], tot_loss[loss=0.1399, simple_loss=0.1413, pruned_loss=0.05574, audio_tagging_loss=0.01356, over 3047769.21 frames. ], batch size: 58, lr: 3.29e-02, grad_scale: 32.0
2023-11-18 05:27:08,758 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:27:14,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=79000.0, ans=0.0
2023-11-18 05:27:23,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=79066.66666666667, ans=0.0
2023-11-18 05:27:47,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=79200.0, ans=0.0
2023-11-18 05:27:52,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=79266.66666666667, ans=0.0
2023-11-18 05:28:01,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.74 vs. limit=15.0
2023-11-18 05:28:02,171 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11900, loss[loss=0.1212, simple_loss=0.1286, pruned_loss=0.04068, audio_tagging_loss=0.01622, over 15273.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1421, pruned_loss=0.05595, audio_tagging_loss=0.01365, over 3043452.78 frames. ], batch size: 57, lr: 3.29e-02, grad_scale: 32.0
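The grad_scale column is the fp16 loss scale: it sits at 64.0 for most of the epoch, drops to 32.0 around batch 11800 just above, and reaches 16.0 at batch 12000 below. Halving on overflow is standard torch.cuda.amp behaviour; a sketch of such a loop, where `model`, `optimizer` and `batch` are placeholders:

```python
import torch

# Sketch: mixed-precision step with a GradScaler whose scale halves after
# a gradient overflow, matching the 64.0 -> 32.0 -> 16.0 drops in the log.
scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

def train_step(model, optimizer, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = model(batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped if scaled gradients overflowed
    scaler.update()          # halves the scale after an overflow
    return scaler.get_scale()  # the value logged as grad_scale
```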
2023-11-18 05:28:11,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 1.049e+02 1.249e+02 1.472e+02 4.248e+02, threshold=2.498e+02, percent-clipped=1.0
2023-11-18 05:28:11,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=79333.33333333333, ans=0.0
2023-11-18 05:28:14,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=79400.0, ans=0.125
2023-11-18 05:28:34,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=79466.66666666667, ans=0.04949747468305833
2023-11-18 05:28:42,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=79533.33333333333, ans=0.07
2023-11-18 05:28:58,967 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 11950, loss[loss=0.1587, simple_loss=0.1704, pruned_loss=0.06089, audio_tagging_loss=0.01266, over 15504.00 frames. ], tot_loss[loss=0.1409, simple_loss=0.1421, pruned_loss=0.05606, audio_tagging_loss=0.01377, over 3043950.25 frames. ], batch size: 55, lr: 3.28e-02, grad_scale: 32.0
2023-11-18 05:29:04,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=79666.66666666667, ans=0.1
2023-11-18 05:29:45,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.85 vs. limit=22.5
2023-11-18 05:29:51,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=79933.33333333333, ans=0.125
2023-11-18 05:29:55,519 INFO [train_asr.py:1115] (2/4) Epoch 1, batch 12000, loss[loss=0.1446, simple_loss=0.1404, pruned_loss=0.05865, audio_tagging_loss=0.01574, over 15385.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1431, pruned_loss=0.05621, audio_tagging_loss=0.01367, over 3047076.88 frames. ], batch size: 58, lr: 3.28e-02, grad_scale: 16.0
2023-11-18 05:29:55,520 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 05:30:31,627 INFO [train_asr.py:1147] (2/4) Epoch 1, validation: loss=0.09272, simple_loss=0.07249, pruned_loss=0.01766, audio_tagging_loss=0.03882, over 4681554.00 frames.
2023-11-18 05:30:31,628 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 05:30:41,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80066.66666666667, ans=0.1
2023-11-18 05:30:42,376 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 1.066e+02 1.219e+02 1.451e+02 6.762e+02, threshold=2.438e+02, percent-clipped=1.0
2023-11-18 05:31:38,185 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 0, loss[loss=0.1284, simple_loss=0.1137, pruned_loss=0.0392, audio_tagging_loss=0.03231, over 14848.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1137, pruned_loss=0.0392, audio_tagging_loss=0.03231, over 14848.00 frames. ], batch size: 56, lr: 3.21e-02, grad_scale: 32.0
2023-11-18 05:31:38,186 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 05:32:10,425 INFO [train_asr.py:1147] (2/4) Epoch 2, validation: loss=0.09083, simple_loss=0.07252, pruned_loss=0.0178, audio_tagging_loss=0.03677, over 4681554.00 frames.
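Each "Computing validation loss" record is followed about half a minute later by a frame-weighted dev-set summary over the same 4681554.00 frames, so the full dev set is scored every time. A sketch of such a pass, where `compute_loss` and `valid_dl` stand in for the recipe's actual helpers:

```python
import torch

# Sketch: periodic validation pass behind the "Computing validation loss"
# records. Accumulates frame-weighted losses over the whole dev loader.
def validate(model, valid_dl, compute_loss):
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += float(loss)
            tot_frames += num_frames
    model.train()
    # e.g. "validation: loss=0.09272 ... over 4681554.00 frames."
    return tot_loss / tot_frames
```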
2023-11-18 05:32:10,426 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 05:32:13,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=12.0
2023-11-18 05:32:13,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=80160.0, ans=0.125
2023-11-18 05:32:21,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=80226.66666666667, ans=0.0
2023-11-18 05:32:22,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=80226.66666666667, ans=0.1
2023-11-18 05:32:31,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=80293.33333333333, ans=0.0
2023-11-18 05:32:31,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=80293.33333333333, ans=0.125
2023-11-18 05:32:35,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=80293.33333333333, ans=0.09899494936611666
2023-11-18 05:32:47,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=80360.0, ans=0.0
2023-11-18 05:32:57,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=80426.66666666667, ans=0.2
2023-11-18 05:32:58,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=80426.66666666667, ans=0.125
2023-11-18 05:32:59,544 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:33:01,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80426.66666666667, ans=0.1
2023-11-18 05:33:04,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.09 vs. limit=10.0
2023-11-18 05:33:05,837 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 50, loss[loss=0.1692, simple_loss=0.1726, pruned_loss=0.06023, audio_tagging_loss=0.02272, over 15108.00 frames. ], tot_loss[loss=0.1508, simple_loss=0.1412, pruned_loss=0.05505, audio_tagging_loss=0.02508, over 683918.35 frames. ], batch size: 57, lr: 3.21e-02, grad_scale: 32.0
2023-11-18 05:33:12,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=80493.33333333333, ans=0.125
2023-11-18 05:33:14,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.01 vs. limit=22.5
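The "Maximum memory allocated" line is a one-call CUDA diagnostic; in PyTorch it reduces to:

```python
import torch

# Reports the high-water mark of allocated CUDA memory on this rank's
# device (cuda:2 for rank 2/4 here), matching the "25771MB" lines above.
mb = torch.cuda.max_memory_allocated(torch.device("cuda:2")) // (1024 * 1024)
print(f"Maximum memory allocated so far is {mb}MB")
```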
2023-11-18 05:33:23,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=80560.0, ans=0.125
2023-11-18 05:33:46,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.563e+01 1.150e+02 1.281e+02 1.485e+02 2.294e+02, threshold=2.563e+02, percent-clipped=0.0
2023-11-18 05:33:49,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=80760.0, ans=0.125
2023-11-18 05:33:51,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=80760.0, ans=0.125
2023-11-18 05:33:59,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=80760.0, ans=0.1
2023-11-18 05:34:01,932 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 100, loss[loss=0.1148, simple_loss=0.1054, pruned_loss=0.03711, audio_tagging_loss=0.02497, over 13963.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.1392, pruned_loss=0.0545, audio_tagging_loss=0.02464, over 1205183.30 frames. ], batch size: 54, lr: 3.20e-02, grad_scale: 32.0
2023-11-18 05:34:09,124 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:34:22,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0
2023-11-18 05:34:22,840 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:34:22,875 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 05:34:23,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=80893.33333333333, ans=0.125
2023-11-18 05:34:27,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=80960.0, ans=0.0
2023-11-18 05:34:30,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=80960.0, ans=0.125
2023-11-18 05:34:48,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=81093.33333333333, ans=0.2
2023-11-18 05:34:49,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=81093.33333333333, ans=0.2
2023-11-18 05:34:51,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=81093.33333333333, ans=0.0
2023-11-18 05:34:55,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=81093.33333333333, ans=0.2
2023-11-18 05:34:57,994 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 150, loss[loss=0.1458, simple_loss=0.1453, pruned_loss=0.05971, audio_tagging_loss=0.01349, over 16143.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1393, pruned_loss=0.05401, audio_tagging_loss=0.02201, over 1608352.76 frames. ], batch size: 62, lr: 3.20e-02, grad_scale: 32.0
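Note how tot_loss restarts with the epoch: at epoch 2, batch 100 it covers only 1205183.30 frames, versus roughly 3.0M frames late in epoch 1, and the audio_tagging_loss component briefly jumps because the fresh window is dominated by the first few batches. A sketch of a frame-weighted running aggregate with that behaviour; the decay constant is an assumption, and the recipe's MetricsTracker windowing may differ:

```python
# Sketch: frame-weighted running loss like tot_loss[...], reset per epoch.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.weighted_loss = 0.0
        self.frames = 0.0

    def reset(self) -> None:  # called at each epoch boundary
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> float:
        self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
        self.frames = self.decay * self.frames + num_frames
        return self.weighted_loss / self.frames  # the logged tot_loss value
```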
2023-11-18 05:35:16,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=81226.66666666667, ans=0.125
2023-11-18 05:35:16,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0
2023-11-18 05:35:23,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=81293.33333333333, ans=0.1
2023-11-18 05:35:38,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 1.103e+02 1.211e+02 1.385e+02 1.770e+02, threshold=2.422e+02, percent-clipped=0.0
2023-11-18 05:35:40,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=81360.0, ans=15.0
2023-11-18 05:35:53,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. limit=15.0
2023-11-18 05:35:54,754 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 200, loss[loss=0.1043, simple_loss=0.1064, pruned_loss=0.03426, audio_tagging_loss=0.01685, over 14126.00 frames. ], tot_loss[loss=0.1433, simple_loss=0.1393, pruned_loss=0.05428, audio_tagging_loss=0.01933, over 1929648.33 frames. ], batch size: 55, lr: 3.19e-02, grad_scale: 32.0
2023-11-18 05:35:58,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=81493.33333333333, ans=0.125
2023-11-18 05:36:26,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0
2023-11-18 05:36:42,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=81760.0, ans=0.125
2023-11-18 05:36:51,495 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 250, loss[loss=0.1153, simple_loss=0.1216, pruned_loss=0.04302, audio_tagging_loss=0.01146, over 14463.00 frames. ], tot_loss[loss=0.143, simple_loss=0.1415, pruned_loss=0.05499, audio_tagging_loss=0.01726, over 2180527.83 frames. ], batch size: 57, lr: 3.19e-02, grad_scale: 32.0
2023-11-18 05:37:03,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=81893.33333333333, ans=0.0
2023-11-18 05:37:10,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=81893.33333333333, ans=0.125
2023-11-18 05:37:22,573 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.36 vs. limit=22.5
2023-11-18 05:37:24,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=82026.66666666667, ans=0.1
2023-11-18 05:37:27,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=82026.66666666667, ans=0.1
2023-11-18 05:37:31,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 1.100e+02 1.268e+02 1.445e+02 2.035e+02, threshold=2.536e+02, percent-clipped=0.0
2023-11-18 05:37:34,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=82026.66666666667, ans=0.2
2023-11-18 05:37:36,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=82093.33333333333, ans=0.0
2023-11-18 05:37:44,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=82093.33333333333, ans=0.1
2023-11-18 05:37:47,889 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 300, loss[loss=0.09621, simple_loss=0.0953, pruned_loss=0.03419, audio_tagging_loss=0.01438, over 15276.00 frames. ], tot_loss[loss=0.1419, simple_loss=0.1414, pruned_loss=0.05514, audio_tagging_loss=0.01602, over 2375958.83 frames. ], batch size: 58, lr: 3.18e-02, grad_scale: 32.0
2023-11-18 05:37:48,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=82160.0, ans=0.125
2023-11-18 05:37:56,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=82160.0, ans=0.0
2023-11-18 05:38:18,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5
2023-11-18 05:38:29,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=82360.0, ans=0.125
2023-11-18 05:38:36,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=82426.66666666667, ans=0.05
2023-11-18 05:38:43,908 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 350, loss[loss=0.1273, simple_loss=0.1298, pruned_loss=0.047, audio_tagging_loss=0.0154, over 15311.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1446, pruned_loss=0.0565, audio_tagging_loss=0.01517, over 2528342.96 frames. ], batch size: 58, lr: 3.18e-02, grad_scale: 32.0
2023-11-18 05:38:45,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=82493.33333333333, ans=0.1
2023-11-18 05:38:47,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=82493.33333333333, ans=0.2
2023-11-18 05:39:24,852 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 1.093e+02 1.219e+02 1.382e+02 1.971e+02, threshold=2.439e+02, percent-clipped=0.0
2023-11-18 05:39:26,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=82693.33333333333, ans=0.0
2023-11-18 05:39:40,366 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 400, loss[loss=0.1922, simple_loss=0.2018, pruned_loss=0.08028, audio_tagging_loss=0.01106, over 15347.00 frames. ], tot_loss[loss=0.1434, simple_loss=0.1448, pruned_loss=0.05643, audio_tagging_loss=0.01457, over 2642939.99 frames. ], batch size: 56, lr: 3.17e-02, grad_scale: 32.0
2023-11-18 05:40:29,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.56 vs. limit=10.0
2023-11-18 05:40:32,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=83093.33333333333, ans=0.0
2023-11-18 05:40:34,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=83093.33333333333, ans=0.0
2023-11-18 05:40:36,421 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 450, loss[loss=0.1589, simple_loss=0.1628, pruned_loss=0.065, audio_tagging_loss=0.01248, over 15408.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1427, pruned_loss=0.05542, audio_tagging_loss=0.01425, over 2733116.96 frames. ], batch size: 57, lr: 3.17e-02, grad_scale: 32.0
2023-11-18 05:40:44,885 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.98 vs. limit=6.0
2023-11-18 05:40:52,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=83226.66666666667, ans=0.2
2023-11-18 05:41:01,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=83293.33333333333, ans=0.125
2023-11-18 05:41:16,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 1.050e+02 1.181e+02 1.351e+02 2.147e+02, threshold=2.363e+02, percent-clipped=0.0
2023-11-18 05:41:23,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83426.66666666667, ans=0.125
2023-11-18 05:41:32,207 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 500, loss[loss=0.09941, simple_loss=0.0972, pruned_loss=0.03338, audio_tagging_loss=0.01743, over 16176.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1409, pruned_loss=0.05456, audio_tagging_loss=0.01408, over 2795308.55 frames. ], batch size: 62, lr: 3.16e-02, grad_scale: 32.0
2023-11-18 05:41:32,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=83493.33333333333, ans=0.125
2023-11-18 05:41:36,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=83493.33333333333, ans=0.2
2023-11-18 05:41:52,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0
2023-11-18 05:41:59,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=83626.66666666667, ans=0.125
2023-11-18 05:42:06,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=83693.33333333333, ans=0.0
2023-11-18 05:42:10,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=83693.33333333333, ans=10.0
2023-11-18 05:42:13,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.09 vs. limit=22.5
2023-11-18 05:42:14,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.91 vs. limit=15.0
2023-11-18 05:42:21,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=83760.0, ans=0.125
2023-11-18 05:42:27,860 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 550, loss[loss=0.1691, simple_loss=0.1665, pruned_loss=0.073, audio_tagging_loss=0.01281, over 15334.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1413, pruned_loss=0.05451, audio_tagging_loss=0.01389, over 2850813.57 frames. ], batch size: 58, lr: 3.16e-02, grad_scale: 32.0
2023-11-18 05:42:32,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=83826.66666666667, ans=0.125
2023-11-18 05:42:42,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=83893.33333333333, ans=0.125
2023-11-18 05:42:58,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=83960.0, ans=0.125
2023-11-18 05:42:59,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=83960.0, ans=0.0
2023-11-18 05:42:59,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=83960.0, ans=0.125
2023-11-18 05:43:02,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=84026.66666666667, ans=0.125
2023-11-18 05:43:08,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.035e+01 1.150e+02 1.343e+02 1.676e+02 2.273e+02, threshold=2.687e+02, percent-clipped=0.0
2023-11-18 05:43:25,070 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 600, loss[loss=0.14, simple_loss=0.1485, pruned_loss=0.05402, audio_tagging_loss=0.01175, over 15010.00 frames. ], tot_loss[loss=0.1383, simple_loss=0.141, pruned_loss=0.05404, audio_tagging_loss=0.01372, over 2896039.18 frames. ], batch size: 57, lr: 3.15e-02, grad_scale: 32.0
2023-11-18 05:43:36,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=84226.66666666667, ans=0.025
2023-11-18 05:43:37,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.71 vs. limit=8.0
2023-11-18 05:43:41,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=84226.66666666667, ans=0.125
2023-11-18 05:43:49,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=84293.33333333333, ans=0.125
2023-11-18 05:44:02,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.70 vs. limit=15.0
2023-11-18 05:44:02,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=84360.0, ans=0.125
2023-11-18 05:44:07,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.36 vs. limit=22.5
limit=22.5 2023-11-18 05:44:08,708 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.468e+00 2023-11-18 05:44:21,963 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 650, loss[loss=0.106, simple_loss=0.1099, pruned_loss=0.03712, audio_tagging_loss=0.01393, over 16073.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1418, pruned_loss=0.05437, audio_tagging_loss=0.01357, over 2930647.08 frames. ], batch size: 63, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:44:22,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.59 vs. limit=22.5 2023-11-18 05:44:23,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=15.0 2023-11-18 05:44:50,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=84626.66666666667, ans=0.125 2023-11-18 05:45:01,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=84693.33333333333, ans=0.1 2023-11-18 05:45:01,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=84693.33333333333, ans=0.125 2023-11-18 05:45:02,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.174e+01 1.067e+02 1.188e+02 1.445e+02 2.872e+02, threshold=2.375e+02, percent-clipped=1.0 2023-11-18 05:45:06,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=84760.0, ans=0.125 2023-11-18 05:45:10,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=84760.0, ans=0.0 2023-11-18 05:45:13,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=84760.0, ans=0.1 2023-11-18 05:45:17,839 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 700, loss[loss=0.1703, simple_loss=0.1834, pruned_loss=0.06704, audio_tagging_loss=0.01159, over 15487.00 frames. ], tot_loss[loss=0.1391, simple_loss=0.1422, pruned_loss=0.05458, audio_tagging_loss=0.01338, over 2953417.11 frames. ], batch size: 57, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:45:28,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=84893.33333333333, ans=0.2 2023-11-18 05:45:44,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=84960.0, ans=0.125 2023-11-18 05:45:55,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=85026.66666666667, ans=0.125 2023-11-18 05:46:04,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=85093.33333333333, ans=0.1 2023-11-18 05:46:11,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-11-18 05:46:15,209 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 750, loss[loss=0.1422, simple_loss=0.1463, pruned_loss=0.05711, audio_tagging_loss=0.01198, over 15439.00 frames. 
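The per-batch loss records recombine consistently: each logged loss equals pruned_loss + audio_tagging_loss plus half of simple_loss, which suggests a fixed 0.5 weight on the simple (trivial-joiner) term. A minimal sketch checking the batch 750 record above; the 0.5 weight is inferred by fitting the logged totals, not taken from the training code:

```python
# Recombine the logged loss components for "Epoch 2, batch 750".
# The 0.5 weight on simple_loss is inferred from the logged totals and is
# an assumption, not a constant confirmed from train_asr.py.
SIMPLE_LOSS_SCALE = 0.5

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    return SIMPLE_LOSS_SCALE * simple_loss + pruned_loss + audio_tagging_loss

print(combined_loss(0.1463, 0.05711, 0.01198))  # -> 0.14224, logged as loss=0.1422
```

The same identity holds for the running tot_loss entries, e.g. 0.5 * 0.1425 + 0.0546 + 0.01336 = 0.1392.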
], tot_loss[loss=0.1392, simple_loss=0.1425, pruned_loss=0.0546, audio_tagging_loss=0.01336, over 2971192.03 frames. ], batch size: 57, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:46:28,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=85226.66666666667, ans=0.125 2023-11-18 05:46:37,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=85293.33333333333, ans=0.0 2023-11-18 05:46:56,129 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.449e+01 1.066e+02 1.181e+02 1.360e+02 2.052e+02, threshold=2.361e+02, percent-clipped=0.0 2023-11-18 05:47:03,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=85426.66666666667, ans=0.0 2023-11-18 05:47:09,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=85426.66666666667, ans=0.125 2023-11-18 05:47:11,523 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 800, loss[loss=0.1046, simple_loss=0.0995, pruned_loss=0.03711, audio_tagging_loss=0.01776, over 16589.00 frames. ], tot_loss[loss=0.1385, simple_loss=0.1414, pruned_loss=0.05432, audio_tagging_loss=0.01346, over 2987092.86 frames. ], batch size: 64, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:47:13,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=85493.33333333333, ans=0.2 2023-11-18 05:47:22,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=85560.0, ans=0.1 2023-11-18 05:47:33,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=85626.66666666667, ans=0.125 2023-11-18 05:47:52,044 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-11-18 05:47:54,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=85693.33333333333, ans=0.125 2023-11-18 05:47:59,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.11 vs. limit=15.0 2023-11-18 05:48:07,702 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 850, loss[loss=0.1468, simple_loss=0.1415, pruned_loss=0.05854, audio_tagging_loss=0.01756, over 15043.00 frames. ], tot_loss[loss=0.1393, simple_loss=0.1419, pruned_loss=0.05477, audio_tagging_loss=0.01352, over 3002901.92 frames. ], batch size: 57, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:48:09,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85826.66666666667, ans=0.125 2023-11-18 05:48:11,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=85826.66666666667, ans=0.0 2023-11-18 05:48:17,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=85826.66666666667, ans=0.125 2023-11-18 05:48:23,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.59 vs. 
limit=15.0 2023-11-18 05:48:30,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5 2023-11-18 05:48:48,457 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.063e+01 1.087e+02 1.227e+02 1.407e+02 2.790e+02, threshold=2.454e+02, percent-clipped=1.0 2023-11-18 05:48:56,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86093.33333333333, ans=0.0 2023-11-18 05:49:05,238 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 900, loss[loss=0.1576, simple_loss=0.175, pruned_loss=0.06081, audio_tagging_loss=0.009288, over 16135.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1416, pruned_loss=0.05465, audio_tagging_loss=0.01348, over 3011690.65 frames. ], batch size: 57, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:49:27,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86293.33333333333, ans=0.125 2023-11-18 05:49:39,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=86360.0, ans=0.0 2023-11-18 05:50:01,366 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 950, loss[loss=0.154, simple_loss=0.1556, pruned_loss=0.06346, audio_tagging_loss=0.01276, over 14592.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.143, pruned_loss=0.05509, audio_tagging_loss=0.01325, over 3024939.87 frames. ], batch size: 53, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:03,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86493.33333333333, ans=0.0 2023-11-18 05:50:04,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=86493.33333333333, ans=0.0 2023-11-18 05:50:05,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=86493.33333333333, ans=0.1 2023-11-18 05:50:42,203 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 1.077e+02 1.200e+02 1.388e+02 2.127e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 05:50:57,283 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1000, loss[loss=0.1625, simple_loss=0.1754, pruned_loss=0.0655, audio_tagging_loss=0.009334, over 14965.00 frames. ], tot_loss[loss=0.1392, simple_loss=0.1426, pruned_loss=0.05478, audio_tagging_loss=0.01314, over 3030354.05 frames. ], batch size: 55, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:59,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=86826.66666666667, ans=0.0 2023-11-18 05:51:01,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.84 vs. limit=10.0 2023-11-18 05:51:10,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.21 vs. limit=15.0 2023-11-18 05:51:21,427 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:51:26,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86960.0, ans=0.125 2023-11-18 05:51:31,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=87026.66666666667, ans=0.125 2023-11-18 05:51:37,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=87026.66666666667, ans=0.125 2023-11-18 05:51:39,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.91 vs. limit=6.0 2023-11-18 05:51:53,417 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1050, loss[loss=0.1241, simple_loss=0.1287, pruned_loss=0.04489, audio_tagging_loss=0.01487, over 14896.00 frames. ], tot_loss[loss=0.139, simple_loss=0.1424, pruned_loss=0.05483, audio_tagging_loss=0.013, over 3027214.81 frames. ], batch size: 55, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:51:56,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=87160.0, ans=0.0 2023-11-18 05:52:06,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=87226.66666666667, ans=0.125 2023-11-18 05:52:09,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=87226.66666666667, ans=0.0 2023-11-18 05:52:20,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=87293.33333333333, ans=0.0 2023-11-18 05:52:24,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=87293.33333333333, ans=0.125 2023-11-18 05:52:34,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.050e+02 1.244e+02 1.396e+02 2.108e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 05:52:46,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87426.66666666667, ans=0.1 2023-11-18 05:52:50,688 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1100, loss[loss=0.09153, simple_loss=0.09325, pruned_loss=0.03076, audio_tagging_loss=0.01415, over 15423.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1416, pruned_loss=0.05446, audio_tagging_loss=0.01295, over 3035861.32 frames. ], batch size: 58, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:52:52,879 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 05:52:54,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=87493.33333333333, ans=0.125 2023-11-18 05:52:54,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-11-18 05:53:11,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=87626.66666666667, ans=0.125 2023-11-18 05:53:22,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=87626.66666666667, ans=0.1 2023-11-18 05:53:28,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=87693.33333333333, ans=0.0 2023-11-18 05:53:45,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=87826.66666666667, ans=0.125 2023-11-18 05:53:46,907 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1150, loss[loss=0.1269, simple_loss=0.1345, pruned_loss=0.04791, audio_tagging_loss=0.01179, over 15865.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.1409, pruned_loss=0.05388, audio_tagging_loss=0.01292, over 3034358.51 frames. ], batch size: 58, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:53:59,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=87893.33333333333, ans=0.125 2023-11-18 05:54:07,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=87893.33333333333, ans=0.125 2023-11-18 05:54:18,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87960.0, ans=0.125 2023-11-18 05:54:28,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 1.025e+02 1.107e+02 1.275e+02 1.816e+02, threshold=2.214e+02, percent-clipped=0.0 2023-11-18 05:54:36,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-18 05:54:43,951 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1200, loss[loss=0.1338, simple_loss=0.1467, pruned_loss=0.04938, audio_tagging_loss=0.01105, over 14739.00 frames. ], tot_loss[loss=0.1383, simple_loss=0.142, pruned_loss=0.05441, audio_tagging_loss=0.01293, over 3038645.75 frames. ], batch size: 55, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:54:51,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=88160.0, ans=0.2 2023-11-18 05:55:39,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.43 vs. limit=22.5 2023-11-18 05:55:40,045 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1250, loss[loss=0.1288, simple_loss=0.1244, pruned_loss=0.05126, audio_tagging_loss=0.01535, over 14648.00 frames. ], tot_loss[loss=0.1384, simple_loss=0.1422, pruned_loss=0.05442, audio_tagging_loss=0.01293, over 3036271.18 frames. 
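The recurring WARNING records drop 1-second AudioSet placeholder cuts because only 23 encoder frames survive subsampling, fewer than the 24 BPE tokens they would have to align to. A sketch of that filter; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23:

```python
# Sketch of the filter implied by the WARNING records above: a cut is
# excluded when it has fewer encoder frames after subsampling than BPE
# tokens, since a transducer alignment needs at least one frame per token.
# The exact formula below is assumed; it reproduces 100 frames -> 23.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the 1 s placeholder cuts are excluded
```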
], batch size: 56, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:55:45,535 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:55:47,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=88493.33333333333, ans=0.2 2023-11-18 05:55:54,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=88560.0, ans=0.1 2023-11-18 05:56:05,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2023-11-18 05:56:20,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.599e+01 1.015e+02 1.167e+02 1.344e+02 2.286e+02, threshold=2.335e+02, percent-clipped=1.0 2023-11-18 05:56:25,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=88760.0, ans=10.0 2023-11-18 05:56:25,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=88760.0, ans=0.125 2023-11-18 05:56:36,653 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1300, loss[loss=0.1203, simple_loss=0.1194, pruned_loss=0.04674, audio_tagging_loss=0.01385, over 14628.00 frames. ], tot_loss[loss=0.1376, simple_loss=0.1418, pruned_loss=0.0539, audio_tagging_loss=0.01283, over 3035775.84 frames. ], batch size: 55, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:57:08,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-18 05:57:13,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=89026.66666666667, ans=0.0 2023-11-18 05:57:23,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-11-18 05:57:33,128 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1350, loss[loss=0.09513, simple_loss=0.09339, pruned_loss=0.0357, audio_tagging_loss=0.01274, over 14754.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1408, pruned_loss=0.05367, audio_tagging_loss=0.01283, over 3038133.57 frames. ], batch size: 57, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:57:34,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=89160.0, ans=0.0 2023-11-18 05:57:35,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-11-18 05:57:44,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=89226.66666666667, ans=0.0 2023-11-18 05:57:47,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89226.66666666667, ans=0.125 2023-11-18 05:57:53,916 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.99 vs. 
limit=15.0 2023-11-18 05:58:14,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 1.078e+02 1.206e+02 1.341e+02 1.953e+02, threshold=2.412e+02, percent-clipped=0.0 2023-11-18 05:58:14,315 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:58:29,863 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1400, loss[loss=0.1201, simple_loss=0.1289, pruned_loss=0.04198, audio_tagging_loss=0.0136, over 16482.00 frames. ], tot_loss[loss=0.137, simple_loss=0.1409, pruned_loss=0.05362, audio_tagging_loss=0.01292, over 3037036.17 frames. ], batch size: 63, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:58:36,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.49 vs. limit=15.0 2023-11-18 05:58:37,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=89493.33333333333, ans=0.0 2023-11-18 05:58:47,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=89560.0, ans=0.0 2023-11-18 05:58:52,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=89626.66666666667, ans=0.0 2023-11-18 05:58:59,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=89626.66666666667, ans=0.1 2023-11-18 05:59:08,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=89693.33333333333, ans=0.2 2023-11-18 05:59:19,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=89760.0, ans=0.025 2023-11-18 05:59:24,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.28 vs. limit=15.0 2023-11-18 05:59:25,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=89826.66666666667, ans=0.125 2023-11-18 05:59:27,054 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1450, loss[loss=0.1767, simple_loss=0.1872, pruned_loss=0.06978, audio_tagging_loss=0.01327, over 16169.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1378, pruned_loss=0.05216, audio_tagging_loss=0.0131, over 3041121.57 frames. 
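In every Clipping_scale=2.0 record the reported threshold is exactly twice the middle of the five grad-norm quartiles (here 2.0 * 1.206e+02 = 2.412e+02), so the clip threshold appears to track a running median of recent gradient norms. A hedged re-creation of that rule; the actual optim.py bookkeeping may differ:

```python
import statistics

# Assumption: threshold = clipping_scale * median(recent grad norms).
# That fits every Clipping record in this log, but is inferred, not
# confirmed from optim.py.
CLIPPING_SCALE = 2.0

def clip_threshold(recent_grad_norms: list) -> float:
    return CLIPPING_SCALE * statistics.median(recent_grad_norms)

quartiles = [7.570e+01, 1.078e+02, 1.206e+02, 1.341e+02, 1.953e+02]
print(clip_threshold(quartiles))                   # -> 241.2, logged as 2.412e+02
print(max(quartiles) > clip_threshold(quartiles))  # False, hence percent-clipped=0.0
```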
], batch size: 58, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:59:27,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=89826.66666666667, ans=0.0 2023-11-18 05:59:38,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=89893.33333333333, ans=0.1 2023-11-18 05:59:39,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=89893.33333333333, ans=0.125 2023-11-18 05:59:48,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=89960.0, ans=0.0 2023-11-18 05:59:54,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=89960.0, ans=0.125 2023-11-18 05:59:55,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=89960.0, ans=0.125 2023-11-18 05:59:56,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=89960.0, ans=0.5 2023-11-18 06:00:07,187 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 1.064e+02 1.199e+02 1.327e+02 1.919e+02, threshold=2.398e+02, percent-clipped=0.0 2023-11-18 06:00:09,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=90026.66666666667, ans=0.125 2023-11-18 06:00:18,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=90093.33333333333, ans=0.125 2023-11-18 06:00:22,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=90160.0, ans=0.2 2023-11-18 06:00:23,240 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1500, loss[loss=0.08975, simple_loss=0.08526, pruned_loss=0.03017, audio_tagging_loss=0.01695, over 15354.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1387, pruned_loss=0.05276, audio_tagging_loss=0.01314, over 3050184.83 frames. ], batch size: 61, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:00:26,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2023-11-18 06:00:34,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=90226.66666666667, ans=0.0 2023-11-18 06:00:36,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=90226.66666666667, ans=0.2 2023-11-18 06:00:37,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=90226.66666666667, ans=0.1 2023-11-18 06:00:45,091 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.080e+00 2023-11-18 06:00:51,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=90293.33333333333, ans=0.2 2023-11-18 06:01:00,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.32 vs. 
limit=6.0 2023-11-18 06:01:08,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90426.66666666667, ans=0.1 2023-11-18 06:01:15,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-11-18 06:01:17,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90426.66666666667, ans=0.1 2023-11-18 06:01:19,520 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1550, loss[loss=0.1547, simple_loss=0.1547, pruned_loss=0.06513, audio_tagging_loss=0.01223, over 14707.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1396, pruned_loss=0.05319, audio_tagging_loss=0.01328, over 3040495.18 frames. ], batch size: 56, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:01:32,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=90560.0, ans=0.2 2023-11-18 06:01:35,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90560.0, ans=0.125 2023-11-18 06:01:46,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-18 06:01:52,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=90693.33333333333, ans=0.0 2023-11-18 06:02:00,212 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 1.048e+02 1.182e+02 1.332e+02 1.868e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 06:02:05,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90760.0, ans=0.125 2023-11-18 06:02:15,845 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1600, loss[loss=0.1386, simple_loss=0.1477, pruned_loss=0.05565, audio_tagging_loss=0.009051, over 14474.00 frames. ], tot_loss[loss=0.1373, simple_loss=0.1409, pruned_loss=0.05352, audio_tagging_loss=0.0133, over 3043718.70 frames. ], batch size: 55, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:02:24,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=90826.66666666667, ans=0.95 2023-11-18 06:02:28,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=90893.33333333333, ans=0.125 2023-11-18 06:02:35,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=15.0 2023-11-18 06:02:47,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=90960.0, ans=0.125 2023-11-18 06:03:11,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=91160.0, ans=0.125 2023-11-18 06:03:11,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.07 vs. 
limit=10.0 2023-11-18 06:03:12,235 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1650, loss[loss=0.1348, simple_loss=0.1411, pruned_loss=0.05007, audio_tagging_loss=0.01416, over 15464.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1406, pruned_loss=0.05325, audio_tagging_loss=0.01333, over 3049055.72 frames. ], batch size: 56, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:03:14,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=91160.0, ans=0.0 2023-11-18 06:03:38,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=91293.33333333333, ans=0.0 2023-11-18 06:03:45,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.89 vs. limit=12.0 2023-11-18 06:03:53,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.238e+01 1.052e+02 1.201e+02 1.408e+02 1.916e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:04:04,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=15.0 2023-11-18 06:04:04,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=91426.66666666667, ans=0.2 2023-11-18 06:04:09,192 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1700, loss[loss=0.1201, simple_loss=0.1133, pruned_loss=0.04734, audio_tagging_loss=0.01612, over 15038.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1396, pruned_loss=0.05277, audio_tagging_loss=0.01338, over 3045258.10 frames. ], batch size: 58, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:04:09,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-18 06:04:25,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=91560.0, ans=0.0 2023-11-18 06:04:26,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=91560.0, ans=0.125 2023-11-18 06:04:50,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=91693.33333333333, ans=0.125 2023-11-18 06:04:53,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.97 vs. limit=15.0 2023-11-18 06:05:00,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=91760.0, ans=0.125 2023-11-18 06:05:06,098 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1750, loss[loss=0.1528, simple_loss=0.1623, pruned_loss=0.0612, audio_tagging_loss=0.01043, over 14238.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.1397, pruned_loss=0.0527, audio_tagging_loss=0.01323, over 3051853.08 frames. ], batch size: 52, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:05:26,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=91893.33333333333, ans=0.07 2023-11-18 06:05:32,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.29 vs. 
limit=10.0 2023-11-18 06:05:37,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=91960.0, ans=0.125 2023-11-18 06:05:47,230 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.111e+02 1.237e+02 1.383e+02 2.082e+02, threshold=2.473e+02, percent-clipped=0.0 2023-11-18 06:06:02,403 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1800, loss[loss=0.1157, simple_loss=0.1173, pruned_loss=0.04538, audio_tagging_loss=0.01168, over 15496.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.1397, pruned_loss=0.05266, audio_tagging_loss=0.01304, over 3054023.85 frames. ], batch size: 59, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:06:03,113 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.83 vs. limit=15.0 2023-11-18 06:06:09,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=92160.0, ans=0.0 2023-11-18 06:06:24,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-11-18 06:06:29,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92293.33333333333, ans=0.1 2023-11-18 06:06:38,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.01 vs. limit=22.5 2023-11-18 06:06:57,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=92426.66666666667, ans=0.1 2023-11-18 06:06:59,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=15.0 2023-11-18 06:06:59,992 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1850, loss[loss=0.1636, simple_loss=0.1759, pruned_loss=0.06269, audio_tagging_loss=0.01292, over 14638.00 frames. ], tot_loss[loss=0.1361, simple_loss=0.1402, pruned_loss=0.05306, audio_tagging_loss=0.01295, over 3044588.50 frames. ], batch size: 53, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:07:00,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=92493.33333333333, ans=0.125 2023-11-18 06:07:21,129 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:07:23,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=92626.66666666667, ans=0.125 2023-11-18 06:07:31,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-18 06:07:40,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.045e+02 1.179e+02 1.336e+02 1.806e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 06:07:52,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92760.0, ans=0.1 2023-11-18 06:07:55,801 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1900, loss[loss=0.1034, simple_loss=0.1045, pruned_loss=0.03743, audio_tagging_loss=0.01369, over 15099.00 frames. 
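Each ScheduledFloat record reports a hyperparameter looked up from a schedule keyed on batch_count; by this point in training most of the *_skip_rate entries have decayed to 0.0. A minimal sketch of a piecewise-linear schedule of that kind; the breakpoints below are invented for illustration, the real schedules live in scaling.py:

```python
from bisect import bisect_right

# Piecewise-linear float schedule keyed on batch_count, in the spirit of
# the ScheduledFloat records above. Breakpoints are illustrative only.
def scheduled_float(batch_count: float, points: list) -> float:
    """Linearly interpolate (batch_count, value) breakpoints; clamp at the ends."""
    xs = [x for x, _ in points]
    i = bisect_right(xs, batch_count)
    if i == 0:
        return points[0][1]
    if i == len(points):
        return points[-1][1]
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# A skip-rate starting at 0.5 and decaying to 0.0 by batch_count 20000
# would read 0.0 at the batch_counts (~9.2e4) seen in this part of the log:
print(scheduled_float(92160.0, [(0.0, 0.5), (20000.0, 0.0)]))  # -> 0.0
```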
], tot_loss[loss=0.1348, simple_loss=0.1388, pruned_loss=0.05241, audio_tagging_loss=0.01299, over 3046431.34 frames. ], batch size: 56, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:08:51,657 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 1950, loss[loss=0.1514, simple_loss=0.1569, pruned_loss=0.05942, audio_tagging_loss=0.0135, over 15621.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1382, pruned_loss=0.0525, audio_tagging_loss=0.013, over 3045511.20 frames. ], batch size: 58, lr: 3.03e-02, grad_scale: 32.0 2023-11-18 06:09:08,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93226.66666666667, ans=0.0 2023-11-18 06:09:11,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=93226.66666666667, ans=0.0 2023-11-18 06:09:32,958 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 1.040e+02 1.152e+02 1.328e+02 1.978e+02, threshold=2.303e+02, percent-clipped=0.0 2023-11-18 06:09:33,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=93360.0, ans=0.05 2023-11-18 06:09:34,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=93360.0, ans=0.04949747468305833 2023-11-18 06:09:49,682 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2000, loss[loss=0.1137, simple_loss=0.1246, pruned_loss=0.04064, audio_tagging_loss=0.01078, over 14914.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1383, pruned_loss=0.05249, audio_tagging_loss=0.01299, over 3047613.27 frames. ], batch size: 55, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:09:54,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=93493.33333333333, ans=0.2 2023-11-18 06:09:55,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=93493.33333333333, ans=0.0 2023-11-18 06:10:00,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=93560.0, ans=0.125 2023-11-18 06:10:00,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.16 vs. limit=6.0 2023-11-18 06:10:09,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=93560.0, ans=0.0 2023-11-18 06:10:18,443 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.98 vs. limit=15.0 2023-11-18 06:10:45,928 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2050, loss[loss=0.1241, simple_loss=0.1286, pruned_loss=0.04904, audio_tagging_loss=0.0108, over 15749.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1388, pruned_loss=0.05265, audio_tagging_loss=0.01293, over 3045670.46 frames. ], batch size: 60, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:11:13,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.07 vs. 
limit=22.5 2023-11-18 06:11:24,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=94026.66666666667, ans=0.125 2023-11-18 06:11:26,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.053e+02 1.201e+02 1.345e+02 1.920e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:11:41,153 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2100, loss[loss=0.1362, simple_loss=0.1424, pruned_loss=0.05412, audio_tagging_loss=0.01088, over 14855.00 frames. ], tot_loss[loss=0.1348, simple_loss=0.1384, pruned_loss=0.05246, audio_tagging_loss=0.01313, over 3044691.56 frames. ], batch size: 55, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:11:43,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=94160.0, ans=0.0 2023-11-18 06:11:43,861 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.05 vs. limit=15.0 2023-11-18 06:11:57,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=94226.66666666667, ans=0.1 2023-11-18 06:12:14,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=15.0 2023-11-18 06:12:17,709 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.90 vs. limit=15.0 2023-11-18 06:12:21,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=94360.0, ans=0.125 2023-11-18 06:12:28,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94426.66666666667, ans=0.1 2023-11-18 06:12:29,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=94426.66666666667, ans=0.0 2023-11-18 06:12:37,005 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2150, loss[loss=0.1276, simple_loss=0.1245, pruned_loss=0.05148, audio_tagging_loss=0.01391, over 14465.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1378, pruned_loss=0.05238, audio_tagging_loss=0.01312, over 3042177.38 frames. ], batch size: 55, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:12:48,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=94560.0, ans=0.0 2023-11-18 06:12:53,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94560.0, ans=0.1 2023-11-18 06:13:02,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=94626.66666666667, ans=0.5 2023-11-18 06:13:10,150 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 06:13:11,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=94693.33333333333, ans=0.025 2023-11-18 06:13:15,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.56 vs. limit=15.0 2023-11-18 06:13:16,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=94693.33333333333, ans=0.2 2023-11-18 06:13:18,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 1.043e+02 1.205e+02 1.372e+02 2.009e+02, threshold=2.410e+02, percent-clipped=0.0 2023-11-18 06:13:30,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=94760.0, ans=0.05 2023-11-18 06:13:32,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=94760.0, ans=0.0 2023-11-18 06:13:34,783 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2200, loss[loss=0.1443, simple_loss=0.1418, pruned_loss=0.05759, audio_tagging_loss=0.01583, over 13881.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1376, pruned_loss=0.05216, audio_tagging_loss=0.01321, over 3040684.56 frames. ], batch size: 55, lr: 3.01e-02, grad_scale: 64.0 2023-11-18 06:13:41,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94826.66666666667, ans=0.125 2023-11-18 06:13:47,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=94893.33333333333, ans=0.125 2023-11-18 06:13:52,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=94893.33333333333, ans=0.125 2023-11-18 06:13:58,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94960.0, ans=0.125 2023-11-18 06:14:16,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=95026.66666666667, ans=0.0 2023-11-18 06:14:18,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=95093.33333333333, ans=0.0 2023-11-18 06:14:30,423 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2250, loss[loss=0.1517, simple_loss=0.1549, pruned_loss=0.06275, audio_tagging_loss=0.01154, over 15977.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.1387, pruned_loss=0.05258, audio_tagging_loss=0.01316, over 3047328.51 frames. ], batch size: 57, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:14:44,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=95226.66666666667, ans=0.125 2023-11-18 06:14:50,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.72 vs. limit=22.5 2023-11-18 06:14:52,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.95 vs. 
limit=15.0 2023-11-18 06:14:58,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=95293.33333333333, ans=0.125 2023-11-18 06:15:01,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=95293.33333333333, ans=0.0 2023-11-18 06:15:07,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.77 vs. limit=15.0 2023-11-18 06:15:12,288 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 1.062e+02 1.230e+02 1.401e+02 2.481e+02, threshold=2.461e+02, percent-clipped=1.0 2023-11-18 06:15:17,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=95426.66666666667, ans=0.125 2023-11-18 06:15:26,914 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2300, loss[loss=0.1307, simple_loss=0.141, pruned_loss=0.04996, audio_tagging_loss=0.01026, over 14296.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1391, pruned_loss=0.05259, audio_tagging_loss=0.01321, over 3046852.30 frames. ], batch size: 54, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:15:31,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-18 06:15:39,284 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.01 vs. limit=6.0 2023-11-18 06:15:43,392 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:15:53,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=95626.66666666667, ans=0.125 2023-11-18 06:16:04,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=95693.33333333333, ans=0.125 2023-11-18 06:16:07,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=95693.33333333333, ans=0.2 2023-11-18 06:16:12,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=95760.0, ans=15.0 2023-11-18 06:16:13,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=95760.0, ans=0.125 2023-11-18 06:16:15,590 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:16:24,199 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2350, loss[loss=0.122, simple_loss=0.1234, pruned_loss=0.04397, audio_tagging_loss=0.01633, over 15070.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.1421, pruned_loss=0.05375, audio_tagging_loss=0.01314, over 3048896.24 frames. 
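The Whitening records compare a per-module statistic against a limit (e.g. metric=22.72 vs. limit=22.5 above). One plausible form for such a metric measures how far the feature covariance C is from a multiple of the identity: the ratio below is >= 1, with equality exactly when C = c*I. This is an assumption about what scaling.py computes, chosen because it has that fixed point; the exact implementation may differ:

```python
import torch

# Assumed whitening metric: sum(C**2) / (D * mean(diag(C))**2) per group,
# which is >= 1 and equals 1 iff the covariance C is a multiple of the
# identity. This is a plausible sketch, not scaling.py's exact formula.
def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, frames, d)
    covar = torch.matmul(x.transpose(1, 2), x) / num_frames   # (groups, d, d)
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
    return ((covar ** 2).sum() / (num_groups * d * mean_diag ** 2)).item()

x = torch.randn(1000, 512)
print(whitening_metric(x))  # roughly 1.5 for random features; whitened ~1.0
```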
], batch size: 57, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:16:26,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=95826.66666666667, ans=0.0 2023-11-18 06:16:31,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=95826.66666666667, ans=0.125 2023-11-18 06:16:37,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=95893.33333333333, ans=0.125 2023-11-18 06:16:45,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=95960.0, ans=0.125 2023-11-18 06:17:06,433 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.033e+02 1.167e+02 1.342e+02 2.194e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 06:17:14,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96093.33333333333, ans=0.1 2023-11-18 06:17:20,437 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2400, loss[loss=0.135, simple_loss=0.1354, pruned_loss=0.05149, audio_tagging_loss=0.01578, over 14048.00 frames. ], tot_loss[loss=0.1365, simple_loss=0.1405, pruned_loss=0.05298, audio_tagging_loss=0.01322, over 3046710.75 frames. ], batch size: 54, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:17:31,187 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.70 vs. limit=22.5 2023-11-18 06:17:31,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=96226.66666666667, ans=0.125 2023-11-18 06:17:57,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=96360.0, ans=0.0 2023-11-18 06:17:58,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=96360.0, ans=0.125 2023-11-18 06:18:16,570 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2450, loss[loss=0.1838, simple_loss=0.1929, pruned_loss=0.07537, audio_tagging_loss=0.01198, over 14809.00 frames. ], tot_loss[loss=0.1359, simple_loss=0.1397, pruned_loss=0.05279, audio_tagging_loss=0.01328, over 3056069.37 frames. 
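The logged grad_scale doubles from 32.0 (batch 1950) to 64.0 at batch 2000 and is back to 32.0 by batch 2250, the signature of dynamic fp16 loss scaling: the scale grows after a run of overflow-free steps and is halved when an overflow is detected. A sketch using PyTorch's stock scaler; the growth settings are illustrative and the recipe may configure or subclass its own scaler:

```python
import torch

# Dynamic loss scaling consistent with the grad_scale values above
# (32 -> 64 around batch 2000, back to 32 by batch 2250). The settings
# here are assumptions, not values taken from the log.
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,
    growth_factor=2.0,     # double after enough overflow-free steps
    backoff_factor=0.5,    # halve on overflow/NaN gradients
    growth_interval=2000,  # assumed interval
)

def training_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skips the update if gradients overflowed
    scaler.update()         # grows or backs off the scale accordingly
    return loss.detach()
```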
], batch size: 54, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:18:28,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=96560.0, ans=0.1 2023-11-18 06:18:40,115 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.838e+00 2023-11-18 06:18:45,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=96626.66666666667, ans=0.125 2023-11-18 06:18:46,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=96626.66666666667, ans=0.125 2023-11-18 06:18:46,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=96626.66666666667, ans=0.1 2023-11-18 06:18:47,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=96626.66666666667, ans=0.04949747468305833 2023-11-18 06:18:58,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.568e+01 1.053e+02 1.171e+02 1.330e+02 1.894e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 06:19:01,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=96760.0, ans=0.2 2023-11-18 06:19:13,724 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2500, loss[loss=0.1571, simple_loss=0.1685, pruned_loss=0.06242, audio_tagging_loss=0.01045, over 15071.00 frames. ], tot_loss[loss=0.1351, simple_loss=0.1389, pruned_loss=0.0523, audio_tagging_loss=0.01331, over 3057350.10 frames. ], batch size: 57, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:19:31,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=96893.33333333333, ans=0.2 2023-11-18 06:19:44,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=96960.0, ans=0.2 2023-11-18 06:19:45,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=96960.0, ans=0.125 2023-11-18 06:19:46,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=97026.66666666667, ans=0.125 2023-11-18 06:19:48,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=97026.66666666667, ans=0.125 2023-11-18 06:20:01,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=97093.33333333333, ans=15.0 2023-11-18 06:20:10,033 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2550, loss[loss=0.126, simple_loss=0.1336, pruned_loss=0.04802, audio_tagging_loss=0.01119, over 15234.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.1384, pruned_loss=0.05202, audio_tagging_loss=0.0131, over 3057221.17 frames. 
], batch size: 57, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:20:13,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=97160.0, ans=0.125 2023-11-18 06:20:26,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:30,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:34,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=97293.33333333333, ans=0.0 2023-11-18 06:20:36,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2023-11-18 06:20:46,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=97360.0, ans=0.09899494936611666 2023-11-18 06:20:51,781 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.036e+02 1.193e+02 1.343e+02 1.842e+02, threshold=2.386e+02, percent-clipped=0.0 2023-11-18 06:21:02,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=97426.66666666667, ans=0.125 2023-11-18 06:21:05,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=97493.33333333333, ans=0.0 2023-11-18 06:21:06,241 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2600, loss[loss=0.1478, simple_loss=0.1507, pruned_loss=0.05934, audio_tagging_loss=0.01316, over 14829.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1384, pruned_loss=0.05187, audio_tagging_loss=0.01297, over 3054105.78 frames. ], batch size: 56, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:21:28,367 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0 2023-11-18 06:21:29,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=97626.66666666667, ans=0.0 2023-11-18 06:21:33,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=97626.66666666667, ans=15.0 2023-11-18 06:21:42,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=97693.33333333333, ans=0.0 2023-11-18 06:21:44,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.55 vs. limit=15.0 2023-11-18 06:21:48,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-18 06:22:00,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=97760.0, ans=0.125 2023-11-18 06:22:02,761 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2650, loss[loss=0.1435, simple_loss=0.1463, pruned_loss=0.05852, audio_tagging_loss=0.01184, over 15516.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1392, pruned_loss=0.05246, audio_tagging_loss=0.01284, over 3048107.99 frames. 
], batch size: 59, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:22:30,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2023-11-18 06:22:33,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=97960.0, ans=0.125 2023-11-18 06:22:43,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=98026.66666666667, ans=0.125 2023-11-18 06:22:44,861 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.069e+02 1.231e+02 1.397e+02 2.138e+02, threshold=2.463e+02, percent-clipped=0.0 2023-11-18 06:22:59,922 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2700, loss[loss=0.1023, simple_loss=0.114, pruned_loss=0.02951, audio_tagging_loss=0.01574, over 15148.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1402, pruned_loss=0.0526, audio_tagging_loss=0.01264, over 3055823.80 frames. ], batch size: 56, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:12,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.70 vs. limit=15.0 2023-11-18 06:23:18,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98226.66666666667, ans=0.125 2023-11-18 06:23:19,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=98226.66666666667, ans=0.0 2023-11-18 06:23:55,658 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.95 vs. limit=15.0 2023-11-18 06:23:56,188 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2750, loss[loss=0.1304, simple_loss=0.143, pruned_loss=0.04682, audio_tagging_loss=0.01207, over 15689.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1391, pruned_loss=0.05224, audio_tagging_loss=0.01263, over 3056846.75 frames. 
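
The `ScheduledFloat` records each log the current value (`ans`) of a hyperparameter, such as a dropout probability, bypass skip rate, or balancer limit, that is scheduled as a piecewise-linear function of `batch_count`. A minimal reimplementation of the idea; the breakpoints in the example are illustrative, not the ones this model uses:

```python
# Piecewise-linear schedule in the spirit of the ScheduledFloat records:
# the value is interpolated in batch_count between (x, y) breakpoints and
# held constant outside them. The breakpoints below are illustrative only.
class PiecewiseSchedule:
    def __init__(self, *points):
        self.points = sorted(points)  # (batch_count, value) pairs

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = PiecewiseSchedule((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value(96960.0))  # 0.0: past the final breakpoint, like the
                                 # pos_emb_skip_rate records above
```
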
], batch size: 59, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:59,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98493.33333333333, ans=0.125 2023-11-18 06:24:02,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=98493.33333333333, ans=0.0 2023-11-18 06:24:05,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=98493.33333333333, ans=0.125 2023-11-18 06:24:06,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:10,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=98560.0, ans=0.1 2023-11-18 06:24:19,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=98626.66666666667, ans=0.0 2023-11-18 06:24:29,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=98693.33333333333, ans=0.125 2023-11-18 06:24:37,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 1.008e+02 1.194e+02 1.354e+02 1.877e+02, threshold=2.388e+02, percent-clipped=0.0 2023-11-18 06:24:42,361 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:24:49,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2023-11-18 06:24:52,452 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2800, loss[loss=0.1291, simple_loss=0.1317, pruned_loss=0.04663, audio_tagging_loss=0.01662, over 13918.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1394, pruned_loss=0.05236, audio_tagging_loss=0.01268, over 3049570.32 frames. 
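
The WARNING above also shows why the cut is dropped: after roughly 4x subsampling its 100 input frames leave only 23 encoder frames, one fewer than its 24 tokens, and the transducer loss needs at least one frame per emitted token. A sketch of such a filter; the frame arithmetic `((n - 7) // 2 + 1) // 2` is an assumption chosen because it reproduces the logged 100 -> 23, not code lifted from train_asr.py:

```python
# Sketch of the cut filter implied by the WARNING records. The subsampling
# arithmetic is an assumption that reproduces the logged 100 -> 23 mapping,
# not code taken from the training script.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    t = ((num_frames - 7) // 2 + 1) // 2  # frames left after ~4x subsampling
    return t >= num_tokens  # transducer alignment needs >= 1 frame per token

assert ((100 - 7) // 2 + 1) // 2 == 23
assert not keep_cut(100, 24)  # the excluded AudioSet placeholder cut above
```
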
], batch size: 53, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:24:52,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=98826.66666666667, ans=0.0 2023-11-18 06:25:07,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=98893.33333333333, ans=0.2 2023-11-18 06:25:09,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=98893.33333333333, ans=0.125 2023-11-18 06:25:26,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=99026.66666666667, ans=0.125 2023-11-18 06:25:38,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99093.33333333333, ans=0.125 2023-11-18 06:25:48,864 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2850, loss[loss=0.129, simple_loss=0.1461, pruned_loss=0.04688, audio_tagging_loss=0.009044, over 15545.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1402, pruned_loss=0.05259, audio_tagging_loss=0.01273, over 3037203.10 frames. ], batch size: 57, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:11,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=99293.33333333333, ans=0.0 2023-11-18 06:26:26,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=99360.0, ans=0.5 2023-11-18 06:26:28,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=99360.0, ans=0.2 2023-11-18 06:26:29,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=99360.0, ans=0.125 2023-11-18 06:26:30,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-18 06:26:30,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.055e+02 1.276e+02 1.437e+02 2.072e+02, threshold=2.552e+02, percent-clipped=0.0 2023-11-18 06:26:42,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2023-11-18 06:26:45,308 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2900, loss[loss=0.113, simple_loss=0.1096, pruned_loss=0.04141, audio_tagging_loss=0.0168, over 15870.00 frames. ], tot_loss[loss=0.1345, simple_loss=0.1393, pruned_loss=0.05208, audio_tagging_loss=0.01278, over 3033159.94 frames. ], batch size: 60, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:27:06,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=99560.0, ans=0.125 2023-11-18 06:27:42,250 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 2950, loss[loss=0.1462, simple_loss=0.1474, pruned_loss=0.05961, audio_tagging_loss=0.01292, over 14316.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.14, pruned_loss=0.05255, audio_tagging_loss=0.01278, over 3030214.95 frames. 
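
`grad_scale` is the dynamic fp16 loss-scaling factor: in the records that follow it drops from 32.0 to 16.0 around batch 2950 and is back at 32.0 by batch 3200, the usual AMP pattern of halving after an overflowing step and doubling again after a long run of clean steps. A minimal sketch of that update rule; the 0.5 backoff, 2.0 growth factor, and growth interval are the common AMP defaults, assumed rather than read from this run:

```python
# Minimal dynamic loss-scale update matching the grad_scale behaviour seen
# in the log (32.0 -> 16.0 -> 32.0). The backoff/growth constants are the
# common AMP defaults, assumed here rather than read from this run.
class LossScale:
    def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= 0.5      # halve on overflow (the step is skipped)
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= 2.0  # double after a run of clean steps
                self._good_steps = 0

scaler = LossScale()
scaler.update(found_inf=True)
print(scaler.scale)  # 16.0, as in the batch 2950 record below
```
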
], batch size: 53, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:27:43,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=99826.66666666667, ans=0.0 2023-11-18 06:27:53,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=99893.33333333333, ans=0.0 2023-11-18 06:27:53,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=99893.33333333333, ans=0.125 2023-11-18 06:28:04,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=99960.0, ans=0.125 2023-11-18 06:28:13,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=99960.0, ans=0.125 2023-11-18 06:28:16,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=100026.66666666667, ans=22.5 2023-11-18 06:28:25,095 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.080e+01 1.058e+02 1.250e+02 1.448e+02 1.793e+02, threshold=2.500e+02, percent-clipped=0.0 2023-11-18 06:28:36,949 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2023-11-18 06:28:38,684 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3000, loss[loss=0.1227, simple_loss=0.123, pruned_loss=0.04862, audio_tagging_loss=0.01254, over 15213.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1389, pruned_loss=0.05221, audio_tagging_loss=0.0129, over 3032838.60 frames. ], batch size: 58, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:28:38,684 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 06:28:58,731 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.8056, 5.8340, 5.8449, 5.9410], device='cuda:2') 2023-11-18 06:29:12,299 INFO [train_asr.py:1147] (2/4) Epoch 2, validation: loss=0.0901, simple_loss=0.07118, pruned_loss=0.01674, audio_tagging_loss=0.03777, over 4681554.00 frames. 2023-11-18 06:29:12,300 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 06:29:26,104 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.00 vs. limit=6.0 2023-11-18 06:29:46,825 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:30:08,714 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3050, loss[loss=0.1201, simple_loss=0.1326, pruned_loss=0.04552, audio_tagging_loss=0.008331, over 15212.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.1404, pruned_loss=0.05265, audio_tagging_loss=0.01295, over 3036032.93 frames. ], batch size: 57, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:30:14,823 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.984e+00 2023-11-18 06:30:39,947 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:30:47,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=100693.33333333333, ans=0.0 2023-11-18 06:30:51,083 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.055e+02 1.164e+02 1.306e+02 1.882e+02, threshold=2.329e+02, percent-clipped=0.0 2023-11-18 06:30:51,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=100693.33333333333, ans=0.0 2023-11-18 06:31:04,640 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3100, loss[loss=0.1571, simple_loss=0.1646, pruned_loss=0.0632, audio_tagging_loss=0.0116, over 14850.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1395, pruned_loss=0.0523, audio_tagging_loss=0.01319, over 3035218.79 frames. ], batch size: 54, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:31:16,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.36 vs. limit=15.0 2023-11-18 06:31:26,565 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.28 vs. limit=15.0 2023-11-18 06:31:33,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=100960.0, ans=0.0 2023-11-18 06:31:38,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=101026.66666666667, ans=0.1 2023-11-18 06:31:41,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2023-11-18 06:31:49,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=101093.33333333333, ans=0.2 2023-11-18 06:31:50,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=101093.33333333333, ans=0.125 2023-11-18 06:32:00,117 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3150, loss[loss=0.1257, simple_loss=0.1317, pruned_loss=0.04462, audio_tagging_loss=0.0152, over 16338.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.1402, pruned_loss=0.05223, audio_tagging_loss=0.01322, over 3043157.75 frames. ], batch size: 60, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:32:33,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.53 vs. 
limit=15.0 2023-11-18 06:32:34,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=101360.0, ans=0.125 2023-11-18 06:32:35,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=101360.0, ans=0.125 2023-11-18 06:32:43,611 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 1.035e+02 1.177e+02 1.341e+02 1.863e+02, threshold=2.355e+02, percent-clipped=0.0 2023-11-18 06:32:52,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=101426.66666666667, ans=0.125 2023-11-18 06:32:58,090 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3200, loss[loss=0.138, simple_loss=0.1428, pruned_loss=0.05278, audio_tagging_loss=0.01378, over 15963.00 frames. ], tot_loss[loss=0.1343, simple_loss=0.1386, pruned_loss=0.05164, audio_tagging_loss=0.01331, over 3040863.97 frames. ], batch size: 56, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:32:58,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=101493.33333333333, ans=0.125 2023-11-18 06:33:08,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=101560.0, ans=0.125 2023-11-18 06:33:18,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=101560.0, ans=0.125 2023-11-18 06:33:24,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=101626.66666666667, ans=0.0 2023-11-18 06:33:27,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. limit=10.0 2023-11-18 06:33:38,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=101693.33333333333, ans=0.0 2023-11-18 06:33:44,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=101760.0, ans=0.0 2023-11-18 06:33:45,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.43 vs. limit=15.0 2023-11-18 06:33:47,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101760.0, ans=0.1 2023-11-18 06:33:54,468 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3250, loss[loss=0.1108, simple_loss=0.1084, pruned_loss=0.03975, audio_tagging_loss=0.01685, over 15644.00 frames. ], tot_loss[loss=0.1323, simple_loss=0.1363, pruned_loss=0.0506, audio_tagging_loss=0.01358, over 3037503.23 frames. ], batch size: 60, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:34:05,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=101893.33333333333, ans=0.2 2023-11-18 06:34:30,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=102026.66666666667, ans=0.0 2023-11-18 06:34:31,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.85 vs. 
limit=15.0 2023-11-18 06:34:37,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.501e+01 1.067e+02 1.209e+02 1.454e+02 2.188e+02, threshold=2.419e+02, percent-clipped=0.0 2023-11-18 06:34:37,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=102026.66666666667, ans=0.125 2023-11-18 06:34:37,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=102026.66666666667, ans=0.125 2023-11-18 06:34:38,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=102093.33333333333, ans=10.0 2023-11-18 06:34:50,103 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3300, loss[loss=0.1222, simple_loss=0.1294, pruned_loss=0.04681, audio_tagging_loss=0.01069, over 15485.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1364, pruned_loss=0.05074, audio_tagging_loss=0.01353, over 3039145.05 frames. ], batch size: 57, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:34:51,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=102160.0, ans=0.125 2023-11-18 06:34:55,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=102160.0, ans=0.0 2023-11-18 06:35:10,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=102226.66666666667, ans=0.0 2023-11-18 06:35:12,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=102293.33333333333, ans=0.2 2023-11-18 06:35:14,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=102293.33333333333, ans=0.0 2023-11-18 06:35:21,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=102293.33333333333, ans=0.07 2023-11-18 06:35:34,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=102426.66666666667, ans=0.125 2023-11-18 06:35:35,669 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.06 vs. limit=15.0 2023-11-18 06:35:36,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=102426.66666666667, ans=0.0 2023-11-18 06:35:46,861 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3350, loss[loss=0.1192, simple_loss=0.1263, pruned_loss=0.04536, audio_tagging_loss=0.01066, over 15510.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1371, pruned_loss=0.05086, audio_tagging_loss=0.01314, over 3045105.71 frames. ], batch size: 59, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:35:58,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.00 vs. limit=22.5 2023-11-18 06:35:58,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. 
limit=6.0 2023-11-18 06:36:30,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.335e+01 1.052e+02 1.183e+02 1.313e+02 1.850e+02, threshold=2.366e+02, percent-clipped=0.0 2023-11-18 06:36:35,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=102760.0, ans=0.0 2023-11-18 06:36:35,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=102760.0, ans=0.125 2023-11-18 06:36:44,257 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3400, loss[loss=0.1333, simple_loss=0.1433, pruned_loss=0.05125, audio_tagging_loss=0.01042, over 14901.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1385, pruned_loss=0.05118, audio_tagging_loss=0.0128, over 3046462.66 frames. ], batch size: 54, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:37:18,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-18 06:37:18,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=103026.66666666667, ans=0.125 2023-11-18 06:37:37,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103093.33333333333, ans=0.1 2023-11-18 06:37:39,619 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3450, loss[loss=0.1511, simple_loss=0.1621, pruned_loss=0.05861, audio_tagging_loss=0.01143, over 15489.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.1388, pruned_loss=0.05145, audio_tagging_loss=0.01266, over 3051617.55 frames. ], batch size: 57, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:37:41,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=103160.0, ans=0.5 2023-11-18 06:37:55,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=103226.66666666667, ans=0.02 2023-11-18 06:38:00,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=103226.66666666667, ans=0.125 2023-11-18 06:38:12,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-18 06:38:21,873 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.158e+01 1.088e+02 1.277e+02 1.401e+02 2.193e+02, threshold=2.554e+02, percent-clipped=0.0 2023-11-18 06:38:30,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0 2023-11-18 06:38:35,889 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3500, loss[loss=0.1114, simple_loss=0.1071, pruned_loss=0.04086, audio_tagging_loss=0.01701, over 14764.00 frames. ], tot_loss[loss=0.1328, simple_loss=0.138, pruned_loss=0.05123, audio_tagging_loss=0.01263, over 3049902.91 frames. ], batch size: 59, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:38:55,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=103560.0, ans=0.0 2023-11-18 06:39:00,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2023-11-18 06:39:01,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=103626.66666666667, ans=0.125 2023-11-18 06:39:03,745 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:39:10,962 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=7.967e-03 2023-11-18 06:39:32,482 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3550, loss[loss=0.1519, simple_loss=0.1529, pruned_loss=0.06344, audio_tagging_loss=0.01198, over 15997.00 frames. ], tot_loss[loss=0.132, simple_loss=0.1368, pruned_loss=0.05103, audio_tagging_loss=0.01263, over 3047899.85 frames. ], batch size: 58, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:39:32,707 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:39:55,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=103960.0, ans=0.125 2023-11-18 06:40:07,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.43 vs. limit=22.5 2023-11-18 06:40:09,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104026.66666666667, ans=0.125 2023-11-18 06:40:15,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 9.988e+01 1.160e+02 1.284e+02 2.391e+02, threshold=2.320e+02, percent-clipped=0.0 2023-11-18 06:40:28,311 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3600, loss[loss=0.1137, simple_loss=0.1151, pruned_loss=0.0452, audio_tagging_loss=0.01099, over 15219.00 frames. ], tot_loss[loss=0.1303, simple_loss=0.1349, pruned_loss=0.05017, audio_tagging_loss=0.01266, over 3045039.95 frames. ], batch size: 62, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:40:29,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=104160.0, ans=0.0 2023-11-18 06:40:31,271 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.01 vs. limit=10.0 2023-11-18 06:40:47,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=104226.66666666667, ans=0.125 2023-11-18 06:40:57,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=104293.33333333333, ans=0.125 2023-11-18 06:41:24,472 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3650, loss[loss=0.1266, simple_loss=0.1338, pruned_loss=0.04872, audio_tagging_loss=0.01095, over 16668.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1357, pruned_loss=0.05052, audio_tagging_loss=0.01271, over 3043831.89 frames. 
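
Each `Whitening` record compares a measured metric for some activation against a scheduled limit (`metric=22.43 vs. limit=22.5` above). The metric is a scale-invariant measure of how far the feature covariance is from a multiple of the identity, equal to 1.0 for perfectly white features and growing as a few directions dominate, and a corrective gradient is applied only while it exceeds the limit. One way to compute such a metric, mean(lambda^2) / mean(lambda)^2 over the covariance eigenvalues via traces, as an illustration of the idea rather than the project's scaling.py code:

```python
import torch

# Scale-invariant whiteness metric in the spirit of the Whitening records:
# mean(eig^2) / mean(eig)^2 of the feature covariance, computed via traces
# (trace(C @ C) equals the sum of squared eigenvalues for symmetric C).
# An illustration of the idea, not the project's scaling.py implementation.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    n, c = x.shape                       # (frames, channels)
    cov = (x.T @ x) / n
    mean_eig = torch.diagonal(cov).sum() / c
    mean_eig_sq = (cov * cov).sum() / c  # trace(cov @ cov) / c
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

white = torch.randn(10000, 256)
print(whitening_metric(white))                                # close to 1.0
print(whitening_metric(white * torch.linspace(0.1, 3, 256)))  # noticeably larger
```
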
], batch size: 61, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:41:58,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=104693.33333333333, ans=22.5 2023-11-18 06:42:01,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104693.33333333333, ans=0.125 2023-11-18 06:42:03,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=18.74 vs. limit=22.5 2023-11-18 06:42:07,178 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.564e+01 1.051e+02 1.152e+02 1.363e+02 2.191e+02, threshold=2.304e+02, percent-clipped=0.0 2023-11-18 06:42:20,860 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3700, loss[loss=0.1594, simple_loss=0.1701, pruned_loss=0.06424, audio_tagging_loss=0.01004, over 15411.00 frames. ], tot_loss[loss=0.1335, simple_loss=0.1383, pruned_loss=0.05171, audio_tagging_loss=0.01267, over 3054236.32 frames. ], batch size: 60, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:42:22,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=104826.66666666667, ans=0.2 2023-11-18 06:42:29,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-11-18 06:42:32,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=104893.33333333333, ans=0.0 2023-11-18 06:42:34,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.73 vs. limit=15.0 2023-11-18 06:42:37,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104893.33333333333, ans=0.125 2023-11-18 06:43:17,350 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3750, loss[loss=0.1536, simple_loss=0.1575, pruned_loss=0.06372, audio_tagging_loss=0.01113, over 14289.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.1386, pruned_loss=0.05173, audio_tagging_loss=0.01278, over 3054504.47 frames. ], batch size: 52, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:43:34,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=105226.66666666667, ans=0.125 2023-11-18 06:43:45,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=105293.33333333333, ans=0.2 2023-11-18 06:43:47,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=22.5 2023-11-18 06:43:48,855 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.52 vs. limit=15.0 2023-11-18 06:43:56,417 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:43:56,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-11-18 06:44:00,706 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.685e+01 1.091e+02 1.248e+02 1.454e+02 2.022e+02, threshold=2.495e+02, percent-clipped=0.0 2023-11-18 06:44:02,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=105426.66666666667, ans=0.2 2023-11-18 06:44:03,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2023-11-18 06:44:14,163 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3800, loss[loss=0.1377, simple_loss=0.141, pruned_loss=0.05166, audio_tagging_loss=0.01553, over 14942.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1396, pruned_loss=0.05208, audio_tagging_loss=0.0128, over 3052663.71 frames. ], batch size: 56, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:44:31,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=105560.0, ans=0.125 2023-11-18 06:44:32,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-18 06:44:59,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=105760.0, ans=0.0 2023-11-18 06:45:08,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=105760.0, ans=0.125 2023-11-18 06:45:10,938 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3850, loss[loss=0.1802, simple_loss=0.1877, pruned_loss=0.07586, audio_tagging_loss=0.01048, over 15465.00 frames. ], tot_loss[loss=0.1352, simple_loss=0.1401, pruned_loss=0.05229, audio_tagging_loss=0.01286, over 3051206.53 frames. ], batch size: 56, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:45:11,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=105826.66666666667, ans=0.1 2023-11-18 06:45:28,229 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.26 vs. limit=10.0 2023-11-18 06:45:40,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=105960.0, ans=0.125 2023-11-18 06:45:53,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 1.026e+02 1.153e+02 1.299e+02 2.070e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 06:45:56,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106093.33333333333, ans=0.0 2023-11-18 06:46:06,617 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3900, loss[loss=0.19, simple_loss=0.2078, pruned_loss=0.07656, audio_tagging_loss=0.009499, over 16001.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1395, pruned_loss=0.05213, audio_tagging_loss=0.01285, over 3051114.43 frames. 
], batch size: 55, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:46:06,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=106160.0, ans=0.2 2023-11-18 06:46:12,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.77 vs. limit=10.0 2023-11-18 06:46:27,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=106226.66666666667, ans=0.2 2023-11-18 06:46:36,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=106293.33333333333, ans=0.2 2023-11-18 06:46:43,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.98 vs. limit=15.0 2023-11-18 06:47:01,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106426.66666666667, ans=0.1 2023-11-18 06:47:03,423 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 3950, loss[loss=0.1529, simple_loss=0.1565, pruned_loss=0.06221, audio_tagging_loss=0.01247, over 15995.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1395, pruned_loss=0.05196, audio_tagging_loss=0.01298, over 3051702.40 frames. ], batch size: 59, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:47:18,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=106560.0, ans=0.1 2023-11-18 06:47:40,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.33 vs. limit=12.0 2023-11-18 06:47:43,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=106693.33333333333, ans=0.0 2023-11-18 06:47:43,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=106693.33333333333, ans=0.1 2023-11-18 06:47:48,874 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.025e+02 1.127e+02 1.249e+02 1.832e+02, threshold=2.254e+02, percent-clipped=0.0 2023-11-18 06:47:51,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2023-11-18 06:47:52,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=106760.0, ans=0.07 2023-11-18 06:47:53,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=106760.0, ans=0.0 2023-11-18 06:48:02,293 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4000, loss[loss=0.1189, simple_loss=0.1171, pruned_loss=0.04756, audio_tagging_loss=0.01284, over 15112.00 frames. ], tot_loss[loss=0.1357, simple_loss=0.1406, pruned_loss=0.05241, audio_tagging_loss=0.01302, over 3044844.61 frames. 
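
The learning rate drifts down only gently across this stretch, from 2.99e-02 at batch 2500 to 2.88e-02 here. Assuming the run uses icefall's Eden scheduler, that slow decay comes from a factor of the following shape, where the step term barely changes once the global step is far past lr_batches; base_lr, lr_batches, and lr_epochs below are placeholder values, not read from this run's configuration:

```python
# Eden-style learning-rate factor, assuming icefall's Eden scheduler; the
# base_lr, lr_batches and lr_epochs values are placeholders, not this run's.
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# Once step >> lr_batches the step factor changes very slowly, which is
# consistent with the gentle 2.99e-02 -> 2.88e-02 drift in this section.
print(eden_lr(0.045, step=25000, epoch=2))
print(eden_lr(0.045, step=27000, epoch=2))
```
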
], batch size: 58, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:21,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=106893.33333333333, ans=10.0 2023-11-18 06:48:27,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=106960.0, ans=0.0 2023-11-18 06:48:29,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=106960.0, ans=0.125 2023-11-18 06:48:30,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=106960.0, ans=0.0 2023-11-18 06:48:49,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=107093.33333333333, ans=0.0 2023-11-18 06:48:54,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=107093.33333333333, ans=0.07 2023-11-18 06:48:56,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=107093.33333333333, ans=0.0 2023-11-18 06:48:58,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-18 06:48:58,671 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4050, loss[loss=0.1091, simple_loss=0.1094, pruned_loss=0.04093, audio_tagging_loss=0.01352, over 15632.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1384, pruned_loss=0.05151, audio_tagging_loss=0.01322, over 3049207.90 frames. ], batch size: 60, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:59,810 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:49:03,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=107160.0, ans=0.125 2023-11-18 06:49:08,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=107160.0, ans=0.5 2023-11-18 06:49:26,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=107293.33333333333, ans=0.0 2023-11-18 06:49:35,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=107360.0, ans=0.2 2023-11-18 06:49:41,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 1.077e+02 1.199e+02 1.331e+02 2.496e+02, threshold=2.397e+02, percent-clipped=1.0 2023-11-18 06:49:54,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107493.33333333333, ans=0.1 2023-11-18 06:49:55,690 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4100, loss[loss=0.1374, simple_loss=0.1506, pruned_loss=0.05121, audio_tagging_loss=0.01088, over 16837.00 frames. 
], tot_loss[loss=0.1338, simple_loss=0.1383, pruned_loss=0.05145, audio_tagging_loss=0.01317, over 3044543.50 frames. ], batch size: 62, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:49:56,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=107493.33333333333, ans=0.0 2023-11-18 06:50:05,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.55 vs. limit=15.0 2023-11-18 06:50:15,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=107560.0, ans=0.1 2023-11-18 06:50:46,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=107760.0, ans=0.0 2023-11-18 06:50:50,525 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.25 vs. limit=12.0 2023-11-18 06:50:51,908 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4150, loss[loss=0.105, simple_loss=0.1092, pruned_loss=0.03718, audio_tagging_loss=0.0132, over 14193.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.1377, pruned_loss=0.05107, audio_tagging_loss=0.01301, over 3041324.40 frames. ], batch size: 55, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:50:53,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=107826.66666666667, ans=0.0 2023-11-18 06:50:54,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=107826.66666666667, ans=0.1 2023-11-18 06:51:28,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108026.66666666667, ans=0.1 2023-11-18 06:51:29,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=108026.66666666667, ans=0.125 2023-11-18 06:51:31,967 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:51:32,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108026.66666666667, ans=0.125 2023-11-18 06:51:35,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.032e+02 1.149e+02 1.336e+02 2.371e+02, threshold=2.297e+02, percent-clipped=0.0 2023-11-18 06:51:43,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108093.33333333333, ans=0.125 2023-11-18 06:51:48,694 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4200, loss[loss=0.1249, simple_loss=0.1305, pruned_loss=0.04319, audio_tagging_loss=0.01644, over 14603.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1369, pruned_loss=0.05037, audio_tagging_loss=0.01294, over 3042660.41 frames. 
], batch size: 54, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:51:58,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.15 vs. limit=15.0 2023-11-18 06:51:59,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=108226.66666666667, ans=0.0 2023-11-18 06:52:03,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.42 vs. limit=15.0 2023-11-18 06:52:10,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=108293.33333333333, ans=0.125 2023-11-18 06:52:15,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-18 06:52:33,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=108426.66666666667, ans=0.125 2023-11-18 06:52:44,349 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4250, loss[loss=0.1262, simple_loss=0.1383, pruned_loss=0.04685, audio_tagging_loss=0.01017, over 14935.00 frames. ], tot_loss[loss=0.1327, simple_loss=0.1379, pruned_loss=0.05089, audio_tagging_loss=0.01286, over 3051467.99 frames. ], batch size: 55, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:52:57,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=108560.0, ans=0.09899494936611666 2023-11-18 06:52:57,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=108560.0, ans=0.1 2023-11-18 06:53:02,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=108560.0, ans=0.125 2023-11-18 06:53:26,871 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 9.019e+01 1.076e+02 1.189e+02 1.301e+02 1.957e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 06:53:41,483 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4300, loss[loss=0.1228, simple_loss=0.1251, pruned_loss=0.04336, audio_tagging_loss=0.01688, over 15377.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1394, pruned_loss=0.05145, audio_tagging_loss=0.01272, over 3055080.41 frames. ], batch size: 58, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:54:11,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=108960.0, ans=0.1 2023-11-18 06:54:12,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=108960.0, ans=0.125 2023-11-18 06:54:21,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109026.66666666667, ans=0.1 2023-11-18 06:54:36,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=109160.0, ans=0.125 2023-11-18 06:54:37,463 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4350, loss[loss=0.1429, simple_loss=0.1496, pruned_loss=0.05615, audio_tagging_loss=0.01194, over 14726.00 frames. ], tot_loss[loss=0.1339, simple_loss=0.1397, pruned_loss=0.05153, audio_tagging_loss=0.01251, over 3051769.73 frames. 
], batch size: 57, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:54:42,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109160.0, ans=0.1 2023-11-18 06:54:43,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=109160.0, ans=0.1 2023-11-18 06:54:44,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=109160.0, ans=0.2 2023-11-18 06:55:04,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=109293.33333333333, ans=0.0 2023-11-18 06:55:20,485 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.140e+01 1.013e+02 1.155e+02 1.315e+02 2.105e+02, threshold=2.309e+02, percent-clipped=0.0 2023-11-18 06:55:28,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=109426.66666666667, ans=0.09899494936611666 2023-11-18 06:55:30,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=109426.66666666667, ans=10.0 2023-11-18 06:55:33,458 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4400, loss[loss=0.1482, simple_loss=0.1601, pruned_loss=0.0554, audio_tagging_loss=0.01271, over 16195.00 frames. ], tot_loss[loss=0.1331, simple_loss=0.1391, pruned_loss=0.05113, audio_tagging_loss=0.01247, over 3049871.53 frames. ], batch size: 59, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:55:45,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=109560.0, ans=0.0 2023-11-18 06:56:11,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=109693.33333333333, ans=0.5 2023-11-18 06:56:28,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.66 vs. limit=15.0 2023-11-18 06:56:29,210 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4450, loss[loss=0.1853, simple_loss=0.197, pruned_loss=0.07702, audio_tagging_loss=0.009791, over 15571.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.1388, pruned_loss=0.0511, audio_tagging_loss=0.01243, over 3050160.75 frames. ], batch size: 56, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:56:29,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109826.66666666667, ans=0.1 2023-11-18 06:56:43,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.92 vs. limit=6.0 2023-11-18 06:56:46,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=109893.33333333333, ans=0.125 2023-11-18 06:56:54,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=109960.0, ans=0.0 2023-11-18 06:57:06,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. 
limit=15.0 2023-11-18 06:57:11,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=110026.66666666667, ans=0.09899494936611666 2023-11-18 06:57:11,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 1.110e+02 1.220e+02 1.457e+02 2.260e+02, threshold=2.440e+02, percent-clipped=0.0 2023-11-18 06:57:17,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=110093.33333333333, ans=15.0 2023-11-18 06:57:26,484 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4500, loss[loss=0.1474, simple_loss=0.1517, pruned_loss=0.05801, audio_tagging_loss=0.01353, over 14669.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1399, pruned_loss=0.05137, audio_tagging_loss=0.01236, over 3055465.24 frames. ], batch size: 55, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:57:49,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=110293.33333333333, ans=15.0 2023-11-18 06:58:09,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=110360.0, ans=0.125 2023-11-18 06:58:20,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=110426.66666666667, ans=0.05 2023-11-18 06:58:22,425 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4550, loss[loss=0.1716, simple_loss=0.1812, pruned_loss=0.07102, audio_tagging_loss=0.01002, over 15447.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1395, pruned_loss=0.05111, audio_tagging_loss=0.01244, over 3047235.17 frames. ], batch size: 56, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:58:29,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.80 vs. limit=15.0 2023-11-18 06:58:31,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.52 vs. limit=15.0 2023-11-18 06:58:32,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.03 vs. limit=12.0 2023-11-18 06:58:58,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=110693.33333333333, ans=0.125 2023-11-18 06:59:05,467 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.009e+02 1.148e+02 1.280e+02 1.877e+02, threshold=2.295e+02, percent-clipped=0.0 2023-11-18 06:59:05,505 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:59:18,757 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4600, loss[loss=0.1185, simple_loss=0.1152, pruned_loss=0.04479, audio_tagging_loss=0.01611, over 14370.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1374, pruned_loss=0.05038, audio_tagging_loss=0.01266, over 3056326.15 frames. 
], batch size: 57, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:59:35,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=110893.33333333333, ans=0.0 2023-11-18 06:59:49,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=110960.0, ans=0.0 2023-11-18 06:59:50,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=110960.0, ans=0.1 2023-11-18 07:00:00,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=111026.66666666667, ans=0.09899494936611666 2023-11-18 07:00:15,442 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4650, loss[loss=0.129, simple_loss=0.1297, pruned_loss=0.0465, audio_tagging_loss=0.01768, over 14672.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1356, pruned_loss=0.04991, audio_tagging_loss=0.01301, over 3053838.44 frames. ], batch size: 57, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 07:00:22,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=111160.0, ans=0.125 2023-11-18 07:00:27,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=111226.66666666667, ans=0.125 2023-11-18 07:00:28,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=111226.66666666667, ans=0.0 2023-11-18 07:00:31,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=111226.66666666667, ans=0.2 2023-11-18 07:00:44,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=111293.33333333333, ans=0.125 2023-11-18 07:00:51,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=111360.0, ans=0.04949747468305833 2023-11-18 07:00:58,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 1.062e+02 1.161e+02 1.332e+02 2.161e+02, threshold=2.322e+02, percent-clipped=0.0 2023-11-18 07:01:10,899 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4700, loss[loss=0.1868, simple_loss=0.1968, pruned_loss=0.07774, audio_tagging_loss=0.01064, over 15667.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1355, pruned_loss=0.04965, audio_tagging_loss=0.01311, over 3052763.01 frames. ], batch size: 59, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:01:18,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=111493.33333333333, ans=0.125 2023-11-18 07:01:23,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=111560.0, ans=0.2 2023-11-18 07:01:31,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.27 vs. 
limit=12.0 2023-11-18 07:01:33,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=111626.66666666667, ans=0.0 2023-11-18 07:01:43,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=111626.66666666667, ans=0.125 2023-11-18 07:01:44,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=111693.33333333333, ans=0.0 2023-11-18 07:01:51,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=111693.33333333333, ans=0.125 2023-11-18 07:01:52,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.43 vs. limit=15.0 2023-11-18 07:01:52,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2023-11-18 07:01:52,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=111693.33333333333, ans=0.1 2023-11-18 07:02:06,896 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4750, loss[loss=0.1432, simple_loss=0.145, pruned_loss=0.05364, audio_tagging_loss=0.01704, over 15327.00 frames. ], tot_loss[loss=0.1302, simple_loss=0.1352, pruned_loss=0.0494, audio_tagging_loss=0.01323, over 3043830.85 frames. ], batch size: 56, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:02:24,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=111893.33333333333, ans=0.125 2023-11-18 07:02:30,282 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. limit=6.0 2023-11-18 07:02:49,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 1.063e+02 1.146e+02 1.305e+02 1.876e+02, threshold=2.292e+02, percent-clipped=0.0 2023-11-18 07:02:58,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=112093.33333333333, ans=0.125 2023-11-18 07:03:03,779 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4800, loss[loss=0.1122, simple_loss=0.1158, pruned_loss=0.04018, audio_tagging_loss=0.0141, over 16191.00 frames. ], tot_loss[loss=0.1313, simple_loss=0.136, pruned_loss=0.04999, audio_tagging_loss=0.01334, over 3049414.95 frames. ], batch size: 62, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:03:15,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112226.66666666667, ans=0.1 2023-11-18 07:03:17,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.19 vs. 
limit=22.5 2023-11-18 07:03:22,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112226.66666666667, ans=0.1 2023-11-18 07:03:55,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=112426.66666666667, ans=0.0 2023-11-18 07:03:59,950 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4850, loss[loss=0.1583, simple_loss=0.1618, pruned_loss=0.06513, audio_tagging_loss=0.01228, over 15333.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1355, pruned_loss=0.04988, audio_tagging_loss=0.01346, over 3049121.92 frames. ], batch size: 54, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:04,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=112493.33333333333, ans=10.0 2023-11-18 07:04:12,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=112560.0, ans=0.1 2023-11-18 07:04:31,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=112626.66666666667, ans=0.0 2023-11-18 07:04:42,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 1.047e+02 1.164e+02 1.344e+02 1.766e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:04:43,946 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:04:56,041 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4900, loss[loss=0.1231, simple_loss=0.1311, pruned_loss=0.0471, audio_tagging_loss=0.0105, over 15221.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1358, pruned_loss=0.05005, audio_tagging_loss=0.01309, over 3047228.95 frames. ], batch size: 57, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:57,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.78 vs. limit=15.0 2023-11-18 07:04:58,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=112826.66666666667, ans=0.125 2023-11-18 07:04:59,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_na.min_abs, batch_count=112826.66666666667, ans=0.02 2023-11-18 07:05:15,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-11-18 07:05:17,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112960.0, ans=0.1 2023-11-18 07:05:38,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=113026.66666666667, ans=0.025 2023-11-18 07:05:43,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=12.0 2023-11-18 07:05:47,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=113093.33333333333, ans=0.0 2023-11-18 07:05:51,906 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 4950, loss[loss=0.1319, simple_loss=0.1383, pruned_loss=0.05088, audio_tagging_loss=0.01191, over 16551.00 frames. 
], tot_loss[loss=0.1304, simple_loss=0.1354, pruned_loss=0.04976, audio_tagging_loss=0.0129, over 3046625.22 frames. ], batch size: 61, lr: 2.81e-02, grad_scale: 64.0 2023-11-18 07:06:00,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-18 07:06:19,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=113293.33333333333, ans=0.125 2023-11-18 07:06:33,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.63 vs. limit=6.0 2023-11-18 07:06:34,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 1.049e+02 1.181e+02 1.339e+02 2.582e+02, threshold=2.362e+02, percent-clipped=1.0 2023-11-18 07:06:43,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-11-18 07:06:47,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=113493.33333333333, ans=0.125 2023-11-18 07:06:48,213 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5000, loss[loss=0.122, simple_loss=0.1342, pruned_loss=0.04282, audio_tagging_loss=0.01206, over 15334.00 frames. ], tot_loss[loss=0.1303, simple_loss=0.1357, pruned_loss=0.04971, audio_tagging_loss=0.01272, over 3040625.96 frames. ], batch size: 57, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:06:54,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.01 vs. limit=22.5 2023-11-18 07:07:04,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.06 vs. limit=22.5 2023-11-18 07:07:08,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=113560.0, ans=0.5 2023-11-18 07:07:15,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=113626.66666666667, ans=0.125 2023-11-18 07:07:44,835 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5050, loss[loss=0.133, simple_loss=0.1412, pruned_loss=0.05101, audio_tagging_loss=0.0114, over 15975.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1363, pruned_loss=0.04991, audio_tagging_loss=0.01264, over 3038810.97 frames. ], batch size: 57, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:08:02,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=113893.33333333333, ans=0.2 2023-11-18 07:08:19,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.46 vs. limit=15.0 2023-11-18 07:08:27,547 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 1.016e+02 1.164e+02 1.342e+02 1.810e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:08:33,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=114093.33333333333, ans=0.0 2023-11-18 07:08:41,044 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5100, loss[loss=0.1342, simple_loss=0.1459, pruned_loss=0.0507, audio_tagging_loss=0.01057, over 14898.00 frames. 
], tot_loss[loss=0.1305, simple_loss=0.1359, pruned_loss=0.04988, audio_tagging_loss=0.01264, over 3041030.60 frames. ], batch size: 55, lr: 2.79e-02, grad_scale: 64.0 2023-11-18 07:08:43,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=114160.0, ans=0.2 2023-11-18 07:08:50,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.40 vs. limit=15.0 2023-11-18 07:08:53,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=114226.66666666667, ans=0.0 2023-11-18 07:09:07,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.69 vs. limit=15.0 2023-11-18 07:09:24,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=114360.0, ans=0.0 2023-11-18 07:09:37,375 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5150, loss[loss=0.1386, simple_loss=0.1578, pruned_loss=0.04925, audio_tagging_loss=0.01049, over 16103.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1362, pruned_loss=0.0498, audio_tagging_loss=0.01263, over 3044672.83 frames. ], batch size: 61, lr: 2.79e-02, grad_scale: 16.0 2023-11-18 07:09:43,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=114493.33333333333, ans=0.125 2023-11-18 07:09:54,710 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.939e+00 2023-11-18 07:09:58,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=114626.66666666667, ans=0.0 2023-11-18 07:10:04,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.53 vs. limit=15.0 2023-11-18 07:10:16,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=114693.33333333333, ans=0.0 2023-11-18 07:10:22,527 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 1.018e+02 1.157e+02 1.320e+02 3.492e+02, threshold=2.315e+02, percent-clipped=2.0 2023-11-18 07:10:30,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=114760.0, ans=0.0 2023-11-18 07:10:30,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114760.0, ans=0.125 2023-11-18 07:10:30,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-11-18 07:10:33,997 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5200, loss[loss=0.1232, simple_loss=0.1246, pruned_loss=0.04538, audio_tagging_loss=0.01552, over 16022.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1367, pruned_loss=0.04988, audio_tagging_loss=0.01248, over 3038836.65 frames. 
], batch size: 62, lr: 2.79e-02, grad_scale: 32.0 2023-11-18 07:10:48,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=114893.33333333333, ans=0.125 2023-11-18 07:10:51,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=114893.33333333333, ans=0.0 2023-11-18 07:10:58,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2023-11-18 07:11:08,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=115026.66666666667, ans=0.07 2023-11-18 07:11:21,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=115093.33333333333, ans=0.035 2023-11-18 07:11:30,237 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5250, loss[loss=0.1217, simple_loss=0.1334, pruned_loss=0.04325, audio_tagging_loss=0.01179, over 14822.00 frames. ], tot_loss[loss=0.1314, simple_loss=0.1376, pruned_loss=0.05025, audio_tagging_loss=0.01237, over 3043665.87 frames. ], batch size: 56, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:12:03,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=115360.0, ans=0.2 2023-11-18 07:12:15,374 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 1.019e+02 1.120e+02 1.285e+02 1.660e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 07:12:15,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=115426.66666666667, ans=0.0 2023-11-18 07:12:24,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115426.66666666667, ans=0.125 2023-11-18 07:12:26,632 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5300, loss[loss=0.1736, simple_loss=0.1763, pruned_loss=0.07328, audio_tagging_loss=0.01213, over 15903.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1388, pruned_loss=0.05082, audio_tagging_loss=0.01226, over 3042317.52 frames. ], batch size: 57, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:12:27,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=115493.33333333333, ans=0.125 2023-11-18 07:12:31,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=115493.33333333333, ans=0.1 2023-11-18 07:12:49,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=115626.66666666667, ans=0.05 2023-11-18 07:12:55,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=115626.66666666667, ans=0.2 2023-11-18 07:12:57,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115626.66666666667, ans=0.125 2023-11-18 07:12:57,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. 
limit=6.0 2023-11-18 07:13:10,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=115760.0, ans=0.125 2023-11-18 07:13:22,350 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5350, loss[loss=0.1202, simple_loss=0.1243, pruned_loss=0.04811, audio_tagging_loss=0.009941, over 15417.00 frames. ], tot_loss[loss=0.1321, simple_loss=0.1381, pruned_loss=0.0506, audio_tagging_loss=0.01238, over 3035098.68 frames. ], batch size: 57, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:13:35,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=115893.33333333333, ans=0.2 2023-11-18 07:13:37,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.99 vs. limit=22.5 2023-11-18 07:13:45,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=115960.0, ans=0.125 2023-11-18 07:13:58,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=116026.66666666667, ans=0.125 2023-11-18 07:13:59,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=116026.66666666667, ans=0.1 2023-11-18 07:14:06,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=116026.66666666667, ans=0.125 2023-11-18 07:14:08,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.439e+01 1.052e+02 1.204e+02 1.359e+02 2.060e+02, threshold=2.407e+02, percent-clipped=0.0 2023-11-18 07:14:20,306 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5400, loss[loss=0.1547, simple_loss=0.1681, pruned_loss=0.05719, audio_tagging_loss=0.01344, over 14911.00 frames. ], tot_loss[loss=0.1319, simple_loss=0.1382, pruned_loss=0.05037, audio_tagging_loss=0.01237, over 3043736.58 frames. ], batch size: 55, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:14:49,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=116293.33333333333, ans=0.0 2023-11-18 07:15:13,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2023-11-18 07:15:16,454 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5450, loss[loss=0.1622, simple_loss=0.1766, pruned_loss=0.06261, audio_tagging_loss=0.01132, over 15561.00 frames. ], tot_loss[loss=0.1306, simple_loss=0.1366, pruned_loss=0.04963, audio_tagging_loss=0.01262, over 3034078.11 frames. ], batch size: 58, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:15:29,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=116560.0, ans=0.125 2023-11-18 07:15:44,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. 
limit=6.0 2023-11-18 07:15:54,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=116693.33333333333, ans=0.125 2023-11-18 07:16:01,586 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 1.012e+02 1.167e+02 1.341e+02 1.969e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 07:16:10,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=116760.0, ans=0.125 2023-11-18 07:16:12,379 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5500, loss[loss=0.1324, simple_loss=0.1268, pruned_loss=0.05347, audio_tagging_loss=0.01555, over 16497.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1367, pruned_loss=0.04964, audio_tagging_loss=0.01273, over 3041121.62 frames. ], batch size: 62, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:16:44,105 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:16:48,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117026.66666666667, ans=0.1 2023-11-18 07:17:08,005 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5550, loss[loss=0.1376, simple_loss=0.1365, pruned_loss=0.04999, audio_tagging_loss=0.01933, over 15354.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1375, pruned_loss=0.04999, audio_tagging_loss=0.01288, over 3043014.97 frames. ], batch size: 58, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:17:17,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117160.0, ans=0.1 2023-11-18 07:17:21,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=117226.66666666667, ans=0.0 2023-11-18 07:17:32,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=117293.33333333333, ans=0.015 2023-11-18 07:17:36,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=117293.33333333333, ans=0.07 2023-11-18 07:17:46,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2023-11-18 07:17:53,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 1.048e+02 1.161e+02 1.291e+02 1.886e+02, threshold=2.323e+02, percent-clipped=0.0 2023-11-18 07:17:56,924 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-18 07:18:05,484 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5600, loss[loss=0.1521, simple_loss=0.1695, pruned_loss=0.05772, audio_tagging_loss=0.009669, over 15552.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1374, pruned_loss=0.0495, audio_tagging_loss=0.01292, over 3049501.00 frames. 
], batch size: 55, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:18:21,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=117560.0, ans=0.0 2023-11-18 07:18:21,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=117560.0, ans=0.0 2023-11-18 07:18:32,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=117626.66666666667, ans=0.0 2023-11-18 07:18:45,158 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:18:47,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=117693.33333333333, ans=0.125 2023-11-18 07:18:54,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=117760.0, ans=0.125 2023-11-18 07:19:01,201 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5650, loss[loss=0.09946, simple_loss=0.08823, pruned_loss=0.03777, audio_tagging_loss=0.01757, over 15553.00 frames. ], tot_loss[loss=0.1306, simple_loss=0.1368, pruned_loss=0.04921, audio_tagging_loss=0.01301, over 3054482.14 frames. ], batch size: 60, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:19:01,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=117826.66666666667, ans=0.125 2023-11-18 07:19:01,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=117826.66666666667, ans=0.125 2023-11-18 07:19:04,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.35 vs. 
limit=12.0 2023-11-18 07:19:11,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=117893.33333333333, ans=0.2 2023-11-18 07:19:19,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=117893.33333333333, ans=0.125 2023-11-18 07:19:19,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117893.33333333333, ans=0.1 2023-11-18 07:19:31,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=117960.0, ans=0.0 2023-11-18 07:19:32,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=117960.0, ans=0.035 2023-11-18 07:19:36,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=118026.66666666667, ans=0.125 2023-11-18 07:19:46,191 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 1.030e+02 1.132e+02 1.306e+02 2.340e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 07:19:57,378 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5700, loss[loss=0.1247, simple_loss=0.1382, pruned_loss=0.04537, audio_tagging_loss=0.01018, over 15772.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1363, pruned_loss=0.04942, audio_tagging_loss=0.01292, over 3051169.07 frames. ], batch size: 58, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:20:13,979 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.32 vs. limit=15.0 2023-11-18 07:20:20,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=118293.33333333333, ans=0.95 2023-11-18 07:20:20,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=118293.33333333333, ans=0.2 2023-11-18 07:20:28,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=118293.33333333333, ans=0.125 2023-11-18 07:20:47,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=118426.66666666667, ans=0.2 2023-11-18 07:20:53,848 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5750, loss[loss=0.1442, simple_loss=0.1517, pruned_loss=0.05656, audio_tagging_loss=0.01182, over 14281.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1365, pruned_loss=0.04942, audio_tagging_loss=0.01282, over 3045985.70 frames. ], batch size: 53, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:21:01,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=118493.33333333333, ans=0.125 2023-11-18 07:21:05,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=118560.0, ans=0.0 2023-11-18 07:21:21,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.00 vs. 
limit=15.0 2023-11-18 07:21:36,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=118693.33333333333, ans=0.0 2023-11-18 07:21:37,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-18 07:21:38,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 1.021e+02 1.146e+02 1.318e+02 2.072e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:21:44,272 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-18 07:21:49,062 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5800, loss[loss=0.1536, simple_loss=0.1768, pruned_loss=0.0549, audio_tagging_loss=0.0103, over 15180.00 frames. ], tot_loss[loss=0.1301, simple_loss=0.1362, pruned_loss=0.0493, audio_tagging_loss=0.01268, over 3049201.04 frames. ], batch size: 57, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:21:52,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=118826.66666666667, ans=0.1 2023-11-18 07:22:05,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=118893.33333333333, ans=0.2 2023-11-18 07:22:24,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-18 07:22:31,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=119026.66666666667, ans=0.0 2023-11-18 07:22:44,824 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5850, loss[loss=0.1418, simple_loss=0.1436, pruned_loss=0.0557, audio_tagging_loss=0.01432, over 14540.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.137, pruned_loss=0.04977, audio_tagging_loss=0.01258, over 3043945.51 frames. ], batch size: 57, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:22:56,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=119226.66666666667, ans=0.125 2023-11-18 07:23:14,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.17 vs. limit=15.0 2023-11-18 07:23:29,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 1.029e+02 1.172e+02 1.323e+02 1.755e+02, threshold=2.344e+02, percent-clipped=0.0 2023-11-18 07:23:34,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=119426.66666666667, ans=0.125 2023-11-18 07:23:36,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.82 vs. limit=15.0 2023-11-18 07:23:40,986 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5900, loss[loss=0.1648, simple_loss=0.165, pruned_loss=0.0693, audio_tagging_loss=0.01297, over 15485.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1373, pruned_loss=0.04982, audio_tagging_loss=0.0125, over 3051748.24 frames. 
], batch size: 58, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:24:10,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=15.0 2023-11-18 07:24:15,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=119693.33333333333, ans=0.125 2023-11-18 07:24:34,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=119760.0, ans=0.125 2023-11-18 07:24:36,657 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 5950, loss[loss=0.1549, simple_loss=0.1566, pruned_loss=0.06611, audio_tagging_loss=0.01051, over 15144.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1375, pruned_loss=0.04981, audio_tagging_loss=0.01242, over 3046092.06 frames. ], batch size: 57, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:24:46,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=119826.66666666667, ans=0.125 2023-11-18 07:25:00,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=119960.0, ans=0.0 2023-11-18 07:25:12,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120026.66666666667, ans=0.1 2023-11-18 07:25:16,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-11-18 07:25:21,029 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.274e+01 1.016e+02 1.169e+02 1.321e+02 1.949e+02, threshold=2.338e+02, percent-clipped=0.0 2023-11-18 07:25:29,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=12.0 2023-11-18 07:25:32,023 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6000, loss[loss=0.1238, simple_loss=0.1247, pruned_loss=0.04844, audio_tagging_loss=0.01301, over 16042.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1368, pruned_loss=0.0496, audio_tagging_loss=0.01246, over 3046468.30 frames. ], batch size: 61, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:25:32,023 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 07:26:04,343 INFO [train_asr.py:1147] (2/4) Epoch 2, validation: loss=0.08772, simple_loss=0.06916, pruned_loss=0.01519, audio_tagging_loss=0.03794, over 4681554.00 frames. 
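A note on reading the loss records above: every per-batch and validation line reports four figures (loss, simple_loss, pruned_loss, audio_tagging_loss), and the printed loss is consistent with a fixed linear combination in which simple_loss is down-weighted by 0.5 and the other two terms enter with weight 1.0; for the Epoch 2 validation line just above, 0.5 * 0.06916 + 0.01519 + 0.03794 ≈ 0.08772. The tot_loss figures are the same quantity averaged over the running frame counts shown, and grad_scale is the dynamic AMP loss scale, which is why it moves between 16.0, 32.0 and 64.0 across batches. Below is a minimal sketch of the combination; the weights are inferred from the printed numbers rather than read out of train_asr.py, and combined_loss is a hypothetical helper name.

    # Hedged sketch: how the logged `loss` appears to be assembled from its
    # logged components. The 0.5 / 1.0 weights are inferred from the printed
    # numbers, not taken from train_asr.py.
    def combined_loss(simple_loss: float,
                      pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Epoch 2 validation line: loss=0.08772, simple_loss=0.06916,
    # pruned_loss=0.01519, audio_tagging_loss=0.03794.
    assert abs(combined_loss(0.06916, 0.01519, 0.03794) - 0.08772) < 1e-4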
2023-11-18 07:26:04,343 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 07:26:07,775 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.893e+00 2023-11-18 07:26:30,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=120293.33333333333, ans=0.125 2023-11-18 07:26:33,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120293.33333333333, ans=0.1 2023-11-18 07:26:36,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=120360.0, ans=0.125 2023-11-18 07:26:39,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=120360.0, ans=0.0 2023-11-18 07:26:44,862 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:26:48,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120426.66666666667, ans=0.1 2023-11-18 07:26:59,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=120493.33333333333, ans=0.0 2023-11-18 07:27:00,584 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6050, loss[loss=0.1181, simple_loss=0.1299, pruned_loss=0.04292, audio_tagging_loss=0.01019, over 15508.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.1358, pruned_loss=0.0493, audio_tagging_loss=0.01253, over 3051893.44 frames. ], batch size: 57, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:27:18,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120560.0, ans=0.1 2023-11-18 07:27:43,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=120693.33333333333, ans=0.0 2023-11-18 07:27:46,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.607e+01 1.101e+02 1.234e+02 1.349e+02 2.388e+02, threshold=2.468e+02, percent-clipped=1.0 2023-11-18 07:27:57,582 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6100, loss[loss=0.09354, simple_loss=0.09884, pruned_loss=0.02841, audio_tagging_loss=0.01571, over 14326.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1362, pruned_loss=0.04932, audio_tagging_loss=0.01235, over 3048172.17 frames. 
], batch size: 57, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:28:09,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=120893.33333333333, ans=0.125 2023-11-18 07:28:28,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=120960.0, ans=0.0 2023-11-18 07:28:32,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=121026.66666666667, ans=0.05 2023-11-18 07:28:34,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=121026.66666666667, ans=0.125 2023-11-18 07:28:38,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.24 vs. limit=10.0 2023-11-18 07:28:54,893 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6150, loss[loss=0.1769, simple_loss=0.1919, pruned_loss=0.0721, audio_tagging_loss=0.008859, over 16452.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1361, pruned_loss=0.0495, audio_tagging_loss=0.01244, over 3045994.72 frames. ], batch size: 61, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:28:59,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.42 vs. limit=15.0 2023-11-18 07:29:26,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=121293.33333333333, ans=0.0 2023-11-18 07:29:27,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.75 vs. limit=15.0 2023-11-18 07:29:38,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=121360.0, ans=0.0 2023-11-18 07:29:40,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 1.065e+02 1.218e+02 1.371e+02 2.442e+02, threshold=2.436e+02, percent-clipped=0.0 2023-11-18 07:29:52,073 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6200, loss[loss=0.1245, simple_loss=0.1245, pruned_loss=0.04771, audio_tagging_loss=0.01459, over 14933.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1365, pruned_loss=0.04966, audio_tagging_loss=0.01254, over 3049588.84 frames. ], batch size: 55, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:30:12,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=121560.0, ans=0.1 2023-11-18 07:30:18,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=3.62 vs. limit=15.0 2023-11-18 07:30:48,808 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6250, loss[loss=0.1436, simple_loss=0.1506, pruned_loss=0.0568, audio_tagging_loss=0.01152, over 15914.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1361, pruned_loss=0.04956, audio_tagging_loss=0.01274, over 3045818.61 frames. 
], batch size: 60, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:31:03,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=121893.33333333333, ans=0.2 2023-11-18 07:31:03,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-18 07:31:12,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=121960.0, ans=0.125 2023-11-18 07:31:22,183 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5 2023-11-18 07:31:33,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 1.000e+02 1.109e+02 1.237e+02 1.670e+02, threshold=2.218e+02, percent-clipped=0.0 2023-11-18 07:31:38,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-11-18 07:31:45,288 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6300, loss[loss=0.1344, simple_loss=0.1457, pruned_loss=0.04961, audio_tagging_loss=0.01198, over 16713.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1358, pruned_loss=0.04921, audio_tagging_loss=0.01279, over 3042973.91 frames. ], batch size: 61, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:31:51,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=122160.0, ans=0.0 2023-11-18 07:32:04,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122226.66666666667, ans=0.125 2023-11-18 07:32:26,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=122360.0, ans=0.0 2023-11-18 07:32:42,036 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6350, loss[loss=0.1691, simple_loss=0.1646, pruned_loss=0.07113, audio_tagging_loss=0.01568, over 14679.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.137, pruned_loss=0.04969, audio_tagging_loss=0.01288, over 3039948.09 frames. 
], batch size: 56, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:32:47,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=122493.33333333333, ans=0.125 2023-11-18 07:32:53,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122560.0, ans=0.1 2023-11-18 07:33:00,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=122560.0, ans=0.125 2023-11-18 07:33:00,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=122560.0, ans=6.0 2023-11-18 07:33:02,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=122560.0, ans=0.125 2023-11-18 07:33:22,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=122693.33333333333, ans=0.0 2023-11-18 07:33:27,680 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.961e+01 1.030e+02 1.146e+02 1.327e+02 2.114e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:33:32,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.85 vs. limit=12.0 2023-11-18 07:33:39,606 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6400, loss[loss=0.1402, simple_loss=0.1464, pruned_loss=0.05456, audio_tagging_loss=0.01243, over 14800.00 frames. ], tot_loss[loss=0.1317, simple_loss=0.1375, pruned_loss=0.05001, audio_tagging_loss=0.01292, over 3041828.02 frames. ], batch size: 55, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:33:52,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=122893.33333333333, ans=0.015 2023-11-18 07:34:03,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=122960.0, ans=0.0 2023-11-18 07:34:35,470 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6450, loss[loss=0.119, simple_loss=0.1231, pruned_loss=0.04303, audio_tagging_loss=0.01438, over 16350.00 frames. ], tot_loss[loss=0.1312, simple_loss=0.1369, pruned_loss=0.04969, audio_tagging_loss=0.01302, over 3048455.64 frames. ], batch size: 61, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:34:37,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.23 vs. limit=15.0 2023-11-18 07:35:16,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-18 07:35:20,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 1.021e+02 1.177e+02 1.311e+02 2.345e+02, threshold=2.354e+02, percent-clipped=1.0 2023-11-18 07:35:24,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=123426.66666666667, ans=0.0 2023-11-18 07:35:30,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=123493.33333333333, ans=0.0 2023-11-18 07:35:31,884 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6500, loss[loss=0.1148, simple_loss=0.1222, pruned_loss=0.04129, audio_tagging_loss=0.01245, over 14860.00 frames. 
], tot_loss[loss=0.1314, simple_loss=0.1372, pruned_loss=0.04984, audio_tagging_loss=0.01293, over 3048871.82 frames. ], batch size: 56, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:35:59,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=123626.66666666667, ans=0.125 2023-11-18 07:36:06,499 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:36:07,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=123693.33333333333, ans=0.0 2023-11-18 07:36:22,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.75 vs. limit=22.5 2023-11-18 07:36:28,332 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6550, loss[loss=0.125, simple_loss=0.1393, pruned_loss=0.04622, audio_tagging_loss=0.009193, over 16271.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1371, pruned_loss=0.0497, audio_tagging_loss=0.01262, over 3059963.70 frames. ], batch size: 60, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:36:28,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=123826.66666666667, ans=0.125 2023-11-18 07:36:48,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=123893.33333333333, ans=0.1 2023-11-18 07:37:13,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.989e+01 1.139e+02 1.347e+02 1.768e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 07:37:18,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=124093.33333333333, ans=0.025 2023-11-18 07:37:21,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=15.0 2023-11-18 07:37:25,655 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6600, loss[loss=0.1349, simple_loss=0.1424, pruned_loss=0.05239, audio_tagging_loss=0.01133, over 14268.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.137, pruned_loss=0.04968, audio_tagging_loss=0.0125, over 3050726.79 frames. ], batch size: 55, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:37:31,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=15.0 2023-11-18 07:37:32,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=124160.0, ans=0.0 2023-11-18 07:37:39,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=124226.66666666667, ans=0.125 2023-11-18 07:37:50,636 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.39 vs. limit=22.5 2023-11-18 07:38:06,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=124360.0, ans=0.0 2023-11-18 07:38:22,510 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6650, loss[loss=0.099, simple_loss=0.1078, pruned_loss=0.03308, audio_tagging_loss=0.01203, over 16101.00 frames. 
], tot_loss[loss=0.1298, simple_loss=0.1365, pruned_loss=0.04925, audio_tagging_loss=0.01236, over 3042939.64 frames. ], batch size: 62, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:38:22,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=124493.33333333333, ans=0.125 2023-11-18 07:38:43,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=124626.66666666667, ans=0.95 2023-11-18 07:39:02,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2023-11-18 07:39:07,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 1.041e+02 1.128e+02 1.286e+02 1.870e+02, threshold=2.255e+02, percent-clipped=0.0 2023-11-18 07:39:08,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=124760.0, ans=0.125 2023-11-18 07:39:14,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124760.0, ans=0.125 2023-11-18 07:39:18,528 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6700, loss[loss=0.1512, simple_loss=0.1653, pruned_loss=0.05927, audio_tagging_loss=0.009257, over 15843.00 frames. ], tot_loss[loss=0.1302, simple_loss=0.1371, pruned_loss=0.04938, audio_tagging_loss=0.01232, over 3044778.86 frames. ], batch size: 59, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:39:22,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=124826.66666666667, ans=0.0 2023-11-18 07:39:40,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. limit=15.0 2023-11-18 07:39:44,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=124960.0, ans=0.0 2023-11-18 07:39:45,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=124960.0, ans=0.125 2023-11-18 07:39:48,850 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:39:49,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=124960.0, ans=0.0 2023-11-18 07:39:53,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=125026.66666666667, ans=0.09899494936611666 2023-11-18 07:40:16,335 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6750, loss[loss=0.1191, simple_loss=0.1235, pruned_loss=0.04412, audio_tagging_loss=0.01323, over 14597.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.137, pruned_loss=0.04946, audio_tagging_loss=0.01243, over 3037604.79 frames. 
], batch size: 57, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:40:23,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125160.0, ans=0.1 2023-11-18 07:40:31,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=125226.66666666667, ans=0.0 2023-11-18 07:40:37,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=125293.33333333333, ans=0.2 2023-11-18 07:40:38,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.59 vs. limit=15.0 2023-11-18 07:40:42,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=125293.33333333333, ans=0.125 2023-11-18 07:41:01,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 1.017e+02 1.137e+02 1.334e+02 2.157e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 07:41:02,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.00 vs. limit=15.0 2023-11-18 07:41:04,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=125426.66666666667, ans=0.125 2023-11-18 07:41:09,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.33 vs. limit=10.0 2023-11-18 07:41:13,076 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6800, loss[loss=0.1167, simple_loss=0.13, pruned_loss=0.04046, audio_tagging_loss=0.01127, over 15523.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1371, pruned_loss=0.04981, audio_tagging_loss=0.0125, over 3042173.47 frames. ], batch size: 56, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:41:13,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.45 vs. limit=22.5 2023-11-18 07:41:17,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=125493.33333333333, ans=0.1 2023-11-18 07:41:28,479 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.03 vs. limit=6.0 2023-11-18 07:41:32,029 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.013e+00 2023-11-18 07:41:44,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=125626.66666666667, ans=0.0 2023-11-18 07:42:09,007 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6850, loss[loss=0.1315, simple_loss=0.1411, pruned_loss=0.04873, audio_tagging_loss=0.01227, over 14757.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.136, pruned_loss=0.04926, audio_tagging_loss=0.01263, over 3040443.01 frames. 
], batch size: 57, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:42:24,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=125893.33333333333, ans=0.0 2023-11-18 07:42:36,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=125960.0, ans=0.07 2023-11-18 07:42:43,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=126026.66666666667, ans=0.09899494936611666 2023-11-18 07:42:54,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 9.921e+01 1.152e+02 1.334e+02 2.003e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 07:43:05,706 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6900, loss[loss=0.1343, simple_loss=0.145, pruned_loss=0.04881, audio_tagging_loss=0.01297, over 15882.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1356, pruned_loss=0.04896, audio_tagging_loss=0.01258, over 3044922.49 frames. ], batch size: 60, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:43:09,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2023-11-18 07:43:19,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=126226.66666666667, ans=0.0 2023-11-18 07:43:32,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=126293.33333333333, ans=0.125 2023-11-18 07:43:32,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=126293.33333333333, ans=0.0 2023-11-18 07:43:35,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=126293.33333333333, ans=0.125 2023-11-18 07:43:44,087 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-11-18 07:43:48,927 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:43:56,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=126426.66666666667, ans=0.125 2023-11-18 07:44:01,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=126426.66666666667, ans=0.2 2023-11-18 07:44:03,090 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 6950, loss[loss=0.1461, simple_loss=0.1512, pruned_loss=0.05824, audio_tagging_loss=0.01225, over 14464.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1377, pruned_loss=0.04956, audio_tagging_loss=0.01251, over 3041500.89 frames. 
2023-11-18 07:44:04,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=126493.33333333333, ans=0.0 2023-11-18 07:44:33,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=126626.66666666667, ans=0.125 2023-11-18 07:44:34,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=126626.66666666667, ans=0.125 2023-11-18 07:44:42,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=126693.33333333333, ans=0.0 2023-11-18 07:44:48,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 1.002e+02 1.157e+02 1.287e+02 1.874e+02, threshold=2.315e+02, percent-clipped=0.0 2023-11-18 07:44:54,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=126760.0, ans=0.125 2023-11-18 07:44:55,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126760.0, ans=0.1 2023-11-18 07:44:59,693 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7000, loss[loss=0.1545, simple_loss=0.1584, pruned_loss=0.06057, audio_tagging_loss=0.01476, over 15665.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1375, pruned_loss=0.04942, audio_tagging_loss=0.01252, over 3038024.23 frames. ], batch size: 58, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:45:17,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=126893.33333333333, ans=0.125 2023-11-18 07:45:21,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2023-11-18 07:45:22,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=126960.0, ans=0.1 2023-11-18 07:45:33,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=127026.66666666667, ans=0.0 2023-11-18 07:45:49,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=127093.33333333333, ans=0.025 2023-11-18 07:45:50,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.18 vs. limit=15.0 2023-11-18 07:45:51,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=127093.33333333333, ans=0.0 2023-11-18 07:45:56,130 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7050, loss[loss=0.07604, simple_loss=0.07224, pruned_loss=0.02353, audio_tagging_loss=0.0164, over 15385.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1374, pruned_loss=0.04915, audio_tagging_loss=0.0126, over 3034631.05 frames. ], batch size: 61, lr: 2.67e-02, grad_scale: 32.0
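Note on the optim.py:476 entries: the five values after "grad-norm quartiles" read as a min/25%/median/75%/max summary of recent gradient norms, and the logged threshold tracks Clipping_scale times the median (here 2.0 * 1.157e+02 ≈ 2.315e+02); percent-clipped is the share of recent batches whose norm exceeded the threshold. A minimal sketch of that behaviour (inferred from the logged numbers, not icefall's actual optimizer code; the window size is hypothetical):

    from collections import deque

    import torch

    class MedianGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)  # recent global grad norms

        def clip_(self, parameters) -> None:
            grads = [p.grad for p in parameters if p.grad is not None]
            norm = float(torch.linalg.vector_norm(
                torch.stack([g.norm() for g in grads])))
            self.norms.append(norm)
            median = sorted(self.norms)[len(self.norms) // 2]
            threshold = self.clipping_scale * median
            if norm > threshold:  # scale every gradient down to the threshold
                for g in grads:
                    g.mul_(threshold / norm)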
2023-11-18 07:46:12,913 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:46:19,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=127293.33333333333, ans=0.0 2023-11-18 07:46:20,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=127293.33333333333, ans=0.2 2023-11-18 07:46:20,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-11-18 07:46:41,217 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 1.025e+02 1.162e+02 1.246e+02 1.816e+02, threshold=2.324e+02, percent-clipped=0.0 2023-11-18 07:46:41,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=15.0 2023-11-18 07:46:53,117 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7100, loss[loss=0.128, simple_loss=0.1369, pruned_loss=0.04821, audio_tagging_loss=0.01131, over 14774.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1361, pruned_loss=0.04866, audio_tagging_loss=0.01266, over 3040276.62 frames. ], batch size: 55, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:47:03,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=127560.0, ans=0.2 2023-11-18 07:47:18,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=127626.66666666667, ans=0.125 2023-11-18 07:47:42,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2023-11-18 07:47:49,831 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7150, loss[loss=0.07751, simple_loss=0.08288, pruned_loss=0.02499, audio_tagging_loss=0.01108, over 15530.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1347, pruned_loss=0.04807, audio_tagging_loss=0.01284, over 3046691.08 frames. ], batch size: 60, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:47:58,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=127826.66666666667, ans=0.0 2023-11-18 07:48:00,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-18 07:48:26,174 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.26 vs. limit=12.0 2023-11-18 07:48:35,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 1.022e+02 1.134e+02 1.283e+02 2.595e+02, threshold=2.267e+02, percent-clipped=2.0 2023-11-18 07:48:41,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=128093.33333333333, ans=0.125 2023-11-18 07:48:46,826 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7200, loss[loss=0.1131, simple_loss=0.1046, pruned_loss=0.0454, audio_tagging_loss=0.01539, over 14512.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1357, pruned_loss=0.04841, audio_tagging_loss=0.01291, over 3049701.85 frames.
], batch size: 58, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:10,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=128293.33333333333, ans=10.0 2023-11-18 07:49:23,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=128360.0, ans=0.0 2023-11-18 07:49:26,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128360.0, ans=0.1 2023-11-18 07:49:39,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-18 07:49:42,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.04 vs. limit=15.0 2023-11-18 07:49:43,999 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7250, loss[loss=0.1303, simple_loss=0.1312, pruned_loss=0.05153, audio_tagging_loss=0.01323, over 14727.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1363, pruned_loss=0.04889, audio_tagging_loss=0.01289, over 3044855.88 frames. ], batch size: 59, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:44,539 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.08 vs. limit=15.0 2023-11-18 07:49:51,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.73 vs. limit=15.0 2023-11-18 07:49:57,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=128560.0, ans=0.125 2023-11-18 07:50:05,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=128626.66666666667, ans=0.0 2023-11-18 07:50:12,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=128626.66666666667, ans=0.125 2023-11-18 07:50:29,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 1.028e+02 1.141e+02 1.257e+02 1.791e+02, threshold=2.282e+02, percent-clipped=0.0 2023-11-18 07:50:33,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128760.0, ans=0.1 2023-11-18 07:50:40,926 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7300, loss[loss=0.1357, simple_loss=0.1406, pruned_loss=0.05191, audio_tagging_loss=0.01349, over 14748.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1359, pruned_loss=0.04866, audio_tagging_loss=0.01277, over 3043362.72 frames. ], batch size: 55, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:50:52,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.66 vs. 
limit=22.5 2023-11-18 07:51:06,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=128960.0, ans=0.04949747468305833 2023-11-18 07:51:07,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=128960.0, ans=6.0 2023-11-18 07:51:07,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=12.0 2023-11-18 07:51:15,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.31 vs. limit=15.0 2023-11-18 07:51:32,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-18 07:51:37,926 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7350, loss[loss=0.1631, simple_loss=0.1616, pruned_loss=0.07128, audio_tagging_loss=0.01106, over 14727.00 frames. ], tot_loss[loss=0.1309, simple_loss=0.1381, pruned_loss=0.04942, audio_tagging_loss=0.01246, over 3053097.83 frames. ], batch size: 56, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:51:43,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=129160.0, ans=0.125 2023-11-18 07:51:46,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.74 vs. limit=15.0 2023-11-18 07:52:11,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=129360.0, ans=0.125 2023-11-18 07:52:16,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0 2023-11-18 07:52:23,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 1.003e+02 1.122e+02 1.264e+02 2.098e+02, threshold=2.243e+02, percent-clipped=0.0 2023-11-18 07:52:28,905 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-18 07:52:35,935 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7400, loss[loss=0.1133, simple_loss=0.109, pruned_loss=0.0455, audio_tagging_loss=0.01332, over 14375.00 frames. ], tot_loss[loss=0.1293, simple_loss=0.1363, pruned_loss=0.04874, audio_tagging_loss=0.01242, over 3055026.53 frames. ], batch size: 55, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:00,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129626.66666666667, ans=0.1 2023-11-18 07:53:00,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=129626.66666666667, ans=0.125 2023-11-18 07:53:29,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=129760.0, ans=0.1 2023-11-18 07:53:32,142 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7450, loss[loss=0.1378, simple_loss=0.1602, pruned_loss=0.04878, audio_tagging_loss=0.008884, over 16274.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1371, pruned_loss=0.04894, audio_tagging_loss=0.01225, over 3063640.41 frames. 
], batch size: 58, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:47,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=129893.33333333333, ans=0.125 2023-11-18 07:53:52,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=129893.33333333333, ans=0.015 2023-11-18 07:54:15,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=130026.66666666667, ans=0.125 2023-11-18 07:54:17,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 1.036e+02 1.151e+02 1.366e+02 1.976e+02, threshold=2.301e+02, percent-clipped=0.0 2023-11-18 07:54:29,347 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7500, loss[loss=0.1216, simple_loss=0.1258, pruned_loss=0.04434, audio_tagging_loss=0.01431, over 15160.00 frames. ], tot_loss[loss=0.13, simple_loss=0.137, pruned_loss=0.04912, audio_tagging_loss=0.01233, over 3058747.74 frames. ], batch size: 57, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:55:21,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=130426.66666666667, ans=0.0 2023-11-18 07:55:25,989 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7550, loss[loss=0.1097, simple_loss=0.1176, pruned_loss=0.03993, audio_tagging_loss=0.01102, over 15391.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1379, pruned_loss=0.04934, audio_tagging_loss=0.01227, over 3055779.39 frames. ], batch size: 59, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:55:28,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=130493.33333333333, ans=0.125 2023-11-18 07:55:49,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-18 07:55:51,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.74 vs. limit=10.0 2023-11-18 07:56:08,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.01 vs. limit=6.0 2023-11-18 07:56:11,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0 2023-11-18 07:56:12,138 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.032e+02 1.116e+02 1.247e+02 1.797e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 07:56:22,919 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7600, loss[loss=0.1472, simple_loss=0.1515, pruned_loss=0.05836, audio_tagging_loss=0.01311, over 13785.00 frames. ], tot_loss[loss=0.1289, simple_loss=0.136, pruned_loss=0.04852, audio_tagging_loss=0.01239, over 3052268.80 frames. ], batch size: 53, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:57:02,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=131026.66666666667, ans=0.125 2023-11-18 07:57:05,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. 
limit=15.0 2023-11-18 07:57:19,657 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7650, loss[loss=0.1046, simple_loss=0.1034, pruned_loss=0.03672, audio_tagging_loss=0.01612, over 15801.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1348, pruned_loss=0.04798, audio_tagging_loss=0.01252, over 3050839.03 frames. ], batch size: 61, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:57:30,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.58 vs. limit=15.0 2023-11-18 07:57:35,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=131226.66666666666, ans=0.0 2023-11-18 07:57:47,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=131293.33333333334, ans=0.0 2023-11-18 07:58:05,096 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 1.041e+02 1.157e+02 1.349e+02 1.751e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 07:58:16,545 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7700, loss[loss=0.1321, simple_loss=0.1405, pruned_loss=0.04982, audio_tagging_loss=0.01206, over 14150.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1354, pruned_loss=0.0479, audio_tagging_loss=0.01249, over 3046161.14 frames. ], batch size: 54, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:58:17,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=131493.33333333334, ans=0.0 2023-11-18 07:58:18,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=131493.33333333334, ans=0.1 2023-11-18 07:58:23,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=131493.33333333334, ans=0.125 2023-11-18 07:58:39,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=131626.66666666666, ans=0.125 2023-11-18 07:58:46,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=131626.66666666666, ans=0.2 2023-11-18 07:58:49,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.88 vs. limit=6.0 2023-11-18 07:58:56,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2023-11-18 07:58:59,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=131693.33333333334, ans=0.125 2023-11-18 07:59:13,124 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7750, loss[loss=0.11, simple_loss=0.1121, pruned_loss=0.0364, audio_tagging_loss=0.0176, over 15236.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1354, pruned_loss=0.04796, audio_tagging_loss=0.0126, over 3048792.15 frames. 
], batch size: 59, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:59:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131826.66666666666, ans=0.0 2023-11-18 07:59:18,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=131826.66666666666, ans=0.015 2023-11-18 07:59:19,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=131826.66666666666, ans=0.1 2023-11-18 07:59:55,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2023-11-18 07:59:58,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 1.007e+02 1.110e+02 1.214e+02 2.200e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 08:00:05,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0 2023-11-18 08:00:09,682 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7800, loss[loss=0.1334, simple_loss=0.142, pruned_loss=0.05268, audio_tagging_loss=0.009746, over 15990.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1344, pruned_loss=0.04752, audio_tagging_loss=0.01261, over 3045141.70 frames. ], batch size: 60, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:00:10,989 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:00:28,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-11-18 08:00:32,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=132293.33333333334, ans=0.5 2023-11-18 08:01:07,209 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7850, loss[loss=0.08707, simple_loss=0.08327, pruned_loss=0.02548, audio_tagging_loss=0.01996, over 14545.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1344, pruned_loss=0.04766, audio_tagging_loss=0.01275, over 3045309.52 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:01:09,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=132493.33333333334, ans=0.07 2023-11-18 08:01:10,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.41 vs. limit=15.0 2023-11-18 08:01:19,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=132560.0, ans=0.125 2023-11-18 08:01:21,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=132560.0, ans=0.125 2023-11-18 08:01:31,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=132626.66666666666, ans=0.2 2023-11-18 08:01:36,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.51 vs. 
limit=22.5 2023-11-18 08:01:47,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=132693.33333333334, ans=0.0 2023-11-18 08:01:53,780 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.081e+02 1.200e+02 1.326e+02 3.280e+02, threshold=2.400e+02, percent-clipped=1.0 2023-11-18 08:01:59,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=132760.0, ans=0.2 2023-11-18 08:02:03,869 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7900, loss[loss=0.1033, simple_loss=0.1037, pruned_loss=0.03694, audio_tagging_loss=0.01453, over 16141.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1351, pruned_loss=0.04793, audio_tagging_loss=0.01285, over 3045234.59 frames. ], batch size: 62, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:02:04,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=132826.66666666666, ans=0.125 2023-11-18 08:02:09,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=132826.66666666666, ans=0.125 2023-11-18 08:02:15,994 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:02:18,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.99 vs. limit=8.0 2023-11-18 08:02:31,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.45 vs. limit=10.0 2023-11-18 08:02:35,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=132960.0, ans=0.125 2023-11-18 08:02:43,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=133026.66666666666, ans=0.125 2023-11-18 08:02:59,853 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 7950, loss[loss=0.1561, simple_loss=0.1654, pruned_loss=0.06065, audio_tagging_loss=0.01278, over 17288.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1347, pruned_loss=0.04785, audio_tagging_loss=0.01307, over 3041845.99 frames. ], batch size: 62, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:03:01,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133160.0, ans=0.1 2023-11-18 08:03:02,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=133160.0, ans=0.04949747468305833 2023-11-18 08:03:10,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=133226.66666666666, ans=0.125 2023-11-18 08:03:12,350 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
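Note: grad_scale in the batch lines is apparently the mixed-precision loss scale, not a model quantity. It doubled from 32.0 to 64.0 around batch 7150 and is back at 32.0 by batch 7900 above, which is consistent with dynamic loss scaling: grow the scale periodically, halve it when an overflow is detected. A standard torch.cuda.amp loop showing that mechanism (init_scale and growth_interval are illustrative, and model, optimizer, loader and compute_loss are assumed defined):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=4000)
    for batch in loader:
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)  # hypothetical helper
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped when inf/nan gradients are found
        scaler.update()         # grows the scale periodically, halves it on overflow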
2023-11-18 08:03:18,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133226.66666666666, ans=0.1 2023-11-18 08:03:48,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 1.027e+02 1.116e+02 1.320e+02 1.890e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 08:03:49,217 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2023-11-18 08:03:50,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=133426.66666666666, ans=0.125 2023-11-18 08:03:53,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.74 vs. limit=15.0 2023-11-18 08:03:58,891 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8000, loss[loss=0.1205, simple_loss=0.1281, pruned_loss=0.04369, audio_tagging_loss=0.01279, over 14848.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1334, pruned_loss=0.04716, audio_tagging_loss=0.0131, over 3039518.28 frames. ], batch size: 54, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:04:06,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=15.0 2023-11-18 08:04:08,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133493.33333333334, ans=0.1 2023-11-18 08:04:14,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=133560.0, ans=0.125 2023-11-18 08:04:21,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.78 vs. limit=15.0 2023-11-18 08:04:22,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.16 vs. limit=10.0 2023-11-18 08:04:30,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=133626.66666666666, ans=0.1 2023-11-18 08:04:44,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=133760.0, ans=0.125 2023-11-18 08:04:56,043 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8050, loss[loss=0.1377, simple_loss=0.1416, pruned_loss=0.05548, audio_tagging_loss=0.01149, over 14750.00 frames. ], tot_loss[loss=0.127, simple_loss=0.133, pruned_loss=0.04728, audio_tagging_loss=0.01316, over 3042872.07 frames. ], batch size: 55, lr: 2.61e-02, grad_scale: 32.0
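Note on the scaling.py:213 entries: each ScheduledFloat line reports the current value (ans) of a schedule-dependent constant (dropout probabilities, skip rates, balancer probs and the like) at the given batch_count. A sketch of one plausible implementation, a piecewise-linear schedule over batch_count (the breakpoints below are invented for illustration; only the idea of batch-count-driven interpolation is implied by the log):

    def scheduled_float(batch_count: float, points) -> float:
        # points: ascending (batch_count, value) pairs, e.g. [(0.0, 0.3), (20000.0, 0.1)]
        x0, y0 = points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0  # clamp to the last value past the final breakpoint

    print(scheduled_float(133226.66, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1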
2023-11-18 08:05:05,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=133893.33333333334, ans=0.2 2023-11-18 08:05:14,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=133893.33333333334, ans=0.0 2023-11-18 08:05:29,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=134026.66666666666, ans=0.0 2023-11-18 08:05:42,354 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.010e+01 1.125e+02 1.300e+02 1.606e+02 2.209e+02, threshold=2.601e+02, percent-clipped=0.0 2023-11-18 08:05:46,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134093.33333333334, ans=0.125 2023-11-18 08:05:52,005 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8100, loss[loss=0.159, simple_loss=0.1679, pruned_loss=0.06606, audio_tagging_loss=0.009029, over 16231.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1336, pruned_loss=0.04722, audio_tagging_loss=0.01302, over 3041832.76 frames. ], batch size: 59, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:56,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=134160.0, ans=0.125 2023-11-18 08:06:18,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=134293.33333333334, ans=0.0 2023-11-18 08:06:19,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=134293.33333333334, ans=0.125 2023-11-18 08:06:25,481 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=2.243e+00 2023-11-18 08:06:32,456 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.01 vs. limit=15.0 2023-11-18 08:06:37,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=134426.66666666666, ans=0.1 2023-11-18 08:06:38,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=134426.66666666666, ans=0.2 2023-11-18 08:06:48,379 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8150, loss[loss=0.1316, simple_loss=0.1376, pruned_loss=0.04891, audio_tagging_loss=0.01389, over 15221.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1328, pruned_loss=0.04689, audio_tagging_loss=0.0128, over 3037612.18 frames. ], batch size: 56, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:06:55,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=134493.33333333334, ans=0.125 2023-11-18 08:07:05,750 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:07:13,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=15.0 2023-11-18 08:07:20,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.60 vs. limit=6.0
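Note on the scaling.py:1022 entries: each Whitening line compares a per-module covariance statistic (metric) against a scheduled limit; when the metric exceeds the limit, the module pushes activations back toward a whiter (more isotropic) covariance. One illustrative metric with that behaviour, equal to 1.0 for perfectly white features and rising toward num_channels as the covariance collapses to low rank (an assumption for intuition, not necessarily scaling.py's exact formula):

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (..., num_channels) activations from one module
        x = x.reshape(-1, x.shape[-1]).float()
        cov = (x.T @ x) / x.shape[0]  # channel covariance
        d = cov.shape[0]
        return float(d * torch.trace(cov @ cov) / torch.trace(cov) ** 2)

    print(whitening_metric(torch.randn(50000, 64)))  # ~1.0 for white features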
2023-11-18 08:07:27,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=134693.33333333334, ans=0.0 2023-11-18 08:07:34,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.378e+01 1.047e+02 1.173e+02 1.336e+02 3.591e+02, threshold=2.346e+02, percent-clipped=1.0 2023-11-18 08:07:44,589 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:07:45,622 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8200, loss[loss=0.1035, simple_loss=0.1052, pruned_loss=0.03713, audio_tagging_loss=0.01377, over 15477.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1331, pruned_loss=0.04723, audio_tagging_loss=0.01264, over 3048359.52 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:07:45,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=134826.66666666666, ans=0.125 2023-11-18 08:08:17,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=135026.66666666666, ans=0.125 2023-11-18 08:08:23,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=135026.66666666666, ans=0.0 2023-11-18 08:08:25,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=135026.66666666666, ans=0.1 2023-11-18 08:08:27,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=135026.66666666666, ans=0.0 2023-11-18 08:08:28,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=135026.66666666666, ans=0.125 2023-11-18 08:08:31,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=135093.33333333334, ans=0.125 2023-11-18 08:08:40,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=135160.0, ans=0.125 2023-11-18 08:08:41,611 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8250, loss[loss=0.1644, simple_loss=0.1655, pruned_loss=0.07007, audio_tagging_loss=0.01159, over 16428.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.134, pruned_loss=0.04794, audio_tagging_loss=0.01244, over 3054915.37 frames. ], batch size: 59, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:08:47,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0 2023-11-18 08:08:53,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=135226.66666666666, ans=0.125 2023-11-18 08:08:59,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.24 vs.
limit=10.0 2023-11-18 08:09:07,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=135293.33333333334, ans=0.0 2023-11-18 08:09:27,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.216e+01 1.092e+02 1.238e+02 1.418e+02 2.138e+02, threshold=2.477e+02, percent-clipped=0.0 2023-11-18 08:09:28,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=135426.66666666666, ans=0.125 2023-11-18 08:09:29,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=135426.66666666666, ans=0.0 2023-11-18 08:09:38,159 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8300, loss[loss=0.1212, simple_loss=0.1196, pruned_loss=0.0432, audio_tagging_loss=0.01815, over 16038.00 frames. ], tot_loss[loss=0.1278, simple_loss=0.1346, pruned_loss=0.04802, audio_tagging_loss=0.01248, over 3055892.73 frames. ], batch size: 59, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:09:51,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=135560.0, ans=0.125 2023-11-18 08:09:56,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=135560.0, ans=0.0 2023-11-18 08:10:17,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.87 vs. limit=22.5 2023-11-18 08:10:18,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=135693.33333333334, ans=0.125 2023-11-18 08:10:21,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=135693.33333333334, ans=0.125 2023-11-18 08:10:25,566 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:10:28,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.29 vs. limit=15.0 2023-11-18 08:10:34,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=135826.66666666666, ans=0.125 2023-11-18 08:10:35,051 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8350, loss[loss=0.1027, simple_loss=0.1128, pruned_loss=0.03464, audio_tagging_loss=0.0117, over 14860.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1334, pruned_loss=0.04752, audio_tagging_loss=0.01245, over 3052753.87 frames. ], batch size: 58, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:10:49,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=135893.33333333334, ans=0.125 2023-11-18 08:10:54,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-18 08:11:21,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 1.030e+02 1.156e+02 1.318e+02 1.873e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 08:11:31,728 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8400, loss[loss=0.1048, simple_loss=0.1013, pruned_loss=0.034, audio_tagging_loss=0.02015, over 15445.00 frames. 
], tot_loss[loss=0.1268, simple_loss=0.1336, pruned_loss=0.04762, audio_tagging_loss=0.01238, over 3046868.09 frames. ], batch size: 59, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:11:47,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=136226.66666666666, ans=0.125 2023-11-18 08:11:49,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=136226.66666666666, ans=0.2 2023-11-18 08:12:22,932 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=25.84 vs. limit=22.5 2023-11-18 08:12:28,248 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8450, loss[loss=0.1347, simple_loss=0.1467, pruned_loss=0.05194, audio_tagging_loss=0.009439, over 15307.00 frames. ], tot_loss[loss=0.1266, simple_loss=0.1335, pruned_loss=0.04752, audio_tagging_loss=0.01238, over 3047175.06 frames. ], batch size: 59, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:12:45,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=136560.0, ans=0.0 2023-11-18 08:12:57,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136626.66666666666, ans=0.1 2023-11-18 08:13:01,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=136693.33333333334, ans=0.0 2023-11-18 08:13:09,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=136693.33333333334, ans=0.0 2023-11-18 08:13:11,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=136693.33333333334, ans=10.0 2023-11-18 08:13:12,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=136760.0, ans=0.2 2023-11-18 08:13:14,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 1.028e+02 1.129e+02 1.256e+02 1.884e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:13:15,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=136760.0, ans=0.125 2023-11-18 08:13:25,408 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8500, loss[loss=0.1195, simple_loss=0.1371, pruned_loss=0.03839, audio_tagging_loss=0.01252, over 15380.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1344, pruned_loss=0.04776, audio_tagging_loss=0.01234, over 3056311.47 frames. ], batch size: 57, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:13:35,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.82 vs. limit=8.0 2023-11-18 08:13:51,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.95 vs. limit=12.0 2023-11-18 08:14:21,092 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8550, loss[loss=0.09307, simple_loss=0.09762, pruned_loss=0.0291, audio_tagging_loss=0.01516, over 15163.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1331, pruned_loss=0.04713, audio_tagging_loss=0.01253, over 3057271.09 frames. 
], batch size: 57, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:14:27,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=137160.0, ans=0.125 2023-11-18 08:14:58,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=137360.0, ans=0.125 2023-11-18 08:15:07,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.016e+02 1.105e+02 1.283e+02 1.880e+02, threshold=2.211e+02, percent-clipped=0.0 2023-11-18 08:15:11,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=137426.66666666666, ans=0.2 2023-11-18 08:15:18,204 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8600, loss[loss=0.1148, simple_loss=0.1196, pruned_loss=0.04095, audio_tagging_loss=0.01399, over 16015.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1342, pruned_loss=0.04749, audio_tagging_loss=0.01254, over 3056900.44 frames. ], batch size: 61, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:15:21,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=137493.33333333334, ans=0.1 2023-11-18 08:15:21,730 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.538e+00 2023-11-18 08:15:33,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=137560.0, ans=0.125 2023-11-18 08:15:54,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.50 vs. limit=15.0 2023-11-18 08:15:55,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2023-11-18 08:16:12,474 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:16:14,988 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8650, loss[loss=0.1487, simple_loss=0.151, pruned_loss=0.06354, audio_tagging_loss=0.009615, over 15992.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1352, pruned_loss=0.04789, audio_tagging_loss=0.01254, over 3055929.18 frames. ], batch size: 57, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:16:22,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=137826.66666666666, ans=0.125 2023-11-18 08:16:35,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=137893.33333333334, ans=0.95 2023-11-18 08:16:42,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0 2023-11-18 08:16:48,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.83 vs. 
limit=10.0 2023-11-18 08:16:50,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=138026.66666666666, ans=0.125 2023-11-18 08:17:01,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 1.026e+02 1.125e+02 1.305e+02 1.898e+02, threshold=2.250e+02, percent-clipped=0.0 2023-11-18 08:17:11,114 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8700, loss[loss=0.131, simple_loss=0.1397, pruned_loss=0.04928, audio_tagging_loss=0.01193, over 15794.00 frames. ], tot_loss[loss=0.1299, simple_loss=0.1371, pruned_loss=0.04875, audio_tagging_loss=0.01263, over 3062630.94 frames. ], batch size: 59, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:17:28,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138226.66666666666, ans=0.1 2023-11-18 08:18:05,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=138426.66666666666, ans=0.125 2023-11-18 08:18:07,689 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8750, loss[loss=0.1125, simple_loss=0.1121, pruned_loss=0.04123, audio_tagging_loss=0.0152, over 16193.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1353, pruned_loss=0.04806, audio_tagging_loss=0.0127, over 3060502.88 frames. ], batch size: 62, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:18:09,925 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2023-11-18 08:18:11,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=138493.33333333334, ans=0.125 2023-11-18 08:18:46,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=138693.33333333334, ans=0.125 2023-11-18 08:18:55,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.161e+01 1.039e+02 1.204e+02 1.359e+02 1.963e+02, threshold=2.408e+02, percent-clipped=0.0 2023-11-18 08:19:05,273 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8800, loss[loss=0.1317, simple_loss=0.1473, pruned_loss=0.04699, audio_tagging_loss=0.0111, over 15343.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1344, pruned_loss=0.04751, audio_tagging_loss=0.01281, over 3055958.93 frames. ], batch size: 55, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:19:09,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=138826.66666666666, ans=0.0 2023-11-18 08:19:17,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=138893.33333333334, ans=0.0 2023-11-18 08:19:26,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=138960.0, ans=0.0 2023-11-18 08:19:36,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=138960.0, ans=0.125 2023-11-18 08:19:37,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=138960.0, ans=0.125 2023-11-18 08:20:01,573 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8850, loss[loss=0.1213, simple_loss=0.1346, pruned_loss=0.04447, audio_tagging_loss=0.009552, over 15717.00 frames. 
], tot_loss[loss=0.1286, simple_loss=0.1356, pruned_loss=0.04809, audio_tagging_loss=0.01276, over 3058628.91 frames. ], batch size: 60, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:20:03,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=139160.0, ans=0.0 2023-11-18 08:20:07,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=139160.0, ans=0.2 2023-11-18 08:20:09,060 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:20:45,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=15.0 2023-11-18 08:20:47,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 1.044e+02 1.177e+02 1.340e+02 1.901e+02, threshold=2.354e+02, percent-clipped=0.0 2023-11-18 08:20:47,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=139426.66666666666, ans=0.125 2023-11-18 08:20:55,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=139426.66666666666, ans=0.0 2023-11-18 08:20:57,238 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8900, loss[loss=0.1265, simple_loss=0.1367, pruned_loss=0.0477, audio_tagging_loss=0.01047, over 15847.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1353, pruned_loss=0.04783, audio_tagging_loss=0.01255, over 3052857.07 frames. ], batch size: 62, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:20:58,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-11-18 08:21:28,366 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:21:33,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-18 08:21:47,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139760.0, ans=0.1 2023-11-18 08:21:54,083 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 8950, loss[loss=0.1505, simple_loss=0.168, pruned_loss=0.0592, audio_tagging_loss=0.007345, over 15841.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.136, pruned_loss=0.04801, audio_tagging_loss=0.01233, over 3053044.23 frames. 
], batch size: 57, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:21:55,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=139826.66666666666, ans=0.0 2023-11-18 08:22:08,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=139893.33333333334, ans=0.125 2023-11-18 08:22:28,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.54 vs. limit=22.5 2023-11-18 08:22:29,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-18 08:22:30,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.64 vs. limit=22.5 2023-11-18 08:22:33,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=140026.66666666666, ans=0.0 2023-11-18 08:22:38,056 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-18 08:22:41,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.003e+02 1.129e+02 1.259e+02 1.857e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:22:43,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140093.33333333334, ans=0.125 2023-11-18 08:22:44,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140093.33333333334, ans=0.1 2023-11-18 08:22:51,051 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9000, loss[loss=0.09802, simple_loss=0.09824, pruned_loss=0.03399, audio_tagging_loss=0.01491, over 15127.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1355, pruned_loss=0.04792, audio_tagging_loss=0.01226, over 3048044.25 frames. ], batch size: 60, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:51,052 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 08:23:26,302 INFO [train_asr.py:1147] (2/4) Epoch 2, validation: loss=0.08723, simple_loss=0.06802, pruned_loss=0.01417, audio_tagging_loss=0.03906, over 4681554.00 frames. 2023-11-18 08:23:26,302 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 08:23:31,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.58 vs. limit=6.0 2023-11-18 08:23:39,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=140226.66666666666, ans=0.0 2023-11-18 08:23:53,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=140293.33333333334, ans=0.125 2023-11-18 08:24:04,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=140360.0, ans=0.125 2023-11-18 08:24:22,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.42 vs. 
limit=15.0 2023-11-18 08:24:22,612 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9050, loss[loss=0.1186, simple_loss=0.1309, pruned_loss=0.04445, audio_tagging_loss=0.008715, over 16079.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1349, pruned_loss=0.04767, audio_tagging_loss=0.01223, over 3049086.68 frames. ], batch size: 60, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:24:30,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.60 vs. limit=15.0 2023-11-18 08:24:32,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140560.0, ans=0.125 2023-11-18 08:24:40,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=140560.0, ans=0.125 2023-11-18 08:24:41,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=140560.0, ans=0.2 2023-11-18 08:24:49,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140626.66666666666, ans=0.125 2023-11-18 08:24:51,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140626.66666666666, ans=0.125 2023-11-18 08:25:08,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 1.025e+02 1.134e+02 1.283e+02 1.776e+02, threshold=2.268e+02, percent-clipped=0.0 2023-11-18 08:25:18,126 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9100, loss[loss=0.1119, simple_loss=0.1215, pruned_loss=0.03726, audio_tagging_loss=0.01384, over 13705.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1345, pruned_loss=0.04752, audio_tagging_loss=0.01217, over 3048536.59 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:25:39,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140960.0, ans=0.125 2023-11-18 08:25:44,802 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:25:45,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=140960.0, ans=10.0 2023-11-18 08:26:03,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=141093.33333333334, ans=0.07 2023-11-18 08:26:15,064 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9150, loss[loss=0.1429, simple_loss=0.1545, pruned_loss=0.05158, audio_tagging_loss=0.01408, over 13985.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1342, pruned_loss=0.04716, audio_tagging_loss=0.01222, over 3049862.89 frames. 
], batch size: 53, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:26:18,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:23,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=141160.0, ans=0.125 2023-11-18 08:26:25,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=141226.66666666666, ans=0.125 2023-11-18 08:26:29,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.83 vs. limit=22.5 2023-11-18 08:26:32,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=141226.66666666666, ans=0.0 2023-11-18 08:26:35,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.13 vs. limit=15.0 2023-11-18 08:26:54,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=141360.0, ans=0.125 2023-11-18 08:26:56,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=12.0 2023-11-18 08:27:01,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.257e+01 1.062e+02 1.145e+02 1.276e+02 2.030e+02, threshold=2.290e+02, percent-clipped=0.0 2023-11-18 08:27:12,357 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9200, loss[loss=0.08666, simple_loss=0.08222, pruned_loss=0.03124, audio_tagging_loss=0.01431, over 13180.00 frames. ], tot_loss[loss=0.1254, simple_loss=0.1327, pruned_loss=0.04675, audio_tagging_loss=0.0123, over 3044567.96 frames. ], batch size: 53, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:28:08,782 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9250, loss[loss=0.1237, simple_loss=0.1253, pruned_loss=0.04404, audio_tagging_loss=0.01703, over 14424.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1329, pruned_loss=0.04694, audio_tagging_loss=0.01239, over 3051315.43 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:28:21,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141893.33333333334, ans=0.125 2023-11-18 08:28:22,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=141893.33333333334, ans=0.125 2023-11-18 08:28:35,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=141960.0, ans=0.125 2023-11-18 08:28:45,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. 
limit=15.0 2023-11-18 08:28:54,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=142093.33333333334, ans=0.2 2023-11-18 08:28:54,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=142093.33333333334, ans=0.125 2023-11-18 08:28:55,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.910e+01 1.033e+02 1.140e+02 1.302e+02 2.365e+02, threshold=2.281e+02, percent-clipped=1.0 2023-11-18 08:29:04,824 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9300, loss[loss=0.1153, simple_loss=0.1189, pruned_loss=0.04228, audio_tagging_loss=0.01352, over 14673.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.133, pruned_loss=0.04681, audio_tagging_loss=0.0125, over 3047410.74 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:29:06,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.50 vs. limit=12.0 2023-11-18 08:29:28,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=142293.33333333334, ans=10.0 2023-11-18 08:29:33,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=142293.33333333334, ans=0.05 2023-11-18 08:29:34,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.01 vs. limit=15.0 2023-11-18 08:29:35,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=142293.33333333334, ans=0.125 2023-11-18 08:29:43,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=142360.0, ans=0.0 2023-11-18 08:29:58,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=142426.66666666666, ans=0.0 2023-11-18 08:30:01,745 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9350, loss[loss=0.1049, simple_loss=0.1136, pruned_loss=0.03507, audio_tagging_loss=0.01303, over 14742.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1339, pruned_loss=0.0471, audio_tagging_loss=0.01247, over 3050191.36 frames. ], batch size: 56, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:30:16,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=142560.0, ans=0.0 2023-11-18 08:30:24,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=142626.66666666666, ans=0.125 2023-11-18 08:30:25,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=142626.66666666666, ans=0.125 2023-11-18 08:30:29,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.73 vs. 
limit=15.0 2023-11-18 08:30:34,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=142693.33333333334, ans=0.2 2023-11-18 08:30:48,961 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 1.054e+02 1.142e+02 1.283e+02 1.990e+02, threshold=2.284e+02, percent-clipped=0.0 2023-11-18 08:30:50,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=142760.0, ans=0.125 2023-11-18 08:30:59,171 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9400, loss[loss=0.1492, simple_loss=0.1549, pruned_loss=0.05963, audio_tagging_loss=0.01212, over 15608.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.1327, pruned_loss=0.0469, audio_tagging_loss=0.01265, over 3053195.01 frames. ], batch size: 58, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:31:45,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.21 vs. limit=15.0 2023-11-18 08:31:48,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143093.33333333334, ans=0.1 2023-11-18 08:31:50,659 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:31:54,926 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9450, loss[loss=0.1475, simple_loss=0.1621, pruned_loss=0.05546, audio_tagging_loss=0.01102, over 15347.00 frames. ], tot_loss[loss=0.126, simple_loss=0.133, pruned_loss=0.04678, audio_tagging_loss=0.01266, over 3051649.46 frames. ], batch size: 56, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:32:05,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=143226.66666666666, ans=0.0 2023-11-18 08:32:11,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=143226.66666666666, ans=0.125 2023-11-18 08:32:20,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=143293.33333333334, ans=0.125 2023-11-18 08:32:28,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=143360.0, ans=0.0 2023-11-18 08:32:33,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=143360.0, ans=0.125 2023-11-18 08:32:38,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=16.92 vs. 
limit=15.0 2023-11-18 08:32:41,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 1.024e+02 1.132e+02 1.318e+02 2.507e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 08:32:43,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=143426.66666666666, ans=0.125 2023-11-18 08:32:51,325 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9500, loss[loss=0.1088, simple_loss=0.1047, pruned_loss=0.0381, audio_tagging_loss=0.01835, over 14952.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1333, pruned_loss=0.04716, audio_tagging_loss=0.01287, over 3053470.65 frames. ], batch size: 57, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:32:53,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=143493.33333333334, ans=0.2 2023-11-18 08:33:00,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=143493.33333333334, ans=0.125 2023-11-18 08:33:13,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=143626.66666666666, ans=0.125 2023-11-18 08:33:15,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=143626.66666666666, ans=0.125 2023-11-18 08:33:19,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143626.66666666666, ans=0.1 2023-11-18 08:33:21,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-18 08:33:33,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=143693.33333333334, ans=0.07 2023-11-18 08:33:48,294 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9550, loss[loss=0.1113, simple_loss=0.1186, pruned_loss=0.03822, audio_tagging_loss=0.01378, over 15723.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1331, pruned_loss=0.04693, audio_tagging_loss=0.01302, over 3048663.15 frames. ], batch size: 57, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:33:48,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=143826.66666666666, ans=0.1 2023-11-18 08:33:48,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.32 vs. limit=22.5 2023-11-18 08:33:49,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=143826.66666666666, ans=0.2 2023-11-18 08:33:56,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=143826.66666666666, ans=0.0 2023-11-18 08:34:34,609 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 9.724e+01 1.130e+02 1.324e+02 2.108e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 08:34:44,682 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9600, loss[loss=0.1157, simple_loss=0.1237, pruned_loss=0.04331, audio_tagging_loss=0.01059, over 15357.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1342, pruned_loss=0.0472, audio_tagging_loss=0.01285, over 3049845.88 frames. 
], batch size: 62, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:34:55,117 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:34:56,256 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:34:58,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=144226.66666666666, ans=0.0 2023-11-18 08:35:38,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-18 08:35:41,219 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9650, loss[loss=0.1041, simple_loss=0.1036, pruned_loss=0.0344, audio_tagging_loss=0.01794, over 15291.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1356, pruned_loss=0.0473, audio_tagging_loss=0.0126, over 3053284.85 frames. ], batch size: 59, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:35:50,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=144493.33333333334, ans=0.125 2023-11-18 08:35:53,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=144560.0, ans=0.125 2023-11-18 08:35:54,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=144560.0, ans=0.0 2023-11-18 08:35:55,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0 2023-11-18 08:35:58,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=144560.0, ans=0.125 2023-11-18 08:36:22,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=144693.33333333334, ans=0.125 2023-11-18 08:36:26,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=144760.0, ans=0.125 2023-11-18 08:36:27,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 1.034e+02 1.159e+02 1.347e+02 1.813e+02, threshold=2.318e+02, percent-clipped=0.0 2023-11-18 08:36:37,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=15.0 2023-11-18 08:36:38,033 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9700, loss[loss=0.1537, simple_loss=0.1643, pruned_loss=0.05966, audio_tagging_loss=0.01188, over 16809.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1359, pruned_loss=0.04743, audio_tagging_loss=0.01251, over 3055245.52 frames. ], batch size: 62, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:37:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=144960.0, ans=0.2 2023-11-18 08:37:13,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=145026.66666666666, ans=0.0 2023-11-18 08:37:28,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.41 vs. 
limit=22.5 2023-11-18 08:37:32,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=145093.33333333334, ans=0.0 2023-11-18 08:37:34,005 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9750, loss[loss=0.1778, simple_loss=0.192, pruned_loss=0.07291, audio_tagging_loss=0.008909, over 15272.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.135, pruned_loss=0.04713, audio_tagging_loss=0.01244, over 3052334.65 frames. ], batch size: 54, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:38:15,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=145360.0, ans=0.0 2023-11-18 08:38:17,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=145360.0, ans=0.125 2023-11-18 08:38:21,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 1.005e+02 1.144e+02 1.318e+02 1.775e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 08:38:31,529 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9800, loss[loss=0.1153, simple_loss=0.1196, pruned_loss=0.04487, audio_tagging_loss=0.01063, over 14547.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1349, pruned_loss=0.04717, audio_tagging_loss=0.01235, over 3045346.08 frames. ], batch size: 55, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:38:39,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=145493.33333333334, ans=0.09899494936611666 2023-11-18 08:38:51,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=145560.0, ans=0.2 2023-11-18 08:39:00,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=145626.66666666666, ans=0.2 2023-11-18 08:39:05,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=145693.33333333334, ans=0.125 2023-11-18 08:39:19,448 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:39:28,530 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9850, loss[loss=0.1421, simple_loss=0.158, pruned_loss=0.05403, audio_tagging_loss=0.009135, over 16141.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1348, pruned_loss=0.04714, audio_tagging_loss=0.01229, over 3044440.26 frames. 
], batch size: 59, lr: 2.51e-02, grad_scale: 32.0 2023-11-18 08:39:38,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=145893.33333333334, ans=0.125 2023-11-18 08:39:50,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145960.0, ans=0.1 2023-11-18 08:40:08,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=146026.66666666666, ans=0.125 2023-11-18 08:40:12,099 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-11-18 08:40:14,766 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 1.021e+02 1.122e+02 1.308e+02 2.084e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 08:40:24,465 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9900, loss[loss=0.1239, simple_loss=0.1374, pruned_loss=0.04296, audio_tagging_loss=0.01227, over 14476.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1353, pruned_loss=0.04715, audio_tagging_loss=0.01224, over 3036121.82 frames. ], batch size: 52, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:40:32,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=146160.0, ans=0.125 2023-11-18 08:40:45,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.98 vs. limit=22.5 2023-11-18 08:41:01,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=146360.0, ans=0.125 2023-11-18 08:41:20,911 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 9950, loss[loss=0.1518, simple_loss=0.163, pruned_loss=0.05554, audio_tagging_loss=0.01478, over 16129.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1363, pruned_loss=0.04766, audio_tagging_loss=0.01227, over 3041924.36 frames. ], batch size: 61, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:41:24,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=146493.33333333334, ans=0.125 2023-11-18 08:41:47,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=146626.66666666666, ans=0.2 2023-11-18 08:41:50,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=146626.66666666666, ans=0.125 2023-11-18 08:41:59,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=146693.33333333334, ans=0.2 2023-11-18 08:42:05,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=146760.0, ans=0.125 2023-11-18 08:42:07,456 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 1.033e+02 1.176e+02 1.296e+02 1.958e+02, threshold=2.352e+02, percent-clipped=0.0 2023-11-18 08:42:16,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=146760.0, ans=0.125 2023-11-18 08:42:18,265 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10000, loss[loss=0.1121, simple_loss=0.1288, pruned_loss=0.03705, audio_tagging_loss=0.01067, over 14817.00 frames. 
], tot_loss[loss=0.1271, simple_loss=0.1352, pruned_loss=0.04717, audio_tagging_loss=0.01228, over 3047901.91 frames. ], batch size: 58, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:42:20,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=146826.66666666666, ans=0.2 2023-11-18 08:42:20,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=146826.66666666666, ans=0.125 2023-11-18 08:42:45,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=146960.0, ans=0.05 2023-11-18 08:42:47,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=146960.0, ans=0.1 2023-11-18 08:42:49,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=146960.0, ans=0.0 2023-11-18 08:43:00,882 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-18 08:43:06,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=147093.33333333334, ans=0.125 2023-11-18 08:43:12,673 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.90 vs. limit=6.0 2023-11-18 08:43:14,429 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10050, loss[loss=0.1057, simple_loss=0.09697, pruned_loss=0.04247, audio_tagging_loss=0.01476, over 15383.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1349, pruned_loss=0.047, audio_tagging_loss=0.01225, over 3051802.55 frames. ], batch size: 63, lr: 2.50e-02, grad_scale: 64.0 2023-11-18 08:43:17,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=147160.0, ans=0.95 2023-11-18 08:43:21,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=147160.0, ans=0.2 2023-11-18 08:43:23,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=147160.0, ans=0.0 2023-11-18 08:43:27,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=147226.66666666666, ans=0.125 2023-11-18 08:43:31,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=147226.66666666666, ans=0.125 2023-11-18 08:43:38,144 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.66 vs. 
limit=22.5
2023-11-18 08:43:56,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=147360.0, ans=0.09899494936611666
2023-11-18 08:43:57,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=147360.0, ans=0.125
2023-11-18 08:44:01,689 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 9.828e+01 1.108e+02 1.232e+02 2.122e+02, threshold=2.217e+02, percent-clipped=0.0
2023-11-18 08:44:04,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=147426.66666666666, ans=0.0
2023-11-18 08:44:10,753 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10100, loss[loss=0.1126, simple_loss=0.1249, pruned_loss=0.03976, audio_tagging_loss=0.01039, over 14308.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1357, pruned_loss=0.04716, audio_tagging_loss=0.01239, over 3052076.73 frames. ], batch size: 54, lr: 2.50e-02, grad_scale: 32.0
2023-11-18 08:44:26,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=147560.0, ans=0.0
2023-11-18 08:44:29,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=147560.0, ans=0.07
2023-11-18 08:44:33,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.44 vs. limit=15.0
2023-11-18 08:44:43,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=147693.33333333334, ans=0.125
2023-11-18 08:44:52,702 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:45:08,345 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10150, loss[loss=0.09675, simple_loss=0.09743, pruned_loss=0.03185, audio_tagging_loss=0.01618, over 14079.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.134, pruned_loss=0.04664, audio_tagging_loss=0.0127, over 3052551.64 frames. ], batch size: 56, lr: 2.50e-02, grad_scale: 32.0
2023-11-18 08:45:12,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=147826.66666666666, ans=0.125
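Note on the train_asr.py:1115 lines above: each reports a per-batch loss and a running tot_loss, split into the pruned-transducer terms (simple_loss, pruned_loss) and the audio_tagging_loss from the audio-tagging head. The logged totals are consistent with a weighted sum using the configured scales (simple_loss_scale=0.5, audio_tagging_loss_scale=1.0, pruned loss at full weight after warm-up). A minimal sketch of that combination under this assumption; combine_losses is an illustrative helper, not icefall's actual function:

    # Hedged sketch: loss = simple_loss_scale * simple_loss + pruned_loss
    #                      + audio_tagging_loss_scale * audio_tagging_loss,
    # which reproduces the logged totals after warm-up.
    def combine_losses(simple_loss: float, pruned_loss: float, at_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * at_loss)

    # Check against the batch 10100 tot_loss line above:
    total = combine_losses(0.1357, 0.04716, 0.01239)
    print(f"{total:.4f}")  # 0.1274, matching the logged tot_loss

The per-batch numbers obey the same relation (0.5 * 0.1249 + 0.03976 + 0.01039 = 0.1126 for batch 10100), which supports the assumed weighting.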
2023-11-18 08:45:16,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.81 vs. limit=15.0
2023-11-18 08:45:18,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=147893.33333333334, ans=0.07
2023-11-18 08:45:22,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147893.33333333334, ans=0.1
2023-11-18 08:45:23,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=147893.33333333334, ans=0.0
2023-11-18 08:45:24,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=147893.33333333334, ans=0.2
2023-11-18 08:45:26,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=147893.33333333334, ans=0.125
2023-11-18 08:45:30,104 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:45:45,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=148026.66666666666, ans=0.5
2023-11-18 08:45:55,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 1.049e+02 1.137e+02 1.279e+02 1.864e+02, threshold=2.275e+02, percent-clipped=0.0
2023-11-18 08:45:57,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.03 vs. limit=15.0
2023-11-18 08:46:04,223 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10200, loss[loss=0.1202, simple_loss=0.1251, pruned_loss=0.04083, audio_tagging_loss=0.01686, over 14868.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1336, pruned_loss=0.04645, audio_tagging_loss=0.01282, over 3050193.93 frames. ], batch size: 57, lr: 2.50e-02, grad_scale: 32.0
2023-11-18 08:46:08,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=148160.0, ans=0.125
2023-11-18 08:46:20,815 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:46:34,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=148293.33333333334, ans=0.0
2023-11-18 08:46:36,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=148293.33333333334, ans=0.0
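The WARNING [train_asr.py:1319] lines above record cuts being dropped: after convolutional subsampling, these 1-second AudioSet placeholder clips keep only 23 encoder frames, fewer than their 24 BPE tokens, and a transducer cannot emit more tokens than it has frames. A hedged sketch of such a filter; the helper names are illustrative assumptions, not the code at train_asr.py:1319, though the subsampling arithmetic shown does reproduce the logged 100 -> 23:

    # Illustrative "too short after subsampling" filter, not the actual icefall code.
    def frames_after_subsampling(num_frames: int) -> int:
        # One icefall-style Conv2d subsampling formula; maps 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer needs at least as many encoder frames as output tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23, as reported in the WARNING lines
    print(keep_cut(100, 24))              # False -> the cut is excluded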
2023-11-18 08:46:41,331 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.64 vs. limit=15.0
2023-11-18 08:46:42,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=148360.0, ans=0.125
2023-11-18 08:46:55,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=148426.66666666666, ans=0.2
2023-11-18 08:47:00,839 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10250, loss[loss=0.1215, simple_loss=0.1222, pruned_loss=0.04619, audio_tagging_loss=0.01426, over 15332.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1331, pruned_loss=0.04622, audio_tagging_loss=0.01288, over 3048862.04 frames. ], batch size: 59, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 08:47:02,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. limit=15.0
2023-11-18 08:47:03,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=148493.33333333334, ans=0.125
2023-11-18 08:47:03,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.82 vs. limit=10.0
2023-11-18 08:47:22,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=148560.0, ans=0.125
2023-11-18 08:47:27,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=12.0
2023-11-18 08:47:44,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=148693.33333333334, ans=0.0
2023-11-18 08:47:48,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.009e+02 1.120e+02 1.271e+02 1.895e+02, threshold=2.240e+02, percent-clipped=0.0
2023-11-18 08:47:53,791 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0
2023-11-18 08:47:56,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=148826.66666666666, ans=0.125
2023-11-18 08:47:58,051 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10300, loss[loss=0.1104, simple_loss=0.1073, pruned_loss=0.03873, audio_tagging_loss=0.01809, over 13553.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.1334, pruned_loss=0.04634, audio_tagging_loss=0.01284, over 3053385.40 frames. ], batch size: 53, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 08:48:15,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=148893.33333333334, ans=0.2
2023-11-18 08:48:24,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=148960.0, ans=0.0
2023-11-18 08:48:33,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=149026.66666666666, ans=0.125
2023-11-18 08:48:37,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=149026.66666666666, ans=0.0
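The optim.py:476 lines summarize the recent distribution of gradient norms (the five values appear to be min, the quartiles, and max) together with the clipping threshold. In each line the threshold equals Clipping_scale times the median: in the 08:47:48,254 entry, 2.0 x 1.120e+02 = 2.240e+02, and percent-clipped presumably counts how often recent gradients exceeded it. A small sketch of that relation; the function is a hedged reconstruction, not the optimizer's actual code:

    # Hedged reconstruction of the rule visible in the optim.py:476 lines:
    # threshold = clipping_scale * median(recent gradient norms).
    def clipping_threshold(recent_norms: list[float],
                           clipping_scale: float = 2.0) -> float:
        ordered = sorted(recent_norms)
        median = ordered[len(ordered) // 2]
        return clipping_scale * median

    # The five quartile points from the 08:47:48,254 entry above:
    norms = [76.84, 100.9, 112.0, 127.1, 189.5]
    print(clipping_threshold(norms))  # 224.0, i.e. the logged threshold=2.240e+02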
2023-11-18 08:48:54,305 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10350, loss[loss=0.1136, simple_loss=0.1284, pruned_loss=0.03586, audio_tagging_loss=0.01352, over 16649.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1327, pruned_loss=0.04623, audio_tagging_loss=0.01303, over 3053124.65 frames. ], batch size: 63, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 08:49:05,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=149226.66666666666, ans=0.0
2023-11-18 08:49:11,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=149226.66666666666, ans=0.125
2023-11-18 08:49:11,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149226.66666666666, ans=0.0
2023-11-18 08:49:13,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=149226.66666666666, ans=0.2
2023-11-18 08:49:17,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149293.33333333334, ans=0.125
2023-11-18 08:49:26,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=149293.33333333334, ans=0.125
2023-11-18 08:49:28,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=149360.0, ans=0.125
2023-11-18 08:49:34,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149360.0, ans=0.125
2023-11-18 08:49:41,443 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 9.879e+01 1.106e+02 1.235e+02 1.803e+02, threshold=2.212e+02, percent-clipped=0.0
2023-11-18 08:49:45,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=149426.66666666666, ans=0.125
2023-11-18 08:49:50,046 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10400, loss[loss=0.115, simple_loss=0.1172, pruned_loss=0.04495, audio_tagging_loss=0.01146, over 15039.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1321, pruned_loss=0.04606, audio_tagging_loss=0.0131, over 3045859.10 frames. ], batch size: 55, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 08:49:50,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=149493.33333333334, ans=0.125
2023-11-18 08:49:57,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.59 vs. limit=15.0
2023-11-18 08:50:02,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149560.0, ans=0.125
2023-11-18 08:50:07,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=149560.0, ans=10.0
2023-11-18 08:50:16,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=149626.66666666666, ans=0.125
2023-11-18 08:50:26,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=149693.33333333334, ans=0.1
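The scaling.py:213 lines above trace ScheduledFloat values: regularization knobs such as skip rates, balancer probabilities, and dropout whose current value (ans) is a function of training progress (batch_count). A minimal piecewise-linear schedule in that spirit; the breakpoints below are illustrative assumptions, not the ones actually configured for these parameters:

    # Minimal piecewise-linear schedule in the spirit of ScheduledFloat.
    # The breakpoints are illustrative, not the actual ones in scaling.py.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
            (x0, y0) = (x1, y1)
        return y0  # past the last breakpoint, hold the final value

    # e.g. a skip rate that decays from 0.1 to 0.0 over the first 20k batches:
    print(scheduled_float(150226.7, [(0.0, 0.1), (20000.0, 0.0)]))  # 0.0, like the ans=0.0 lines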
2023-11-18 08:50:47,490 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10450, loss[loss=0.1128, simple_loss=0.1151, pruned_loss=0.04076, audio_tagging_loss=0.01444, over 15135.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1309, pruned_loss=0.04549, audio_tagging_loss=0.01299, over 3040504.25 frames. ], batch size: 57, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:50:52,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=149826.66666666666, ans=0.125
2023-11-18 08:51:00,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=149893.33333333334, ans=0.025
2023-11-18 08:51:28,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5
2023-11-18 08:51:31,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=150093.33333333334, ans=0.125
2023-11-18 08:51:33,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=150093.33333333334, ans=0.0
2023-11-18 08:51:35,242 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.874e+01 1.064e+02 1.233e+02 1.785e+02, threshold=2.128e+02, percent-clipped=0.0
2023-11-18 08:51:42,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=150093.33333333334, ans=0.0
2023-11-18 08:51:44,328 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10500, loss[loss=0.1035, simple_loss=0.1136, pruned_loss=0.03621, audio_tagging_loss=0.01051, over 15217.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1309, pruned_loss=0.04541, audio_tagging_loss=0.01272, over 3035064.47 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:51:56,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=150226.66666666666, ans=0.0
2023-11-18 08:51:58,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=150226.66666666666, ans=0.0
2023-11-18 08:52:26,543 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.93 vs. limit=15.0
2023-11-18 08:52:39,976 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10550, loss[loss=0.1454, simple_loss=0.1508, pruned_loss=0.05606, audio_tagging_loss=0.01394, over 16107.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1312, pruned_loss=0.04567, audio_tagging_loss=0.0126, over 3035397.98 frames. ], batch size: 58, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:52:51,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=150560.0, ans=0.125
2023-11-18 08:52:52,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=150560.0, ans=0.125
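The scaling.py:1022 lines report each Whiten module's measured covariance metric against its configured limit (e.g. metric=13.93 vs. limit=15.0 above); when the metric climbs past the limit, the module appears to nudge the activations back toward a whiter (better-conditioned) covariance. A hedged sketch of one plausible whiteness metric, normalized so perfectly white features score about 1.0; the exact formula in scaling.py may differ:

    import numpy as np

    # Hedged sketch of a whiteness metric: ratio of the mean squared eigenvalue
    # to the squared mean eigenvalue of the feature covariance. It is 1.0 when
    # the covariance is proportional to the identity and grows as the feature
    # channels become correlated or badly scaled.
    def whitening_metric(x: np.ndarray) -> float:
        x = x - x.mean(axis=0)                  # (num_frames, num_channels)
        cov = (x.T @ x) / x.shape[0]
        eigs = np.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    rng = np.random.default_rng(0)
    white = rng.standard_normal((10000, 384))   # 384 channels, as in the lines above
    print(whitening_metric(white))              # ~1.0, well under a limit like 15.0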
2023-11-18 08:53:14,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.79 vs. limit=15.0
2023-11-18 08:53:27,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 9.723e+01 1.093e+02 1.257e+02 1.576e+02, threshold=2.186e+02, percent-clipped=0.0
2023-11-18 08:53:28,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=150760.0, ans=0.0
2023-11-18 08:53:30,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.56 vs. limit=15.0
2023-11-18 08:53:32,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=150760.0, ans=0.125
2023-11-18 08:53:37,313 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10600, loss[loss=0.08416, simple_loss=0.0846, pruned_loss=0.02883, audio_tagging_loss=0.01303, over 14284.00 frames. ], tot_loss[loss=0.1223, simple_loss=0.1296, pruned_loss=0.0449, audio_tagging_loss=0.01259, over 3034139.39 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:54:15,898 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 08:54:17,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.19 vs. limit=15.0
2023-11-18 08:54:19,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=151026.66666666666, ans=0.125
2023-11-18 08:54:33,699 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10650, loss[loss=0.1272, simple_loss=0.1394, pruned_loss=0.0441, audio_tagging_loss=0.01342, over 15080.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1304, pruned_loss=0.04521, audio_tagging_loss=0.01251, over 3042528.71 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0
2023-11-18 08:54:44,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.37 vs. limit=22.5
2023-11-18 08:54:48,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=151226.66666666666, ans=0.0
2023-11-18 08:54:51,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.02 vs. limit=15.0
2023-11-18 08:54:56,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=151293.33333333334, ans=0.125
2023-11-18 08:55:21,757 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.018e+02 1.108e+02 1.279e+02 1.939e+02, threshold=2.217e+02, percent-clipped=0.0
2023-11-18 08:55:30,334 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10700, loss[loss=0.1287, simple_loss=0.1481, pruned_loss=0.04451, audio_tagging_loss=0.01011, over 14528.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.1303, pruned_loss=0.04519, audio_tagging_loss=0.01246, over 3041861.32 frames. ], batch size: 55, lr: 2.47e-02, grad_scale: 32.0
2023-11-18 08:55:38,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.28 vs.
limit=15.0 2023-11-18 08:55:42,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-11-18 08:55:45,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=151560.0, ans=0.2 2023-11-18 08:55:50,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-11-18 08:56:01,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.32 vs. limit=6.0 2023-11-18 08:56:26,873 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10750, loss[loss=0.1677, simple_loss=0.1788, pruned_loss=0.0639, audio_tagging_loss=0.01445, over 16351.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.1302, pruned_loss=0.04512, audio_tagging_loss=0.01258, over 3033242.42 frames. ], batch size: 58, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:56:40,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=151893.33333333334, ans=0.1 2023-11-18 08:56:57,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=151960.0, ans=0.125 2023-11-18 08:57:06,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.24 vs. limit=22.5 2023-11-18 08:57:08,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. limit=15.0 2023-11-18 08:57:10,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=152026.66666666666, ans=0.0 2023-11-18 08:57:13,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=152093.33333333334, ans=0.2 2023-11-18 08:57:14,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.096e+01 9.809e+01 1.098e+02 1.227e+02 2.197e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 08:57:17,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=152093.33333333334, ans=0.2 2023-11-18 08:57:23,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=152160.0, ans=0.125 2023-11-18 08:57:24,149 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10800, loss[loss=0.1464, simple_loss=0.1561, pruned_loss=0.05896, audio_tagging_loss=0.009382, over 15575.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1305, pruned_loss=0.04508, audio_tagging_loss=0.01239, over 3038962.86 frames. 
], batch size: 57, lr: 2.47e-02, grad_scale: 32.0 2023-11-18 08:57:30,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=152160.0, ans=0.2 2023-11-18 08:58:13,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=152426.66666666666, ans=0.05 2023-11-18 08:58:21,122 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10850, loss[loss=0.1059, simple_loss=0.1191, pruned_loss=0.03602, audio_tagging_loss=0.01038, over 16334.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1295, pruned_loss=0.04515, audio_tagging_loss=0.01253, over 3042575.26 frames. ], batch size: 61, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:58:34,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=152560.0, ans=0.125 2023-11-18 08:58:53,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152626.66666666666, ans=0.125 2023-11-18 08:58:58,680 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0 2023-11-18 08:59:08,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 1.080e+02 1.224e+02 1.410e+02 3.165e+02, threshold=2.449e+02, percent-clipped=2.0 2023-11-18 08:59:09,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=152760.0, ans=0.2 2023-11-18 08:59:10,633 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:59:17,585 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10900, loss[loss=0.1587, simple_loss=0.1667, pruned_loss=0.06101, audio_tagging_loss=0.01437, over 14897.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1301, pruned_loss=0.04493, audio_tagging_loss=0.01245, over 3043906.59 frames. 
], batch size: 55, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 08:59:33,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=152893.33333333334, ans=0.0 2023-11-18 08:59:41,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=152960.0, ans=0.05 2023-11-18 08:59:51,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 08:59:51,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 08:59:56,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 08:59:58,813 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. limit=15.0 2023-11-18 09:00:00,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=153026.66666666666, ans=0.125 2023-11-18 09:00:12,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=153093.33333333334, ans=0.0 2023-11-18 09:00:14,334 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 10950, loss[loss=0.09957, simple_loss=0.09441, pruned_loss=0.03244, audio_tagging_loss=0.01993, over 14519.00 frames. ], tot_loss[loss=0.1225, simple_loss=0.1301, pruned_loss=0.04479, audio_tagging_loss=0.01265, over 3033694.96 frames. ], batch size: 55, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:00:14,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=153160.0, ans=0.125 2023-11-18 09:00:35,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=153293.33333333334, ans=0.0 2023-11-18 09:00:49,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=153360.0, ans=0.125 2023-11-18 09:01:01,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=153426.66666666666, ans=0.1 2023-11-18 09:01:02,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.744e+01 1.111e+02 1.253e+02 1.675e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 09:01:10,790 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11000, loss[loss=0.09808, simple_loss=0.1002, pruned_loss=0.03531, audio_tagging_loss=0.01265, over 15448.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1307, pruned_loss=0.04477, audio_tagging_loss=0.01258, over 3030918.01 frames. ], batch size: 58, lr: 2.46e-02, grad_scale: 32.0 2023-11-18 09:01:12,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=153493.33333333334, ans=0.0 2023-11-18 09:01:17,747 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:01:26,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=153560.0, ans=0.05 2023-11-18 09:01:38,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=153626.66666666666, ans=0.125 2023-11-18 09:01:44,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=153693.33333333334, ans=0.125 2023-11-18 09:01:55,286 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-18 09:02:02,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=153760.0, ans=0.2 2023-11-18 09:02:07,788 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11050, loss[loss=0.158, simple_loss=0.1696, pruned_loss=0.05984, audio_tagging_loss=0.0134, over 14851.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1328, pruned_loss=0.04581, audio_tagging_loss=0.01254, over 3041910.14 frames. ], batch size: 55, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:02:08,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=153826.66666666666, ans=0.125 2023-11-18 09:02:16,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.04 vs. limit=22.5 2023-11-18 09:02:18,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=153893.33333333334, ans=0.125 2023-11-18 09:02:25,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=153893.33333333334, ans=0.0 2023-11-18 09:02:41,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=154026.66666666666, ans=0.2 2023-11-18 09:02:49,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=154026.66666666666, ans=0.0 2023-11-18 09:02:51,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154093.33333333334, ans=0.125 2023-11-18 09:02:55,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.808e+01 1.104e+02 1.219e+02 2.392e+02, threshold=2.208e+02, percent-clipped=1.0 2023-11-18 09:03:02,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=154093.33333333334, ans=0.125 2023-11-18 09:03:02,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=154093.33333333334, ans=15.0 2023-11-18 09:03:04,801 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11100, loss[loss=0.1478, simple_loss=0.1523, pruned_loss=0.0589, audio_tagging_loss=0.01275, over 14475.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1319, pruned_loss=0.04596, audio_tagging_loss=0.01272, over 3048698.87 frames. 
], batch size: 57, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:03:05,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=154160.0, ans=0.125 2023-11-18 09:03:07,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=154160.0, ans=0.125 2023-11-18 09:03:15,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=154226.66666666666, ans=0.125 2023-11-18 09:03:16,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=154226.66666666666, ans=0.0 2023-11-18 09:03:22,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=154226.66666666666, ans=0.1 2023-11-18 09:04:00,573 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11150, loss[loss=0.07873, simple_loss=0.07761, pruned_loss=0.02578, audio_tagging_loss=0.01414, over 15299.00 frames. ], tot_loss[loss=0.125, simple_loss=0.1323, pruned_loss=0.04609, audio_tagging_loss=0.01276, over 3051438.33 frames. ], batch size: 60, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:04:33,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=154626.66666666666, ans=0.0 2023-11-18 09:04:33,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=154626.66666666666, ans=0.0 2023-11-18 09:04:37,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:39,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:43,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=154693.33333333334, ans=0.125 2023-11-18 09:04:48,101 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 1.033e+02 1.136e+02 1.301e+02 2.057e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:04:53,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=154760.0, ans=15.0 2023-11-18 09:04:57,170 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11200, loss[loss=0.1319, simple_loss=0.1491, pruned_loss=0.04658, audio_tagging_loss=0.01071, over 15913.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.131, pruned_loss=0.04544, audio_tagging_loss=0.01296, over 3056818.31 frames. ], batch size: 59, lr: 2.45e-02, grad_scale: 32.0 2023-11-18 09:05:23,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=15.0 2023-11-18 09:05:28,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=154960.0, ans=0.125 2023-11-18 09:05:32,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.85 vs. limit=22.5 2023-11-18 09:05:53,446 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
limit=15.0 2023-11-18 09:05:53,932 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11250, loss[loss=0.1227, simple_loss=0.131, pruned_loss=0.04568, audio_tagging_loss=0.0115, over 16235.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1314, pruned_loss=0.04571, audio_tagging_loss=0.0129, over 3054005.80 frames. ], batch size: 59, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:06:04,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=155226.66666666666, ans=0.2 2023-11-18 09:06:07,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=155226.66666666666, ans=0.0 2023-11-18 09:06:21,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=155293.33333333334, ans=0.125 2023-11-18 09:06:22,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=155293.33333333334, ans=0.0 2023-11-18 09:06:39,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=155426.66666666666, ans=0.0 2023-11-18 09:06:41,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.682e+01 1.104e+02 1.218e+02 1.906e+02, threshold=2.209e+02, percent-clipped=0.0 2023-11-18 09:06:42,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=155426.66666666666, ans=0.0 2023-11-18 09:06:49,989 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11300, loss[loss=0.1102, simple_loss=0.113, pruned_loss=0.04006, audio_tagging_loss=0.01363, over 16714.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1316, pruned_loss=0.04567, audio_tagging_loss=0.01263, over 3049208.52 frames. ], batch size: 63, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:06:50,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=155493.33333333334, ans=0.125 2023-11-18 09:06:52,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=155493.33333333334, ans=0.0 2023-11-18 09:07:06,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155560.0, ans=0.1 2023-11-18 09:07:08,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=155560.0, ans=0.125 2023-11-18 09:07:08,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.83 vs. limit=10.0 2023-11-18 09:07:45,922 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11350, loss[loss=0.1141, simple_loss=0.1251, pruned_loss=0.04041, audio_tagging_loss=0.01116, over 14120.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1309, pruned_loss=0.04555, audio_tagging_loss=0.01261, over 3047134.54 frames. ], batch size: 57, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:07:59,654 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.529e+00 2023-11-18 09:08:09,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.14 vs. 
limit=15.0 2023-11-18 09:08:33,858 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.029e+02 1.095e+02 1.224e+02 1.585e+02, threshold=2.190e+02, percent-clipped=0.0 2023-11-18 09:08:37,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=156093.33333333334, ans=22.5 2023-11-18 09:08:43,601 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11400, loss[loss=0.1174, simple_loss=0.125, pruned_loss=0.04315, audio_tagging_loss=0.01178, over 14869.00 frames. ], tot_loss[loss=0.1234, simple_loss=0.131, pruned_loss=0.04543, audio_tagging_loss=0.01246, over 3044209.67 frames. ], batch size: 57, lr: 2.44e-02, grad_scale: 32.0 2023-11-18 09:08:52,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156160.0, ans=0.1 2023-11-18 09:08:59,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=156226.66666666666, ans=0.1 2023-11-18 09:09:17,390 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.48 vs. limit=15.0 2023-11-18 09:09:20,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=156360.0, ans=0.0 2023-11-18 09:09:21,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=156360.0, ans=0.125 2023-11-18 09:09:35,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=156426.66666666666, ans=0.0 2023-11-18 09:09:39,846 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11450, loss[loss=0.1058, simple_loss=0.1154, pruned_loss=0.03497, audio_tagging_loss=0.01306, over 16186.00 frames. ], tot_loss[loss=0.1227, simple_loss=0.1304, pruned_loss=0.0451, audio_tagging_loss=0.01243, over 3053613.89 frames. ], batch size: 61, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:10:14,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0 2023-11-18 09:10:26,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 9.860e+01 1.075e+02 1.215e+02 1.820e+02, threshold=2.151e+02, percent-clipped=0.0 2023-11-18 09:10:26,784 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:10:26,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156760.0, ans=0.125 2023-11-18 09:10:27,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=156760.0, ans=0.1 2023-11-18 09:10:28,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=156760.0, ans=0.125 2023-11-18 09:10:35,116 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11500, loss[loss=0.1179, simple_loss=0.1348, pruned_loss=0.03776, audio_tagging_loss=0.01275, over 15540.00 frames. ], tot_loss[loss=0.122, simple_loss=0.1296, pruned_loss=0.04473, audio_tagging_loss=0.01244, over 3045836.96 frames. 
], batch size: 58, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:10:43,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=156826.66666666666, ans=0.125 2023-11-18 09:11:07,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=156960.0, ans=0.125 2023-11-18 09:11:31,799 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11550, loss[loss=0.09603, simple_loss=0.09146, pruned_loss=0.03339, audio_tagging_loss=0.01692, over 15538.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.1304, pruned_loss=0.04511, audio_tagging_loss=0.01252, over 3056527.13 frames. ], batch size: 61, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:11:41,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=157160.0, ans=0.0 2023-11-18 09:11:48,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=157226.66666666666, ans=0.2 2023-11-18 09:12:01,707 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:12:04,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.78 vs. limit=22.5 2023-11-18 09:12:07,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=157360.0, ans=0.0 2023-11-18 09:12:08,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2023-11-18 09:12:19,806 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 1.012e+02 1.136e+02 1.340e+02 1.723e+02, threshold=2.272e+02, percent-clipped=0.0 2023-11-18 09:12:25,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=157426.66666666666, ans=0.125 2023-11-18 09:12:28,354 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11600, loss[loss=0.1511, simple_loss=0.1621, pruned_loss=0.05756, audio_tagging_loss=0.01253, over 14784.00 frames. ], tot_loss[loss=0.1231, simple_loss=0.1308, pruned_loss=0.04523, audio_tagging_loss=0.01247, over 3056187.39 frames. ], batch size: 56, lr: 2.43e-02, grad_scale: 32.0 2023-11-18 09:12:33,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157493.33333333334, ans=0.1 2023-11-18 09:12:33,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=157493.33333333334, ans=0.2 2023-11-18 09:12:35,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.46 vs. 
limit=6.0 2023-11-18 09:13:05,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=157693.33333333334, ans=0.125 2023-11-18 09:13:23,791 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11650, loss[loss=0.1255, simple_loss=0.1378, pruned_loss=0.04155, audio_tagging_loss=0.015, over 14805.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1313, pruned_loss=0.04544, audio_tagging_loss=0.01251, over 3048756.07 frames. ], batch size: 53, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:13:44,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=157893.33333333334, ans=0.1 2023-11-18 09:14:04,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=158026.66666666666, ans=0.125 2023-11-18 09:14:10,052 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 1.030e+02 1.124e+02 1.249e+02 1.579e+02, threshold=2.249e+02, percent-clipped=0.0 2023-11-18 09:14:11,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=158093.33333333334, ans=0.0 2023-11-18 09:14:19,000 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11700, loss[loss=0.1405, simple_loss=0.1456, pruned_loss=0.05486, audio_tagging_loss=0.01286, over 14924.00 frames. ], tot_loss[loss=0.1233, simple_loss=0.1307, pruned_loss=0.04524, audio_tagging_loss=0.01268, over 3052202.02 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:14:22,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0 2023-11-18 09:14:34,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=158226.66666666666, ans=0.0 2023-11-18 09:14:37,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=158226.66666666666, ans=0.125 2023-11-18 09:14:38,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.24 vs. limit=22.5 2023-11-18 09:14:49,005 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:14:58,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=158360.0, ans=0.0 2023-11-18 09:14:59,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=158360.0, ans=0.0 2023-11-18 09:15:04,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=158426.66666666666, ans=0.125 2023-11-18 09:15:15,191 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11750, loss[loss=0.1244, simple_loss=0.1381, pruned_loss=0.0457, audio_tagging_loss=0.009663, over 15059.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1297, pruned_loss=0.04486, audio_tagging_loss=0.01271, over 3049097.52 frames. 
], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:15:19,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=158493.33333333334, ans=0.125 2023-11-18 09:15:25,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=158560.0, ans=0.2 2023-11-18 09:15:37,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.76 vs. limit=22.5 2023-11-18 09:15:41,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=158626.66666666666, ans=0.1 2023-11-18 09:16:00,882 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:16:01,614 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.907e+01 1.124e+02 1.266e+02 1.981e+02, threshold=2.248e+02, percent-clipped=0.0 2023-11-18 09:16:10,011 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11800, loss[loss=0.1171, simple_loss=0.1268, pruned_loss=0.04422, audio_tagging_loss=0.009485, over 14201.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1316, pruned_loss=0.0455, audio_tagging_loss=0.01261, over 3045102.82 frames. ], batch size: 55, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:16:36,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=158960.0, ans=0.1 2023-11-18 09:16:40,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=158960.0, ans=0.0 2023-11-18 09:17:05,603 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11850, loss[loss=0.102, simple_loss=0.1048, pruned_loss=0.03241, audio_tagging_loss=0.0172, over 14286.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.1324, pruned_loss=0.04567, audio_tagging_loss=0.01261, over 3040682.31 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0 2023-11-18 09:17:25,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=159226.66666666666, ans=0.125 2023-11-18 09:17:30,785 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.611e-01 2023-11-18 09:17:52,692 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.196e+01 1.014e+02 1.138e+02 1.282e+02 2.288e+02, threshold=2.275e+02, percent-clipped=1.0 2023-11-18 09:17:56,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=159426.66666666666, ans=0.125 2023-11-18 09:17:56,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=159426.66666666666, ans=0.125 2023-11-18 09:18:01,627 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11900, loss[loss=0.1711, simple_loss=0.1886, pruned_loss=0.06524, audio_tagging_loss=0.01161, over 16140.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1319, pruned_loss=0.04521, audio_tagging_loss=0.01274, over 3041428.97 frames. 
], batch size: 56, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:18:12,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=159560.0, ans=0.125 2023-11-18 09:18:29,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.98 vs. limit=22.5 2023-11-18 09:18:38,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=159693.33333333334, ans=0.5 2023-11-18 09:18:41,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=159693.33333333334, ans=0.95 2023-11-18 09:18:55,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-18 09:18:56,806 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 11950, loss[loss=0.1433, simple_loss=0.1591, pruned_loss=0.05365, audio_tagging_loss=0.01006, over 15423.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1315, pruned_loss=0.04508, audio_tagging_loss=0.01281, over 3047556.21 frames. ], batch size: 56, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:00,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=159826.66666666666, ans=0.0 2023-11-18 09:19:06,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=159893.33333333334, ans=0.125 2023-11-18 09:19:18,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=159960.0, ans=0.125 2023-11-18 09:19:24,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=159960.0, ans=0.0 2023-11-18 09:19:31,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=160026.66666666666, ans=0.1 2023-11-18 09:19:44,165 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.182e+01 9.874e+01 1.073e+02 1.187e+02 1.717e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 09:19:48,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=160093.33333333334, ans=0.2 2023-11-18 09:19:50,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=160093.33333333334, ans=0.125 2023-11-18 09:19:52,772 INFO [train_asr.py:1115] (2/4) Epoch 2, batch 12000, loss[loss=0.1081, simple_loss=0.1103, pruned_loss=0.03871, audio_tagging_loss=0.01419, over 15933.00 frames. ], tot_loss[loss=0.1236, simple_loss=0.1312, pruned_loss=0.04504, audio_tagging_loss=0.01299, over 3050424.15 frames. 
], batch size: 59, lr: 2.41e-02, grad_scale: 32.0 2023-11-18 09:19:52,773 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 09:20:15,304 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.6396, 2.1258, 1.2752, 2.8994, 2.2123, 2.5331, 2.6436, 2.5408], device='cuda:2') 2023-11-18 09:20:24,858 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7133, 5.7507, 5.8068, 5.8577], device='cuda:2') 2023-11-18 09:20:26,779 INFO [train_asr.py:1147] (2/4) Epoch 2, validation: loss=0.08437, simple_loss=0.06733, pruned_loss=0.01363, audio_tagging_loss=0.03708, over 4681554.00 frames. 2023-11-18 09:20:26,780 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 09:20:37,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=160226.66666666666, ans=0.125 2023-11-18 09:20:41,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=160226.66666666666, ans=0.0 2023-11-18 09:20:43,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=160226.66666666666, ans=0.2 2023-11-18 09:20:44,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.05 vs. limit=15.0 2023-11-18 09:21:26,868 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 0, loss[loss=0.1444, simple_loss=0.1439, pruned_loss=0.04732, audio_tagging_loss=0.02514, over 15732.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1439, pruned_loss=0.04732, audio_tagging_loss=0.02514, over 15732.00 frames. ], batch size: 60, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:21:26,868 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 09:21:58,061 INFO [train_asr.py:1147] (2/4) Epoch 3, validation: loss=0.08217, simple_loss=0.06725, pruned_loss=0.01375, audio_tagging_loss=0.03479, over 4681554.00 frames. 2023-11-18 09:21:58,062 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 09:22:03,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=160300.0, ans=0.125 2023-11-18 09:22:07,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=160300.0, ans=0.125 2023-11-18 09:22:10,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=160366.66666666666, ans=0.125 2023-11-18 09:22:14,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=160366.66666666666, ans=0.1 2023-11-18 09:22:37,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=160500.0, ans=0.125 2023-11-18 09:22:53,029 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 50, loss[loss=0.1432, simple_loss=0.1472, pruned_loss=0.05086, audio_tagging_loss=0.01874, over 13709.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1329, pruned_loss=0.04467, audio_tagging_loss=0.02374, over 695409.38 frames. 
], batch size: 54, lr: 2.29e-02, grad_scale: 32.0 2023-11-18 09:22:57,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=160633.33333333334, ans=0.125 2023-11-18 09:22:58,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=160633.33333333334, ans=0.04949747468305833 2023-11-18 09:23:10,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.79 vs. limit=12.0 2023-11-18 09:23:16,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 1.036e+02 1.137e+02 1.326e+02 1.917e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 09:23:47,703 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 100, loss[loss=0.1734, simple_loss=0.1694, pruned_loss=0.06989, audio_tagging_loss=0.01885, over 15242.00 frames. ], tot_loss[loss=0.1362, simple_loss=0.1349, pruned_loss=0.04591, audio_tagging_loss=0.02283, over 1211614.41 frames. ], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:23:47,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=160966.66666666666, ans=0.125 2023-11-18 09:23:59,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=161033.33333333334, ans=0.0 2023-11-18 09:24:01,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=161033.33333333334, ans=0.0 2023-11-18 09:24:29,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.20 vs. limit=15.0 2023-11-18 09:24:33,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2023-11-18 09:24:36,458 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:24:39,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=161233.33333333334, ans=0.125 2023-11-18 09:24:43,646 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 150, loss[loss=0.1501, simple_loss=0.1567, pruned_loss=0.05551, audio_tagging_loss=0.01626, over 14694.00 frames. ], tot_loss[loss=0.1303, simple_loss=0.131, pruned_loss=0.04432, audio_tagging_loss=0.02049, over 1620547.20 frames. ], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:24:50,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=161300.0, ans=0.09899494936611666 2023-11-18 09:25:06,615 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.007e+02 1.136e+02 1.298e+02 1.875e+02, threshold=2.273e+02, percent-clipped=0.0 2023-11-18 09:25:29,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=161566.66666666666, ans=0.0 2023-11-18 09:25:39,263 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 200, loss[loss=0.1108, simple_loss=0.1088, pruned_loss=0.03753, audio_tagging_loss=0.0189, over 15760.00 frames. ], tot_loss[loss=0.1286, simple_loss=0.1318, pruned_loss=0.04464, audio_tagging_loss=0.01804, over 1937617.32 frames. 
], batch size: 59, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:25:40,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=161633.33333333334, ans=0.125 2023-11-18 09:25:44,759 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:25:59,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.34 vs. limit=15.0 2023-11-18 09:26:31,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.94 vs. limit=22.5 2023-11-18 09:26:34,570 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 250, loss[loss=0.1269, simple_loss=0.1375, pruned_loss=0.04794, audio_tagging_loss=0.01023, over 15722.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1321, pruned_loss=0.04472, audio_tagging_loss=0.01624, over 2183536.25 frames. ], batch size: 60, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:26:46,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=162033.33333333334, ans=0.125 2023-11-18 09:26:50,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=162033.33333333334, ans=0.0 2023-11-18 09:26:57,825 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 1.002e+02 1.144e+02 1.310e+02 1.731e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 09:27:06,339 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.29 vs. limit=15.0 2023-11-18 09:27:19,594 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.70 vs. limit=12.0 2023-11-18 09:27:27,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=162233.33333333334, ans=0.0 2023-11-18 09:27:27,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=162233.33333333334, ans=0.125 2023-11-18 09:27:27,787 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:27:30,580 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 300, loss[loss=0.126, simple_loss=0.1449, pruned_loss=0.04337, audio_tagging_loss=0.01015, over 15431.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1306, pruned_loss=0.04384, audio_tagging_loss=0.01503, over 2376050.66 frames. 
], batch size: 54, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:27:36,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=162300.0, ans=0.0 2023-11-18 09:27:44,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162366.66666666666, ans=0.1 2023-11-18 09:27:45,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162366.66666666666, ans=0.1 2023-11-18 09:27:59,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=162433.33333333334, ans=0.95 2023-11-18 09:28:25,387 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 350, loss[loss=0.114, simple_loss=0.121, pruned_loss=0.03948, audio_tagging_loss=0.01408, over 14180.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.132, pruned_loss=0.04449, audio_tagging_loss=0.01424, over 2525739.91 frames. ], batch size: 53, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:28:50,146 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.861e+01 1.085e+02 1.214e+02 1.858e+02, threshold=2.170e+02, percent-clipped=0.0 2023-11-18 09:28:50,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=162766.66666666666, ans=0.0 2023-11-18 09:28:55,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=162766.66666666666, ans=0.04949747468305833 2023-11-18 09:28:59,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=162833.33333333334, ans=0.1 2023-11-18 09:29:08,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162833.33333333334, ans=0.1 2023-11-18 09:29:21,421 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 400, loss[loss=0.1196, simple_loss=0.125, pruned_loss=0.0429, audio_tagging_loss=0.01423, over 17075.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1319, pruned_loss=0.0443, audio_tagging_loss=0.01382, over 2643003.04 frames. ], batch size: 64, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:29:38,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163033.33333333334, ans=0.125 2023-11-18 09:29:43,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163100.0, ans=0.125 2023-11-18 09:29:43,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=163100.0, ans=0.0 2023-11-18 09:29:49,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=163100.0, ans=0.125 2023-11-18 09:29:57,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.09 vs. limit=12.0 2023-11-18 09:30:18,228 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 450, loss[loss=0.1611, simple_loss=0.1743, pruned_loss=0.06015, audio_tagging_loss=0.01374, over 15586.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.1318, pruned_loss=0.04435, audio_tagging_loss=0.01348, over 2732402.30 frames. 
], batch size: 57, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:30:18,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=163300.0, ans=0.0 2023-11-18 09:30:18,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.50 vs. limit=12.0 2023-11-18 09:30:21,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=163300.0, ans=0.125 2023-11-18 09:30:24,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=163300.0, ans=0.125 2023-11-18 09:30:37,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=163366.66666666666, ans=0.0 2023-11-18 09:30:40,028 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.316e+01 9.836e+01 1.125e+02 1.262e+02 2.640e+02, threshold=2.251e+02, percent-clipped=1.0 2023-11-18 09:31:12,860 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 500, loss[loss=0.08588, simple_loss=0.09051, pruned_loss=0.0287, audio_tagging_loss=0.01192, over 15869.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1321, pruned_loss=0.04438, audio_tagging_loss=0.01312, over 2795540.12 frames. ], batch size: 60, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:31:15,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-11-18 09:31:27,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=163700.0, ans=0.05 2023-11-18 09:31:31,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.94 vs. limit=15.0 2023-11-18 09:32:03,486 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.595e+00 2023-11-18 09:32:07,424 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 550, loss[loss=0.1263, simple_loss=0.1354, pruned_loss=0.04597, audio_tagging_loss=0.01267, over 16827.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1328, pruned_loss=0.04473, audio_tagging_loss=0.01293, over 2845511.26 frames. ], batch size: 65, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:32:12,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=163966.66666666666, ans=0.125 2023-11-18 09:32:18,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.23 vs. limit=12.0 2023-11-18 09:32:27,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.01 vs. 
limit=22.5 2023-11-18 09:32:31,487 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.566e+01 1.089e+02 1.252e+02 1.679e+02, threshold=2.177e+02, percent-clipped=0.0 2023-11-18 09:32:54,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=164233.33333333334, ans=0.125 2023-11-18 09:33:02,876 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=5.169e-01 2023-11-18 09:33:03,671 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 600, loss[loss=0.1211, simple_loss=0.1372, pruned_loss=0.04655, audio_tagging_loss=0.005935, over 14452.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1334, pruned_loss=0.04477, audio_tagging_loss=0.01265, over 2888395.17 frames. ], batch size: 55, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:33:04,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=164300.0, ans=0.2 2023-11-18 09:33:24,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.97 vs. limit=10.0 2023-11-18 09:33:51,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=164566.66666666666, ans=0.0 2023-11-18 09:33:57,685 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 650, loss[loss=0.1504, simple_loss=0.1674, pruned_loss=0.05682, audio_tagging_loss=0.009882, over 16802.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1335, pruned_loss=0.0449, audio_tagging_loss=0.01258, over 2927144.80 frames. ], batch size: 61, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:01,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=164633.33333333334, ans=0.125 2023-11-18 09:34:03,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=164633.33333333334, ans=0.0 2023-11-18 09:34:07,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=164700.0, ans=0.0 2023-11-18 09:34:11,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.87 vs. limit=15.0 2023-11-18 09:34:12,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=164700.0, ans=0.125 2023-11-18 09:34:20,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.683e+01 1.100e+02 1.220e+02 1.764e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 09:34:25,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. 
limit=15.0 2023-11-18 09:34:29,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=164766.66666666666, ans=0.125 2023-11-18 09:34:38,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=164833.33333333334, ans=0.125 2023-11-18 09:34:42,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=164900.0, ans=0.125 2023-11-18 09:34:52,270 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 700, loss[loss=0.09012, simple_loss=0.1047, pruned_loss=0.02662, audio_tagging_loss=0.01116, over 15627.00 frames. ], tot_loss[loss=0.1241, simple_loss=0.1336, pruned_loss=0.04479, audio_tagging_loss=0.01256, over 2958186.47 frames. ], batch size: 58, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:54,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.83 vs. limit=12.0 2023-11-18 09:35:00,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=164966.66666666666, ans=0.1 2023-11-18 09:35:06,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=165033.33333333334, ans=0.125 2023-11-18 09:35:11,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=22.5 2023-11-18 09:35:17,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=165100.0, ans=0.125 2023-11-18 09:35:33,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=165166.66666666666, ans=0.125 2023-11-18 09:35:37,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=165233.33333333334, ans=0.0 2023-11-18 09:35:39,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=165233.33333333334, ans=0.125 2023-11-18 09:35:43,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=165233.33333333334, ans=0.125 2023-11-18 09:35:47,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=165233.33333333334, ans=0.0 2023-11-18 09:35:49,130 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 750, loss[loss=0.1195, simple_loss=0.1206, pruned_loss=0.04563, audio_tagging_loss=0.01362, over 16427.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1328, pruned_loss=0.04451, audio_tagging_loss=0.01258, over 2982765.84 frames. 
], batch size: 63, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:35:50,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=165300.0, ans=0.125 2023-11-18 09:36:08,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=165366.66666666666, ans=0.0 2023-11-18 09:36:11,469 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 1.007e+02 1.126e+02 1.277e+02 1.870e+02, threshold=2.252e+02, percent-clipped=0.0 2023-11-18 09:36:22,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=165500.0, ans=0.1 2023-11-18 09:36:42,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165566.66666666666, ans=0.1 2023-11-18 09:36:44,358 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 800, loss[loss=0.1645, simple_loss=0.1888, pruned_loss=0.06083, audio_tagging_loss=0.009298, over 15359.00 frames. ], tot_loss[loss=0.1243, simple_loss=0.1339, pruned_loss=0.0448, audio_tagging_loss=0.01254, over 2999771.09 frames. ], batch size: 53, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:19,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=22.5 2023-11-18 09:37:20,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=165833.33333333334, ans=0.125 2023-11-18 09:37:39,016 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 850, loss[loss=0.1077, simple_loss=0.1134, pruned_loss=0.03566, audio_tagging_loss=0.01534, over 14809.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1335, pruned_loss=0.04456, audio_tagging_loss=0.0126, over 3002845.79 frames. ], batch size: 57, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:40,323 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.442e-01 2023-11-18 09:37:54,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166033.33333333334, ans=0.1 2023-11-18 09:37:58,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=166033.33333333334, ans=0.0 2023-11-18 09:38:03,435 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 1.046e+02 1.125e+02 1.279e+02 2.412e+02, threshold=2.250e+02, percent-clipped=1.0 2023-11-18 09:38:09,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-18 09:38:16,566 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0 2023-11-18 09:38:18,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=166166.66666666666, ans=0.04949747468305833 2023-11-18 09:38:23,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.30 vs. 
limit=15.0 2023-11-18 09:38:25,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.94 vs. limit=15.0 2023-11-18 09:38:31,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166233.33333333334, ans=0.1 2023-11-18 09:38:35,530 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 900, loss[loss=0.09514, simple_loss=0.09345, pruned_loss=0.03206, audio_tagging_loss=0.01636, over 13872.00 frames. ], tot_loss[loss=0.1248, simple_loss=0.1343, pruned_loss=0.04496, audio_tagging_loss=0.01273, over 3016670.79 frames. ], batch size: 54, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:38:59,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=166433.33333333334, ans=0.125 2023-11-18 09:39:01,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=166433.33333333334, ans=0.0 2023-11-18 09:39:12,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=166500.0, ans=0.125 2023-11-18 09:39:23,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166566.66666666666, ans=0.1 2023-11-18 09:39:25,394 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.23 vs. limit=12.0 2023-11-18 09:39:27,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=166566.66666666666, ans=0.04949747468305833 2023-11-18 09:39:31,299 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 950, loss[loss=0.1429, simple_loss=0.1579, pruned_loss=0.05206, audio_tagging_loss=0.01188, over 15347.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1323, pruned_loss=0.04423, audio_tagging_loss=0.01254, over 3019676.22 frames. ], batch size: 56, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:39:54,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 9.509e+01 1.090e+02 1.237e+02 1.820e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 09:39:58,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=166766.66666666666, ans=0.125 2023-11-18 09:40:12,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.53 vs. limit=12.0 2023-11-18 09:40:26,289 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1000, loss[loss=0.08886, simple_loss=0.08674, pruned_loss=0.03138, audio_tagging_loss=0.01411, over 13861.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1308, pruned_loss=0.0438, audio_tagging_loss=0.01245, over 3025552.83 frames. ], batch size: 55, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:40:50,177 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 09:40:59,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=167166.66666666666, ans=0.0 2023-11-18 09:41:16,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.36 vs. limit=22.5 2023-11-18 09:41:22,338 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1050, loss[loss=0.1216, simple_loss=0.1396, pruned_loss=0.0389, audio_tagging_loss=0.01294, over 14682.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1299, pruned_loss=0.04335, audio_tagging_loss=0.01239, over 3026082.63 frames. ], batch size: 53, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:41:30,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=167300.0, ans=0.125 2023-11-18 09:41:45,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.835e+01 9.727e+01 1.056e+02 1.215e+02 1.619e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:42:04,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167500.0, ans=0.1 2023-11-18 09:42:15,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=167566.66666666666, ans=0.0 2023-11-18 09:42:18,375 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1100, loss[loss=0.1202, simple_loss=0.1194, pruned_loss=0.04356, audio_tagging_loss=0.01695, over 14643.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1287, pruned_loss=0.04289, audio_tagging_loss=0.0123, over 3026156.68 frames. ], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:42:21,547 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:42:41,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.58 vs. limit=15.0 2023-11-18 09:42:50,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=167766.66666666666, ans=0.125 2023-11-18 09:42:51,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=167833.33333333334, ans=0.0 2023-11-18 09:43:00,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=167833.33333333334, ans=0.1 2023-11-18 09:43:07,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=167900.0, ans=0.125 2023-11-18 09:43:13,560 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1150, loss[loss=0.1227, simple_loss=0.1338, pruned_loss=0.04629, audio_tagging_loss=0.009533, over 15190.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1286, pruned_loss=0.04269, audio_tagging_loss=0.01224, over 3031553.74 frames. 
], batch size: 57, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:43:19,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=167966.66666666666, ans=0.125 2023-11-18 09:43:29,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=168033.33333333334, ans=0.125 2023-11-18 09:43:31,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.54 vs. limit=15.0 2023-11-18 09:43:34,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=168033.33333333334, ans=0.2 2023-11-18 09:43:35,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=168100.0, ans=0.125 2023-11-18 09:43:37,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.862e+01 1.112e+02 1.270e+02 2.649e+02, threshold=2.225e+02, percent-clipped=1.0 2023-11-18 09:43:41,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=168100.0, ans=0.0 2023-11-18 09:43:51,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.78 vs. limit=15.0 2023-11-18 09:43:59,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=168233.33333333334, ans=0.125 2023-11-18 09:44:01,854 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:44:09,518 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1200, loss[loss=0.1065, simple_loss=0.1137, pruned_loss=0.03554, audio_tagging_loss=0.01415, over 15278.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1292, pruned_loss=0.0429, audio_tagging_loss=0.0121, over 3033673.05 frames. ], batch size: 59, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:44:22,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=168366.66666666666, ans=0.2 2023-11-18 09:44:24,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=168366.66666666666, ans=0.125 2023-11-18 09:44:43,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.84 vs. limit=10.0 2023-11-18 09:44:55,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=168566.66666666666, ans=0.0 2023-11-18 09:45:05,704 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1250, loss[loss=0.1325, simple_loss=0.1422, pruned_loss=0.05068, audio_tagging_loss=0.01069, over 14184.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1306, pruned_loss=0.04341, audio_tagging_loss=0.01208, over 3033987.91 frames. 
], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:45:10,127 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.088e+00 2023-11-18 09:45:28,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 1.002e+02 1.131e+02 1.253e+02 1.979e+02, threshold=2.263e+02, percent-clipped=0.0 2023-11-18 09:45:46,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=168833.33333333334, ans=0.125 2023-11-18 09:45:56,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=168900.0, ans=0.125 2023-11-18 09:46:00,850 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1300, loss[loss=0.1363, simple_loss=0.1512, pruned_loss=0.05405, audio_tagging_loss=0.006717, over 14937.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.13, pruned_loss=0.04313, audio_tagging_loss=0.01209, over 3032210.06 frames. ], batch size: 56, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:06,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=168966.66666666666, ans=0.125 2023-11-18 09:46:42,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=169166.66666666666, ans=0.0 2023-11-18 09:46:47,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=169233.33333333334, ans=0.0 2023-11-18 09:46:55,925 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1350, loss[loss=0.09501, simple_loss=0.1036, pruned_loss=0.03199, audio_tagging_loss=0.01121, over 13581.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1293, pruned_loss=0.0431, audio_tagging_loss=0.01225, over 3029653.19 frames. ], batch size: 54, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:47:01,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=169300.0, ans=0.0 2023-11-18 09:47:07,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=169366.66666666666, ans=0.09899494936611666 2023-11-18 09:47:10,046 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.82 vs. limit=15.0 2023-11-18 09:47:13,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-18 09:47:18,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169433.33333333334, ans=0.125 2023-11-18 09:47:19,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.482e+01 1.049e+02 1.147e+02 1.889e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-18 09:47:26,131 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-18 09:47:29,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. 
limit=6.0 2023-11-18 09:47:35,587 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:47:37,372 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:47:42,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.47 vs. limit=22.5 2023-11-18 09:47:43,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=169566.66666666666, ans=0.125 2023-11-18 09:47:49,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=169566.66666666666, ans=0.125 2023-11-18 09:47:52,633 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1400, loss[loss=0.1454, simple_loss=0.1549, pruned_loss=0.05422, audio_tagging_loss=0.01373, over 16511.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1289, pruned_loss=0.04293, audio_tagging_loss=0.01231, over 3038234.02 frames. ], batch size: 59, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:47:52,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=169633.33333333334, ans=0.07 2023-11-18 09:47:57,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.49 vs. limit=15.0 2023-11-18 09:48:17,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=169766.66666666666, ans=0.0 2023-11-18 09:48:28,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=169833.33333333334, ans=0.0 2023-11-18 09:48:30,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=169833.33333333334, ans=0.0 2023-11-18 09:48:40,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=169900.0, ans=0.1 2023-11-18 09:48:47,403 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1450, loss[loss=0.1151, simple_loss=0.1255, pruned_loss=0.03964, audio_tagging_loss=0.01276, over 15232.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.129, pruned_loss=0.04306, audio_tagging_loss=0.01234, over 3043447.07 frames. ], batch size: 57, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:49:03,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.78 vs. limit=22.5 2023-11-18 09:49:03,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.44 vs. 
limit=22.5 2023-11-18 09:49:11,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.160e+01 9.587e+01 1.090e+02 1.197e+02 1.611e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 09:49:36,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=170233.33333333334, ans=0.2 2023-11-18 09:49:39,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-11-18 09:49:42,818 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1500, loss[loss=0.1429, simple_loss=0.1502, pruned_loss=0.05734, audio_tagging_loss=0.01052, over 15397.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1296, pruned_loss=0.04349, audio_tagging_loss=0.01245, over 3046290.18 frames. ], batch size: 57, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:49:45,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=170300.0, ans=0.125 2023-11-18 09:50:04,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=170433.33333333334, ans=0.125 2023-11-18 09:50:24,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=170500.0, ans=0.125 2023-11-18 09:50:27,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=170566.66666666666, ans=0.07 2023-11-18 09:50:35,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.42 vs. limit=22.5 2023-11-18 09:50:39,363 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1550, loss[loss=0.1111, simple_loss=0.1198, pruned_loss=0.03708, audio_tagging_loss=0.01411, over 16418.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1295, pruned_loss=0.04359, audio_tagging_loss=0.0125, over 3048707.07 frames. ], batch size: 62, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:50:59,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=170766.66666666666, ans=0.0 2023-11-18 09:51:02,091 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 1.007e+02 1.092e+02 1.205e+02 1.689e+02, threshold=2.183e+02, percent-clipped=0.0 2023-11-18 09:51:31,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=170900.0, ans=0.125 2023-11-18 09:51:34,205 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1600, loss[loss=0.1073, simple_loss=0.1184, pruned_loss=0.03817, audio_tagging_loss=0.009965, over 15456.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1284, pruned_loss=0.04301, audio_tagging_loss=0.01263, over 3046902.85 frames. ], batch size: 58, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:51:58,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=171100.0, ans=0.125 2023-11-18 09:52:20,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=171233.33333333334, ans=0.125 2023-11-18 09:52:25,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. 
limit=15.0 2023-11-18 09:52:29,520 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1650, loss[loss=0.1406, simple_loss=0.1635, pruned_loss=0.04922, audio_tagging_loss=0.009691, over 15320.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1294, pruned_loss=0.04332, audio_tagging_loss=0.01256, over 3050090.90 frames. ], batch size: 55, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:52:30,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171300.0, ans=0.1 2023-11-18 09:52:30,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=171300.0, ans=0.125 2023-11-18 09:52:39,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.27 vs. limit=15.0 2023-11-18 09:52:46,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=171366.66666666666, ans=0.05 2023-11-18 09:52:52,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=171433.33333333334, ans=0.2 2023-11-18 09:52:53,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.689e+01 1.063e+02 1.242e+02 1.763e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 09:53:05,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=171500.0, ans=0.0 2023-11-18 09:53:14,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171566.66666666666, ans=0.0 2023-11-18 09:53:14,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=171566.66666666666, ans=0.125 2023-11-18 09:53:17,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=171566.66666666666, ans=0.0 2023-11-18 09:53:26,188 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1700, loss[loss=0.1239, simple_loss=0.1357, pruned_loss=0.04328, audio_tagging_loss=0.01277, over 16855.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1295, pruned_loss=0.04315, audio_tagging_loss=0.01237, over 3060373.68 frames. ], batch size: 63, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:53:27,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=171633.33333333334, ans=0.125 2023-11-18 09:53:29,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=171633.33333333334, ans=0.07 2023-11-18 09:53:35,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=171700.0, ans=0.2 2023-11-18 09:54:20,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=171966.66666666666, ans=0.125 2023-11-18 09:54:20,943 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1750, loss[loss=0.1343, simple_loss=0.1474, pruned_loss=0.04952, audio_tagging_loss=0.01107, over 14788.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1287, pruned_loss=0.04284, audio_tagging_loss=0.01231, over 3051964.66 frames. 
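Each `optim.py` entry above reports quartiles of recent gradient norms plus a clipping threshold and the percentage of clipped batches. In the entry just above, the threshold 2.126e+02 is exactly 2.0 times the logged median norm 1.063e+02, matching `Clipping_scale=2.0`. As a rough illustration (not the actual optimizer code), one can keep a window of recent norms, set the threshold at `clipping_scale` times their median, and clip against it; the window size here is an assumption.

```python
# Sketch only: quantile-based gradient clipping in the spirit of the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines.
from collections import deque
import torch

class QuantileClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # running window of grad norms

    def clip_(self, parameters) -> float:
        grads = [p.grad.detach() for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        self.norms.append(norm)
        history = torch.tensor(list(self.norms))
        # threshold = clipping_scale x running median, as the log suggests
        threshold = self.clipping_scale * history.median().item()
        if norm > threshold:  # rescale all grads down onto the threshold
            for g in grads:
                g.mul_(threshold / norm)
        return norm
```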
], batch size: 56, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:54:44,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.833e+01 1.116e+02 1.265e+02 1.757e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 09:54:44,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=172100.0, ans=0.0 2023-11-18 09:55:03,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=172166.66666666666, ans=0.0 2023-11-18 09:55:05,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=172233.33333333334, ans=0.125 2023-11-18 09:55:16,037 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1800, loss[loss=0.1266, simple_loss=0.1375, pruned_loss=0.04726, audio_tagging_loss=0.01061, over 14569.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1275, pruned_loss=0.04249, audio_tagging_loss=0.01229, over 3048749.87 frames. ], batch size: 55, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:55:20,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=172300.0, ans=0.0 2023-11-18 09:55:34,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=172366.66666666666, ans=0.1 2023-11-18 09:55:39,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=172433.33333333334, ans=0.0 2023-11-18 09:55:48,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=172500.0, ans=0.0 2023-11-18 09:55:53,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.17 vs. limit=15.0 2023-11-18 09:56:12,362 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1850, loss[loss=0.1398, simple_loss=0.1459, pruned_loss=0.05432, audio_tagging_loss=0.01254, over 15241.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1273, pruned_loss=0.04247, audio_tagging_loss=0.0122, over 3048824.96 frames. ], batch size: 57, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:56:21,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=172700.0, ans=0.1 2023-11-18 09:56:27,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=172700.0, ans=0.2 2023-11-18 09:56:29,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=172700.0, ans=0.2 2023-11-18 09:56:31,981 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-18 09:56:34,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 9.349e+01 1.016e+02 1.150e+02 1.872e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 09:57:07,223 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1900, loss[loss=0.09108, simple_loss=0.09961, pruned_loss=0.02982, audio_tagging_loss=0.01146, over 15816.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.127, pruned_loss=0.04219, audio_tagging_loss=0.01223, over 3050488.37 frames. 
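The `Whitening` entries above print a per-module statistic (`metric`) compared against a `limit`; modules whose metric exceeds the limit would be pushed back toward an isotropic ("white") output covariance. The exact definition in scaling.py is not shown in this log, so the following is only a hedged sketch of one plausible metric: the spread of the channel-covariance eigenvalues, which equals 1.0 for perfectly white features and grows with anisotropy.

```python
# Sketch: a whitening diagnostic for activations of shape (N, C).
# This formula is an assumption, not the scaling.py definition.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    x = x - x.mean(dim=0, keepdim=True)        # center each channel
    cov = (x.T @ x) / max(x.shape[0] - 1, 1)   # (C, C) covariance
    eigs = torch.linalg.eigvalsh(cov)          # symmetric-matrix eigvals
    # mean squared eigenvalue over squared mean eigenvalue:
    # 1.0 when cov is proportional to I, larger when it is not.
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384) * torch.linspace(0.1, 3.0, 384)
print(whitening_metric(x))  # clearly > 1.0 for unequal channel scales
```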
], batch size: 61, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:57:12,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=172966.66666666666, ans=0.125 2023-11-18 09:57:46,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=173166.66666666666, ans=0.1 2023-11-18 09:57:50,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=173166.66666666666, ans=0.025 2023-11-18 09:57:55,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=173233.33333333334, ans=0.125 2023-11-18 09:58:02,652 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 1950, loss[loss=0.08328, simple_loss=0.08686, pruned_loss=0.02594, audio_tagging_loss=0.0139, over 14845.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1267, pruned_loss=0.04224, audio_tagging_loss=0.01223, over 3050412.41 frames. ], batch size: 57, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:58:07,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173300.0, ans=0.125 2023-11-18 09:58:22,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=173366.66666666666, ans=0.1 2023-11-18 09:58:26,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 9.559e+01 1.056e+02 1.197e+02 1.715e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:58:26,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=173433.33333333334, ans=0.125 2023-11-18 09:58:45,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=173500.0, ans=0.5 2023-11-18 09:58:58,539 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2000, loss[loss=0.1463, simple_loss=0.1602, pruned_loss=0.05299, audio_tagging_loss=0.01327, over 14797.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.126, pruned_loss=0.042, audio_tagging_loss=0.01225, over 3045348.80 frames. ], batch size: 55, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:59:33,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=173833.33333333334, ans=0.125 2023-11-18 09:59:44,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=173900.0, ans=0.0 2023-11-18 09:59:48,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173900.0, ans=0.125 2023-11-18 09:59:54,003 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2050, loss[loss=0.09715, simple_loss=0.1001, pruned_loss=0.03512, audio_tagging_loss=0.01201, over 14389.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1271, pruned_loss=0.04246, audio_tagging_loss=0.01209, over 3040547.07 frames. 
], batch size: 55, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:00:08,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=174033.33333333334, ans=0.07 2023-11-18 10:00:16,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.215e+01 1.049e+02 1.194e+02 1.365e+02 2.043e+02, threshold=2.387e+02, percent-clipped=0.0 2023-11-18 10:00:43,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=174233.33333333334, ans=0.125 2023-11-18 10:00:48,869 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2100, loss[loss=0.1259, simple_loss=0.1271, pruned_loss=0.04843, audio_tagging_loss=0.01398, over 15320.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1285, pruned_loss=0.04292, audio_tagging_loss=0.01202, over 3035078.73 frames. ], batch size: 58, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:20,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=174433.33333333334, ans=0.125 2023-11-18 10:01:39,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=174566.66666666666, ans=0.125 2023-11-18 10:01:44,276 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2150, loss[loss=0.1014, simple_loss=0.1033, pruned_loss=0.03376, audio_tagging_loss=0.01601, over 14124.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1282, pruned_loss=0.04276, audio_tagging_loss=0.0121, over 3040590.40 frames. ], batch size: 54, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:52,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=174633.33333333334, ans=0.0 2023-11-18 10:02:03,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=174700.0, ans=0.2 2023-11-18 10:02:08,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.849e+01 1.118e+02 1.250e+02 1.648e+02, threshold=2.236e+02, percent-clipped=0.0 2023-11-18 10:02:12,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=174766.66666666666, ans=0.0 2023-11-18 10:02:18,835 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:02:35,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=174900.0, ans=0.0 2023-11-18 10:02:41,108 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2200, loss[loss=0.1437, simple_loss=0.1485, pruned_loss=0.05337, audio_tagging_loss=0.0161, over 14807.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1289, pruned_loss=0.04323, audio_tagging_loss=0.01219, over 3042712.69 frames. 
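The `grad_scale` field in the batch summaries moves between 32.0, 64.0, and 128.0 across this stretch, which is characteristic of dynamic loss scaling under fp16 mixed precision: the scale grows after a run of finite-gradient steps and is cut back when overflows appear. A schematic PyTorch AMP step showing the mechanism (a generic sketch, not the train_asr.py loop; the model, optimizer, and loss function are placeholders):

```python
# Schematic fp16 training step with dynamic loss scaling.
import torch

scaler = torch.cuda.amp.GradScaler()  # maintains the running grad_scale

def train_step(model, optimizer, features, targets, loss_fn):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(features), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales grads; skips step on overflow
    scaler.update()                # grow/shrink the scale -> the logged
                                   # grad_scale values (32.0, 64.0, 128.0)
    return loss.detach()
```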
], batch size: 55, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:02:41,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn1.whiten.whitening_limit, batch_count=174966.66666666666, ans=22.5 2023-11-18 10:03:06,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-18 10:03:16,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=175166.66666666666, ans=0.0 2023-11-18 10:03:17,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.00 vs. limit=15.0 2023-11-18 10:03:22,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=175166.66666666666, ans=0.1 2023-11-18 10:03:33,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-18 10:03:36,348 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2250, loss[loss=0.1138, simple_loss=0.1234, pruned_loss=0.03775, audio_tagging_loss=0.01437, over 15314.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1297, pruned_loss=0.0434, audio_tagging_loss=0.01227, over 3045644.17 frames. ], batch size: 59, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:03:40,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=175300.0, ans=0.125 2023-11-18 10:03:43,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=175300.0, ans=0.0 2023-11-18 10:03:45,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.54 vs. limit=22.5 2023-11-18 10:04:00,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=175433.33333333334, ans=0.0 2023-11-18 10:04:01,347 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.758e+01 1.067e+02 1.178e+02 1.415e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:04:32,096 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2300, loss[loss=0.1294, simple_loss=0.136, pruned_loss=0.0476, audio_tagging_loss=0.01384, over 15997.00 frames. ], tot_loss[loss=0.121, simple_loss=0.1304, pruned_loss=0.04346, audio_tagging_loss=0.01236, over 3056515.99 frames. ], batch size: 62, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:04:34,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=175633.33333333334, ans=0.125 2023-11-18 10:04:35,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=175633.33333333334, ans=0.0 2023-11-18 10:04:36,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=175633.33333333334, ans=0.1 2023-11-18 10:04:43,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=175700.0, ans=0.1 2023-11-18 10:05:22,649 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:05:27,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=175966.66666666666, ans=0.125 2023-11-18 10:05:27,915 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2350, loss[loss=0.13, simple_loss=0.1526, pruned_loss=0.04144, audio_tagging_loss=0.0122, over 14981.00 frames. ], tot_loss[loss=0.121, simple_loss=0.1304, pruned_loss=0.0435, audio_tagging_loss=0.01232, over 3058637.76 frames. ], batch size: 54, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:05:37,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=175966.66666666666, ans=0.07 2023-11-18 10:05:51,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 9.805e+01 1.113e+02 1.261e+02 1.707e+02, threshold=2.226e+02, percent-clipped=0.0 2023-11-18 10:06:21,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-18 10:06:22,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=176300.0, ans=0.0 2023-11-18 10:06:23,642 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2400, loss[loss=0.1085, simple_loss=0.1176, pruned_loss=0.03999, audio_tagging_loss=0.00969, over 15767.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1288, pruned_loss=0.04285, audio_tagging_loss=0.01239, over 3059267.22 frames. ], batch size: 58, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:06:23,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=176300.0, ans=0.2 2023-11-18 10:06:25,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=176300.0, ans=22.5 2023-11-18 10:06:52,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-11-18 10:06:52,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=176433.33333333334, ans=0.0 2023-11-18 10:07:09,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0 2023-11-18 10:07:19,257 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2450, loss[loss=0.103, simple_loss=0.105, pruned_loss=0.0346, audio_tagging_loss=0.01583, over 15557.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1285, pruned_loss=0.0425, audio_tagging_loss=0.01239, over 3051448.54 frames. 
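The recurring WARNING above (and its siblings throughout this epoch) comes from a sanity filter on AudioSet placeholder cuts: after subsampling, the one-second clip yields only 23 encoder frames, fewer than its 24 BPE tokens, so a transducer alignment is impossible and the cut is dropped. A minimal sketch of such a filter; the subsampling arithmetic below is an assumption chosen to reproduce the logged 100 -> 23 mapping, and the accessors are illustrative.

```python
# Sketch of the "Exclude cut ..." check: a transducer needs at least as
# many encoder frames as output tokens, so overly short cuts are skipped.
import logging

def keep_cut(num_frames: int, tokens: list, cut_id: str = "") -> bool:
    # Assumed front-end arithmetic (two stacked 2x conv subsamplings);
    # it maps 100 input frames to 23, as in the warnings above.
    t = ((num_frames - 7) // 2 + 1) // 2
    if t < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut_id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {t}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

# The logged case: 100 frames -> 23 after subsampling, but 24 tokens.
print(keep_cut(100, ["tok"] * 24, cut_id="unbalanced/example.wav"))  # False
```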
], batch size: 58, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:07:19,532 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.822e-02 2023-11-18 10:07:45,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 1.013e+02 1.126e+02 1.298e+02 2.274e+02, threshold=2.253e+02, percent-clipped=1.0 2023-11-18 10:07:54,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=176833.33333333334, ans=0.125 2023-11-18 10:08:15,425 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2500, loss[loss=0.1516, simple_loss=0.1714, pruned_loss=0.0568, audio_tagging_loss=0.009137, over 13928.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1285, pruned_loss=0.04277, audio_tagging_loss=0.01243, over 3043076.86 frames. ], batch size: 53, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:08:35,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=177033.33333333334, ans=0.125 2023-11-18 10:08:54,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.14 vs. limit=15.0 2023-11-18 10:09:06,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=177233.33333333334, ans=0.035 2023-11-18 10:09:06,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=177233.33333333334, ans=0.125 2023-11-18 10:09:10,977 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2550, loss[loss=0.1009, simple_loss=0.1085, pruned_loss=0.03492, audio_tagging_loss=0.01175, over 14599.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1282, pruned_loss=0.04258, audio_tagging_loss=0.01221, over 3046194.21 frames. ], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:09:20,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.12 vs. limit=10.0 2023-11-18 10:09:37,081 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.703e+01 1.094e+02 1.267e+02 1.679e+02, threshold=2.187e+02, percent-clipped=0.0 2023-11-18 10:09:43,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177500.0, ans=0.1 2023-11-18 10:09:54,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-18 10:09:59,376 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-18 10:10:06,247 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2600, loss[loss=0.1344, simple_loss=0.1371, pruned_loss=0.04928, audio_tagging_loss=0.0166, over 15577.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.129, pruned_loss=0.04283, audio_tagging_loss=0.01216, over 3048105.85 frames. 
], batch size: 58, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:10:09,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=177633.33333333334, ans=0.2 2023-11-18 10:10:28,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2023-11-18 10:10:29,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=177766.66666666666, ans=0.1 2023-11-18 10:10:41,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=177833.33333333334, ans=0.125 2023-11-18 10:11:03,107 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2650, loss[loss=0.1187, simple_loss=0.1345, pruned_loss=0.04237, audio_tagging_loss=0.009124, over 14908.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1287, pruned_loss=0.04286, audio_tagging_loss=0.01208, over 3045195.33 frames. ], batch size: 55, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:11:15,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=178033.33333333334, ans=0.2 2023-11-18 10:11:20,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. limit=6.0 2023-11-18 10:11:27,750 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.789e+01 1.066e+02 1.192e+02 1.496e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:11:30,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=178100.0, ans=0.125 2023-11-18 10:11:57,877 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2700, loss[loss=0.1318, simple_loss=0.1464, pruned_loss=0.04245, audio_tagging_loss=0.01609, over 16601.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1287, pruned_loss=0.04274, audio_tagging_loss=0.01203, over 3046830.09 frames. ], batch size: 64, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:12:07,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-18 10:12:12,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=178366.66666666666, ans=0.07 2023-11-18 10:12:21,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=178433.33333333334, ans=0.1 2023-11-18 10:12:40,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=178500.0, ans=0.125 2023-11-18 10:12:53,261 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2750, loss[loss=0.1208, simple_loss=0.1256, pruned_loss=0.04781, audio_tagging_loss=0.01015, over 15556.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1279, pruned_loss=0.04268, audio_tagging_loss=0.01205, over 3048853.20 frames. 
], batch size: 59, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:07,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=178700.0, ans=0.1 2023-11-18 10:13:19,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 1.008e+02 1.122e+02 1.241e+02 2.001e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 10:13:36,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=178833.33333333334, ans=0.2 2023-11-18 10:13:41,763 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:13:50,103 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2800, loss[loss=0.129, simple_loss=0.1385, pruned_loss=0.0514, audio_tagging_loss=0.008383, over 15620.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.129, pruned_loss=0.04309, audio_tagging_loss=0.01199, over 3049136.23 frames. ], batch size: 59, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:51,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=178966.66666666666, ans=0.125 2023-11-18 10:13:52,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=178966.66666666666, ans=0.1 2023-11-18 10:14:20,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.76 vs. limit=6.0 2023-11-18 10:14:38,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179233.33333333334, ans=0.125 2023-11-18 10:14:43,328 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-11-18 10:14:44,620 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2850, loss[loss=0.1143, simple_loss=0.1194, pruned_loss=0.04094, audio_tagging_loss=0.01372, over 15250.00 frames. ], tot_loss[loss=0.1201, simple_loss=0.1298, pruned_loss=0.04324, audio_tagging_loss=0.012, over 3044555.23 frames. ], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:14:44,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=179300.0, ans=0.125 2023-11-18 10:14:53,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=179300.0, ans=0.0 2023-11-18 10:14:58,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. 
limit=15.0 2023-11-18 10:15:04,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=179366.66666666666, ans=0.125 2023-11-18 10:15:10,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.812e+01 1.069e+02 1.186e+02 1.678e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 10:15:11,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=15.0 2023-11-18 10:15:30,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=179566.66666666666, ans=0.035 2023-11-18 10:15:36,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=179566.66666666666, ans=0.2 2023-11-18 10:15:36,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=179566.66666666666, ans=0.0 2023-11-18 10:15:38,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=179633.33333333334, ans=0.125 2023-11-18 10:15:39,684 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2900, loss[loss=0.1265, simple_loss=0.1422, pruned_loss=0.04411, audio_tagging_loss=0.01131, over 15606.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1294, pruned_loss=0.04301, audio_tagging_loss=0.01193, over 3041869.50 frames. ], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:15:53,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179700.0, ans=0.1 2023-11-18 10:16:00,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=179700.0, ans=0.125 2023-11-18 10:16:12,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179833.33333333334, ans=0.1 2023-11-18 10:16:36,725 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 2950, loss[loss=0.1145, simple_loss=0.133, pruned_loss=0.03611, audio_tagging_loss=0.01193, over 16121.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1292, pruned_loss=0.04287, audio_tagging_loss=0.01196, over 3049250.88 frames. ], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:16:38,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=179966.66666666666, ans=0.125 2023-11-18 10:16:39,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.78 vs. limit=15.0 2023-11-18 10:16:56,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.83 vs. limit=10.0 2023-11-18 10:17:01,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.762e+01 1.073e+02 1.254e+02 1.837e+02, threshold=2.146e+02, percent-clipped=0.0 2023-11-18 10:17:08,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.40 vs. 
limit=15.0 2023-11-18 10:17:26,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=180233.33333333334, ans=0.125 2023-11-18 10:17:26,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=180233.33333333334, ans=0.0 2023-11-18 10:17:32,028 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3000, loss[loss=0.1477, simple_loss=0.1615, pruned_loss=0.05754, audio_tagging_loss=0.009378, over 17026.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1294, pruned_loss=0.04282, audio_tagging_loss=0.01207, over 3050189.22 frames. ], batch size: 62, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:32,029 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 10:17:46,972 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.6567, 1.7527, 2.6109, 2.5537, 2.1681, 2.0470, 2.4082, 1.9776], device='cuda:2') 2023-11-18 10:17:49,094 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.7544, 1.9870, 1.9810, 1.2307, 1.8138, 2.1653, 1.8886, 1.8954], device='cuda:2') 2023-11-18 10:17:59,687 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2334, 5.0206, 4.9616, 5.1249], device='cuda:2') 2023-11-18 10:18:05,513 INFO [train_asr.py:1147] (2/4) Epoch 3, validation: loss=0.08163, simple_loss=0.06585, pruned_loss=0.01265, audio_tagging_loss=0.03605, over 4681554.00 frames. 2023-11-18 10:18:05,514 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 10:18:27,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=180433.33333333334, ans=0.5 2023-11-18 10:18:33,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=180433.33333333334, ans=0.125 2023-11-18 10:18:35,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=180433.33333333334, ans=0.0 2023-11-18 10:18:50,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2023-11-18 10:19:01,886 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3050, loss[loss=0.1472, simple_loss=0.1631, pruned_loss=0.05642, audio_tagging_loss=0.009175, over 15242.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1293, pruned_loss=0.04279, audio_tagging_loss=0.01218, over 3048804.57 frames. ], batch size: 55, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:19:13,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=180700.0, ans=0.1 2023-11-18 10:19:19,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. 
limit=15.0 2023-11-18 10:19:19,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=180700.0, ans=0.0 2023-11-18 10:19:26,064 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.626e+01 1.059e+02 1.215e+02 1.726e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 10:19:33,990 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:19:39,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=180833.33333333334, ans=0.0 2023-11-18 10:19:56,874 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3100, loss[loss=0.08447, simple_loss=0.09105, pruned_loss=0.02622, audio_tagging_loss=0.01272, over 15241.00 frames. ], tot_loss[loss=0.1201, simple_loss=0.1294, pruned_loss=0.04304, audio_tagging_loss=0.01232, over 3049022.98 frames. ], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:20:08,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=181033.33333333334, ans=0.0 2023-11-18 10:20:38,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.26 vs. limit=22.5 2023-11-18 10:20:51,942 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3150, loss[loss=0.1169, simple_loss=0.1275, pruned_loss=0.03961, audio_tagging_loss=0.01347, over 15513.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1302, pruned_loss=0.04312, audio_tagging_loss=0.0123, over 3052791.99 frames. ], batch size: 58, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:01,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.44 vs. limit=10.0 2023-11-18 10:21:10,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=181366.66666666666, ans=0.125 2023-11-18 10:21:15,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181433.33333333334, ans=0.125 2023-11-18 10:21:18,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.872e+01 1.154e+02 1.398e+02 2.452e+02, threshold=2.308e+02, percent-clipped=3.0 2023-11-18 10:21:27,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:30,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=181500.0, ans=0.04949747468305833 2023-11-18 10:21:35,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=181566.66666666666, ans=0.0 2023-11-18 10:21:38,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.07 vs. 
limit=22.5 2023-11-18 10:21:39,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:45,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:48,260 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3200, loss[loss=0.1155, simple_loss=0.1212, pruned_loss=0.04317, audio_tagging_loss=0.01175, over 15098.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1308, pruned_loss=0.04315, audio_tagging_loss=0.01238, over 3053728.16 frames. ], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:50,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-18 10:21:58,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.66 vs. limit=22.5 2023-11-18 10:22:07,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.50 vs. limit=15.0 2023-11-18 10:22:10,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.76 vs. limit=12.0 2023-11-18 10:22:13,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=181766.66666666666, ans=0.0 2023-11-18 10:22:27,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=181833.33333333334, ans=0.125 2023-11-18 10:22:36,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=181900.0, ans=0.125 2023-11-18 10:22:39,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=181900.0, ans=0.2 2023-11-18 10:22:42,989 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3250, loss[loss=0.1245, simple_loss=0.144, pruned_loss=0.03948, audio_tagging_loss=0.01297, over 14888.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1301, pruned_loss=0.04304, audio_tagging_loss=0.01249, over 3061194.15 frames. 
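During the batch-3000 validation pass above, zipformer.py also dumps `attn_weights_entropy` tensors, one value per attention head, as a diagnostic: near-zero entropy would indicate heads collapsing onto a single frame. Below is a sketch of that statistic under the standard Shannon entropy of each attention distribution; the reduction over batch and query positions is an assumption about how the per-head numbers are produced.

```python
# Sketch: per-head entropy of attention weights with shape
# (batch, heads, query_time, key_time). Higher = more spread-out
# attention; ~0 = peaked on a single key frame.
import torch

def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (batch, heads, query_time)
    return ent.mean(dim=(0, 2))                     # one entropy per head

attn = torch.softmax(torch.randn(2, 4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # tensor of 4 per-head entropies
```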
], batch size: 55, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:05,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=182100.0, ans=0.04949747468305833 2023-11-18 10:23:07,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 9.562e+01 1.039e+02 1.190e+02 1.635e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 10:23:09,151 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:23:17,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten.whitening_limit, batch_count=182166.66666666666, ans=15.0 2023-11-18 10:23:20,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=182166.66666666666, ans=0.125 2023-11-18 10:23:35,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=182233.33333333334, ans=0.125 2023-11-18 10:23:37,518 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3300, loss[loss=0.1513, simple_loss=0.1578, pruned_loss=0.06205, audio_tagging_loss=0.0104, over 14636.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1298, pruned_loss=0.04284, audio_tagging_loss=0.01248, over 3056128.83 frames. ], batch size: 54, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:43,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=182300.0, ans=0.0 2023-11-18 10:23:58,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=182366.66666666666, ans=0.0 2023-11-18 10:24:01,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=182433.33333333334, ans=0.1 2023-11-18 10:24:01,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:02,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:10,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=182500.0, ans=0.0 2023-11-18 10:24:24,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.32 vs. limit=22.5 2023-11-18 10:24:33,223 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3350, loss[loss=0.1102, simple_loss=0.1171, pruned_loss=0.03742, audio_tagging_loss=0.01424, over 15813.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1289, pruned_loss=0.04272, audio_tagging_loss=0.01244, over 3061581.32 frames. 
], batch size: 59, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:24:33,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=182633.33333333334, ans=0.0 2023-11-18 10:24:41,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=182633.33333333334, ans=0.2 2023-11-18 10:24:58,935 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.499e+01 1.070e+02 1.220e+02 2.186e+02, threshold=2.139e+02, percent-clipped=1.0 2023-11-18 10:25:08,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=22.5 2023-11-18 10:25:14,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=182833.33333333334, ans=0.125 2023-11-18 10:25:16,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=182833.33333333334, ans=0.0 2023-11-18 10:25:29,562 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3400, loss[loss=0.1478, simple_loss=0.1704, pruned_loss=0.0515, audio_tagging_loss=0.0111, over 15833.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1289, pruned_loss=0.04255, audio_tagging_loss=0.01228, over 3066684.48 frames. ], batch size: 57, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:25:31,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=182966.66666666666, ans=0.125 2023-11-18 10:25:44,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.54 vs. limit=10.0 2023-11-18 10:25:47,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=183033.33333333334, ans=0.0 2023-11-18 10:25:48,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=183033.33333333334, ans=0.0 2023-11-18 10:25:56,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=183100.0, ans=0.2 2023-11-18 10:26:10,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=183166.66666666666, ans=0.125 2023-11-18 10:26:18,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183233.33333333334, ans=0.1 2023-11-18 10:26:24,425 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3450, loss[loss=0.1314, simple_loss=0.1513, pruned_loss=0.04448, audio_tagging_loss=0.01122, over 15672.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1302, pruned_loss=0.04311, audio_tagging_loss=0.01204, over 3061921.64 frames. ], batch size: 58, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:26:29,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. 
limit=15.0 2023-11-18 10:26:38,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183366.66666666666, ans=0.1 2023-11-18 10:26:50,541 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.538e+01 1.062e+02 1.197e+02 2.158e+02, threshold=2.124e+02, percent-clipped=1.0 2023-11-18 10:27:13,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=183566.66666666666, ans=0.2 2023-11-18 10:27:20,069 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3500, loss[loss=0.08943, simple_loss=0.09556, pruned_loss=0.03092, audio_tagging_loss=0.01073, over 14915.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1293, pruned_loss=0.04254, audio_tagging_loss=0.01206, over 3057581.56 frames. ], batch size: 57, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:27:37,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=183700.0, ans=0.125 2023-11-18 10:27:45,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=183766.66666666666, ans=0.0 2023-11-18 10:27:48,631 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:27:49,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=183766.66666666666, ans=0.125 2023-11-18 10:28:06,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=183900.0, ans=10.0 2023-11-18 10:28:06,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=183900.0, ans=0.0 2023-11-18 10:28:15,723 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3550, loss[loss=0.1004, simple_loss=0.1067, pruned_loss=0.03485, audio_tagging_loss=0.01215, over 16089.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1295, pruned_loss=0.04272, audio_tagging_loss=0.01198, over 3058138.42 frames. 
], batch size: 60, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:28:15,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=183966.66666666666, ans=0.125 2023-11-18 10:28:26,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=184033.33333333334, ans=0.125 2023-11-18 10:28:39,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=184100.0, ans=0.125 2023-11-18 10:28:41,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 9.739e+01 1.081e+02 1.236e+02 3.784e+02, threshold=2.163e+02, percent-clipped=1.0 2023-11-18 10:28:54,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=184166.66666666666, ans=0.125 2023-11-18 10:28:54,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=184166.66666666666, ans=0.0 2023-11-18 10:29:11,451 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3600, loss[loss=0.1252, simple_loss=0.1322, pruned_loss=0.04657, audio_tagging_loss=0.01256, over 15691.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1294, pruned_loss=0.0428, audio_tagging_loss=0.01191, over 3060093.94 frames. ], batch size: 58, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:29:12,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=184300.0, ans=0.5 2023-11-18 10:29:25,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=184366.66666666666, ans=0.125 2023-11-18 10:29:51,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=184500.0, ans=0.125 2023-11-18 10:30:06,996 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3650, loss[loss=0.1043, simple_loss=0.1192, pruned_loss=0.03265, audio_tagging_loss=0.01208, over 14997.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1284, pruned_loss=0.04248, audio_tagging_loss=0.01198, over 3050046.44 frames. 
], batch size: 56, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:30:09,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=184633.33333333334, ans=0.125 2023-11-18 10:30:15,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=184633.33333333334, ans=0.125 2023-11-18 10:30:23,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=184700.0, ans=0.035 2023-11-18 10:30:32,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.231e+01 1.003e+02 1.085e+02 1.205e+02 1.999e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 10:30:34,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=184766.66666666666, ans=0.125 2023-11-18 10:30:34,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=184766.66666666666, ans=0.125 2023-11-18 10:30:35,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=184766.66666666666, ans=0.95 2023-11-18 10:30:56,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.79 vs. limit=6.0 2023-11-18 10:31:02,921 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3700, loss[loss=0.1457, simple_loss=0.1564, pruned_loss=0.05703, audio_tagging_loss=0.01045, over 15771.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1284, pruned_loss=0.04234, audio_tagging_loss=0.01195, over 3049564.23 frames. ], batch size: 58, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:31:07,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=184966.66666666666, ans=15.0 2023-11-18 10:31:25,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=185100.0, ans=0.125 2023-11-18 10:31:58,483 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3750, loss[loss=0.1387, simple_loss=0.1508, pruned_loss=0.05061, audio_tagging_loss=0.01263, over 16112.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1292, pruned_loss=0.0427, audio_tagging_loss=0.01201, over 3055021.45 frames. ], batch size: 58, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:32:06,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=185300.0, ans=0.125 2023-11-18 10:32:24,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.670e+01 1.084e+02 1.200e+02 2.427e+02, threshold=2.168e+02, percent-clipped=1.0 2023-11-18 10:32:24,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=185433.33333333334, ans=0.2 2023-11-18 10:32:37,372 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 10:32:46,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=185566.66666666666, ans=0.1 2023-11-18 10:32:54,192 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3800, loss[loss=0.1336, simple_loss=0.1549, pruned_loss=0.04566, audio_tagging_loss=0.01052, over 14769.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1286, pruned_loss=0.04266, audio_tagging_loss=0.01215, over 3053778.70 frames. ], batch size: 57, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:08,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=185700.0, ans=0.07 2023-11-18 10:33:22,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=185766.66666666666, ans=0.0 2023-11-18 10:33:37,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:37,791 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:33:43,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-18 10:33:47,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:50,167 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3850, loss[loss=0.0946, simple_loss=0.09062, pruned_loss=0.03444, audio_tagging_loss=0.01485, over 14248.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1284, pruned_loss=0.04251, audio_tagging_loss=0.01233, over 3049266.74 frames. ], batch size: 54, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:54,041 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-11-18 10:34:15,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 1.009e+02 1.117e+02 1.269e+02 1.869e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 10:34:45,181 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3900, loss[loss=0.1552, simple_loss=0.1647, pruned_loss=0.06363, audio_tagging_loss=0.009208, over 14532.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1291, pruned_loss=0.04265, audio_tagging_loss=0.01237, over 3045812.51 frames. ], batch size: 54, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:34:46,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=186300.0, ans=0.0 2023-11-18 10:34:55,563 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:34:59,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=186366.66666666666, ans=0.125 2023-11-18 10:35:11,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=186433.33333333334, ans=0.125 2023-11-18 10:35:15,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. 
limit=6.0 2023-11-18 10:35:21,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=186500.0, ans=0.95 2023-11-18 10:35:23,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=186500.0, ans=0.0 2023-11-18 10:35:27,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=8.0 2023-11-18 10:35:40,987 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 3950, loss[loss=0.1006, simple_loss=0.1126, pruned_loss=0.0345, audio_tagging_loss=0.009758, over 15186.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1279, pruned_loss=0.04211, audio_tagging_loss=0.01235, over 3055456.93 frames. ], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:56,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=186700.0, ans=0.125 2023-11-18 10:35:58,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=186700.0, ans=0.0 2023-11-18 10:36:03,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=186700.0, ans=0.125 2023-11-18 10:36:08,758 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 9.703e+01 1.079e+02 1.244e+02 1.846e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 10:36:12,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=186766.66666666666, ans=0.0 2023-11-18 10:36:24,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=186833.33333333334, ans=0.2 2023-11-18 10:36:39,212 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4000, loss[loss=0.1562, simple_loss=0.1764, pruned_loss=0.05809, audio_tagging_loss=0.009863, over 15234.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1289, pruned_loss=0.04255, audio_tagging_loss=0.01244, over 3052543.91 frames. ], batch size: 55, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:36:51,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=187033.33333333334, ans=0.2 2023-11-18 10:37:00,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2023-11-18 10:37:11,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=187166.66666666666, ans=0.125 2023-11-18 10:37:18,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=187166.66666666666, ans=0.2 2023-11-18 10:37:20,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187166.66666666666, ans=0.1 2023-11-18 10:37:34,004 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4050, loss[loss=0.1261, simple_loss=0.1252, pruned_loss=0.05178, audio_tagging_loss=0.01168, over 13132.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1291, pruned_loss=0.04268, audio_tagging_loss=0.01239, over 3046786.99 frames. 
], batch size: 53, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:37:37,181 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:37:53,186 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:38:00,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.492e+01 1.077e+02 1.184e+02 1.546e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 10:38:05,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187433.33333333334, ans=0.1 2023-11-18 10:38:07,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-18 10:38:30,032 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4100, loss[loss=0.1177, simple_loss=0.1351, pruned_loss=0.03926, audio_tagging_loss=0.01085, over 15809.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1298, pruned_loss=0.04296, audio_tagging_loss=0.01234, over 3039124.69 frames. ], batch size: 58, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:38:44,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=187700.0, ans=0.0 2023-11-18 10:38:51,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=187766.66666666666, ans=0.0 2023-11-18 10:39:14,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=187900.0, ans=0.125 2023-11-18 10:39:20,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=187900.0, ans=0.125 2023-11-18 10:39:26,026 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4150, loss[loss=0.1147, simple_loss=0.1262, pruned_loss=0.03992, audio_tagging_loss=0.01166, over 16409.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1283, pruned_loss=0.04222, audio_tagging_loss=0.01228, over 3035418.25 frames. ], batch size: 63, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:39:50,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.719e+01 1.039e+02 1.185e+02 1.497e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-18 10:40:07,369 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:40:21,054 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4200, loss[loss=0.1021, simple_loss=0.1107, pruned_loss=0.03306, audio_tagging_loss=0.01369, over 15450.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1277, pruned_loss=0.04194, audio_tagging_loss=0.01205, over 3038778.46 frames. 
], batch size: 58, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:40:36,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=188366.66666666666, ans=0.0 2023-11-18 10:40:41,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=188433.33333333334, ans=0.2 2023-11-18 10:40:48,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=188433.33333333334, ans=0.125 2023-11-18 10:40:53,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=188500.0, ans=0.125 2023-11-18 10:40:56,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=188500.0, ans=0.1 2023-11-18 10:40:58,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188500.0, ans=0.125 2023-11-18 10:41:06,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=188566.66666666666, ans=0.125 2023-11-18 10:41:13,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=188566.66666666666, ans=0.125 2023-11-18 10:41:14,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=188633.33333333334, ans=0.2 2023-11-18 10:41:15,115 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4250, loss[loss=0.1153, simple_loss=0.124, pruned_loss=0.04016, audio_tagging_loss=0.01317, over 14399.00 frames. ], tot_loss[loss=0.118, simple_loss=0.1283, pruned_loss=0.04189, audio_tagging_loss=0.01199, over 3041212.72 frames. ], batch size: 54, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:41:16,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=188633.33333333334, ans=0.125 2023-11-18 10:41:23,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=188633.33333333334, ans=0.05 2023-11-18 10:41:38,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=188766.66666666666, ans=0.125 2023-11-18 10:41:41,654 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.813e+01 1.062e+02 1.234e+02 2.396e+02, threshold=2.125e+02, percent-clipped=1.0 2023-11-18 10:41:43,001 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:41:46,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=188766.66666666666, ans=0.125 2023-11-18 10:41:51,836 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.35 vs. limit=22.5 2023-11-18 10:41:56,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=188833.33333333334, ans=0.0 2023-11-18 10:41:57,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. 
limit=6.0 2023-11-18 10:41:57,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-11-18 10:42:08,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.55 vs. limit=22.5 2023-11-18 10:42:12,236 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4300, loss[loss=0.158, simple_loss=0.1731, pruned_loss=0.06298, audio_tagging_loss=0.008418, over 15691.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1297, pruned_loss=0.04242, audio_tagging_loss=0.01181, over 3034570.74 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:42:22,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=189033.33333333334, ans=0.0 2023-11-18 10:42:27,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=189033.33333333334, ans=0.125 2023-11-18 10:42:28,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=189033.33333333334, ans=0.0 2023-11-18 10:42:46,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=189166.66666666666, ans=0.125 2023-11-18 10:43:07,153 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4350, loss[loss=0.1266, simple_loss=0.1404, pruned_loss=0.04613, audio_tagging_loss=0.01022, over 15407.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1292, pruned_loss=0.04242, audio_tagging_loss=0.01189, over 3033717.25 frames. ], batch size: 59, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:43:14,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2023-11-18 10:43:21,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.30 vs. limit=12.0 2023-11-18 10:43:24,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=189366.66666666666, ans=0.0 2023-11-18 10:43:33,114 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.769e+01 1.106e+02 1.188e+02 1.814e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 10:43:53,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=189566.66666666666, ans=0.09899494936611666 2023-11-18 10:43:54,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=189566.66666666666, ans=0.125 2023-11-18 10:44:01,994 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4400, loss[loss=0.1194, simple_loss=0.1444, pruned_loss=0.03566, audio_tagging_loss=0.01154, over 15359.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1277, pruned_loss=0.0417, audio_tagging_loss=0.01193, over 3034548.63 frames. ], batch size: 57, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:44:13,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.61 vs. 
limit=15.0 2023-11-18 10:44:16,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=189700.0, ans=0.1 2023-11-18 10:44:20,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=189700.0, ans=0.125 2023-11-18 10:44:25,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=189766.66666666666, ans=0.125 2023-11-18 10:44:25,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.69 vs. limit=10.0 2023-11-18 10:44:58,516 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4450, loss[loss=0.1361, simple_loss=0.1504, pruned_loss=0.04977, audio_tagging_loss=0.01112, over 15705.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1276, pruned_loss=0.04179, audio_tagging_loss=0.01204, over 3041540.28 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:45:05,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189966.66666666666, ans=0.125 2023-11-18 10:45:18,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.18 vs. limit=6.0 2023-11-18 10:45:24,127 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.777e+01 1.062e+02 1.165e+02 1.734e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 10:45:47,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:49,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=190233.33333333334, ans=0.0 2023-11-18 10:45:50,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=190233.33333333334, ans=0.125 2023-11-18 10:45:52,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=190300.0, ans=0.125 2023-11-18 10:45:53,608 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4500, loss[loss=0.09543, simple_loss=0.1071, pruned_loss=0.03135, audio_tagging_loss=0.01054, over 14993.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1278, pruned_loss=0.04175, audio_tagging_loss=0.0119, over 3047281.83 frames. ], batch size: 56, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:46:29,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=190500.0, ans=0.0 2023-11-18 10:46:33,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=190500.0, ans=0.125 2023-11-18 10:46:42,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=190566.66666666666, ans=0.5 2023-11-18 10:46:48,221 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4550, loss[loss=0.1103, simple_loss=0.1273, pruned_loss=0.03466, audio_tagging_loss=0.01199, over 15541.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1267, pruned_loss=0.04104, audio_tagging_loss=0.01193, over 3050386.05 frames. 
], batch size: 57, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:01,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=190700.0, ans=0.1 2023-11-18 10:47:09,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=190700.0, ans=0.125 2023-11-18 10:47:12,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=190766.66666666666, ans=0.025 2023-11-18 10:47:15,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.411e+01 1.047e+02 1.182e+02 1.787e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:47:27,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=190833.33333333334, ans=0.125 2023-11-18 10:47:30,612 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:47:33,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.94 vs. limit=15.0 2023-11-18 10:47:43,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. limit=15.0 2023-11-18 10:47:44,406 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4600, loss[loss=0.08904, simple_loss=0.1028, pruned_loss=0.02445, audio_tagging_loss=0.01319, over 14399.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1267, pruned_loss=0.04104, audio_tagging_loss=0.0121, over 3049198.69 frames. ], batch size: 55, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:48:05,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-11-18 10:48:07,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.74 vs. limit=10.0 2023-11-18 10:48:31,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=191233.33333333334, ans=0.0 2023-11-18 10:48:34,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191233.33333333334, ans=0.125 2023-11-18 10:48:40,139 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4650, loss[loss=0.1149, simple_loss=0.1281, pruned_loss=0.03713, audio_tagging_loss=0.01367, over 15371.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1271, pruned_loss=0.04135, audio_tagging_loss=0.01221, over 3044995.07 frames. 
], batch size: 55, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:48:58,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191366.66666666666, ans=0.125 2023-11-18 10:49:03,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.22 vs. limit=10.0 2023-11-18 10:49:06,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 9.958e+01 1.111e+02 1.228e+02 2.300e+02, threshold=2.222e+02, percent-clipped=1.0 2023-11-18 10:49:06,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=191433.33333333334, ans=0.125 2023-11-18 10:49:12,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=191500.0, ans=0.125 2023-11-18 10:49:25,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191566.66666666666, ans=0.125 2023-11-18 10:49:26,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=191566.66666666666, ans=0.5 2023-11-18 10:49:28,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=191566.66666666666, ans=0.125 2023-11-18 10:49:34,886 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4700, loss[loss=0.1152, simple_loss=0.1223, pruned_loss=0.04102, audio_tagging_loss=0.01306, over 16173.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.126, pruned_loss=0.04121, audio_tagging_loss=0.01242, over 3043743.51 frames. ], batch size: 60, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:49:44,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=191633.33333333334, ans=0.125 2023-11-18 10:49:50,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0 2023-11-18 10:49:52,929 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:49:57,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2023-11-18 10:50:04,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=191766.66666666666, ans=0.125 2023-11-18 10:50:10,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=191833.33333333334, ans=0.5 2023-11-18 10:50:16,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=191833.33333333334, ans=0.125 2023-11-18 10:50:23,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=191900.0, ans=0.2 2023-11-18 10:50:30,211 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4750, loss[loss=0.1095, simple_loss=0.1085, pruned_loss=0.0429, audio_tagging_loss=0.01237, over 14361.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1256, pruned_loss=0.04113, audio_tagging_loss=0.01249, over 3044231.36 frames. 
], batch size: 55, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:50:42,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=192033.33333333334, ans=10.0 2023-11-18 10:50:42,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-18 10:50:45,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=192033.33333333334, ans=0.0 2023-11-18 10:50:51,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192033.33333333334, ans=0.1 2023-11-18 10:50:57,119 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.880e+01 1.110e+02 1.323e+02 1.950e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 10:51:14,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.23 vs. limit=12.0 2023-11-18 10:51:24,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-18 10:51:26,451 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4800, loss[loss=0.1241, simple_loss=0.1442, pruned_loss=0.04184, audio_tagging_loss=0.01018, over 15801.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1263, pruned_loss=0.04144, audio_tagging_loss=0.01269, over 3047321.85 frames. ], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:51:29,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.45 vs. limit=15.0 2023-11-18 10:51:40,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=192366.66666666666, ans=0.2 2023-11-18 10:51:40,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.27 vs. limit=15.0 2023-11-18 10:51:44,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=192366.66666666666, ans=0.125 2023-11-18 10:52:07,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=192500.0, ans=0.125 2023-11-18 10:52:07,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=192500.0, ans=0.125 2023-11-18 10:52:19,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192566.66666666666, ans=0.1 2023-11-18 10:52:21,044 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4850, loss[loss=0.1157, simple_loss=0.1185, pruned_loss=0.04476, audio_tagging_loss=0.01171, over 14177.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1259, pruned_loss=0.04119, audio_tagging_loss=0.01275, over 3046064.00 frames. 
], batch size: 54, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:52:22,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=192633.33333333334, ans=0.125 2023-11-18 10:52:25,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192633.33333333334, ans=0.1 2023-11-18 10:52:47,760 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 9.557e+01 1.060e+02 1.196e+02 2.281e+02, threshold=2.120e+02, percent-clipped=1.0 2023-11-18 10:52:50,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=192766.66666666666, ans=0.0 2023-11-18 10:53:06,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.37 vs. limit=22.5 2023-11-18 10:53:15,995 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4900, loss[loss=0.1228, simple_loss=0.1384, pruned_loss=0.04241, audio_tagging_loss=0.01122, over 15820.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1273, pruned_loss=0.04159, audio_tagging_loss=0.01263, over 3046141.97 frames. ], batch size: 58, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:53:18,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=192966.66666666666, ans=0.07 2023-11-18 10:53:19,102 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.77 vs. limit=10.0 2023-11-18 10:53:21,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=22.86 vs. limit=15.0 2023-11-18 10:53:34,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.26 vs. limit=15.0 2023-11-18 10:53:44,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:54:00,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=193233.33333333334, ans=0.125 2023-11-18 10:54:01,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193233.33333333334, ans=0.1 2023-11-18 10:54:11,445 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 4950, loss[loss=0.1181, simple_loss=0.1293, pruned_loss=0.04306, audio_tagging_loss=0.01038, over 15171.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1268, pruned_loss=0.04148, audio_tagging_loss=0.01246, over 3036395.08 frames. ], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:54:14,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=193300.0, ans=0.125 2023-11-18 10:54:20,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.50 vs. 
limit=15.0 2023-11-18 10:54:28,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=193366.66666666666, ans=0.125 2023-11-18 10:54:30,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.79 vs. limit=15.0 2023-11-18 10:54:34,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=193433.33333333334, ans=0.0 2023-11-18 10:54:37,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.494e+01 1.131e+02 1.249e+02 1.755e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 10:54:41,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-18 10:54:41,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-18 10:54:56,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=193566.66666666666, ans=0.1 2023-11-18 10:54:57,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.73 vs. limit=10.0 2023-11-18 10:54:57,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=193566.66666666666, ans=0.2 2023-11-18 10:55:06,962 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5000, loss[loss=0.1218, simple_loss=0.129, pruned_loss=0.04299, audio_tagging_loss=0.01434, over 15005.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1261, pruned_loss=0.04101, audio_tagging_loss=0.01233, over 3040495.72 frames. ], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:55:19,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=193700.0, ans=0.0 2023-11-18 10:55:21,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193700.0, ans=0.1 2023-11-18 10:55:22,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=193700.0, ans=0.07 2023-11-18 10:55:23,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-11-18 10:55:49,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=193833.33333333334, ans=0.125 2023-11-18 10:56:02,084 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5050, loss[loss=0.129, simple_loss=0.1282, pruned_loss=0.05136, audio_tagging_loss=0.01351, over 15042.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.127, pruned_loss=0.04153, audio_tagging_loss=0.01227, over 3043274.72 frames. 
], batch size: 59, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:56:20,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=194033.33333333334, ans=0.2 2023-11-18 10:56:24,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=194100.0, ans=0.0 2023-11-18 10:56:26,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5 2023-11-18 10:56:28,909 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 1.006e+02 1.111e+02 1.230e+02 2.145e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 10:56:57,802 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5100, loss[loss=0.1092, simple_loss=0.1307, pruned_loss=0.03483, audio_tagging_loss=0.008991, over 15082.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1267, pruned_loss=0.04149, audio_tagging_loss=0.01211, over 3044915.31 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:01,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=194300.0, ans=0.125 2023-11-18 10:57:08,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=194366.66666666666, ans=0.0 2023-11-18 10:57:17,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=194366.66666666666, ans=0.0 2023-11-18 10:57:25,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=194433.33333333334, ans=0.125 2023-11-18 10:57:28,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.59 vs. limit=22.5 2023-11-18 10:57:33,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=194500.0, ans=0.125 2023-11-18 10:57:44,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=194566.66666666666, ans=0.125 2023-11-18 10:57:46,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=194566.66666666666, ans=0.125 2023-11-18 10:57:52,357 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5150, loss[loss=0.1193, simple_loss=0.1388, pruned_loss=0.0388, audio_tagging_loss=0.01112, over 14859.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1273, pruned_loss=0.04152, audio_tagging_loss=0.01209, over 3036382.16 frames. ], batch size: 55, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:56,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=194633.33333333334, ans=0.125 2023-11-18 10:58:20,054 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 9.533e+01 1.047e+02 1.145e+02 1.744e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:58:21,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.85 vs. 
limit=10.0 2023-11-18 10:58:26,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=194833.33333333334, ans=0.125 2023-11-18 10:58:28,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=194833.33333333334, ans=0.0 2023-11-18 10:58:40,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=194900.0, ans=0.125 2023-11-18 10:58:44,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.63 vs. limit=6.0 2023-11-18 10:58:48,376 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5200, loss[loss=0.1103, simple_loss=0.1186, pruned_loss=0.04058, audio_tagging_loss=0.01041, over 15750.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1273, pruned_loss=0.04145, audio_tagging_loss=0.01212, over 3031296.55 frames. ], batch size: 63, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:58:48,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=194966.66666666666, ans=0.0 2023-11-18 10:59:16,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=195100.0, ans=0.125 2023-11-18 10:59:20,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.29 vs. limit=22.5 2023-11-18 10:59:29,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=195166.66666666666, ans=0.125 2023-11-18 10:59:30,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=195166.66666666666, ans=0.0 2023-11-18 10:59:32,641 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:59:42,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=195233.33333333334, ans=0.0 2023-11-18 10:59:44,071 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5250, loss[loss=0.09893, simple_loss=0.1029, pruned_loss=0.03375, audio_tagging_loss=0.01374, over 16831.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1291, pruned_loss=0.04188, audio_tagging_loss=0.01201, over 3040738.64 frames. ], batch size: 66, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:49,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.26 vs. 
limit=15.0 2023-11-18 11:00:03,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=195366.66666666666, ans=0.0 2023-11-18 11:00:09,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.706e+01 1.086e+02 1.165e+02 1.723e+02, threshold=2.171e+02, percent-clipped=0.0 2023-11-18 11:00:10,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=195433.33333333334, ans=0.0 2023-11-18 11:00:16,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=195500.0, ans=0.2 2023-11-18 11:00:19,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=195500.0, ans=0.125 2023-11-18 11:00:25,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=195500.0, ans=10.0 2023-11-18 11:00:26,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=195500.0, ans=0.05 2023-11-18 11:00:31,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=195566.66666666666, ans=0.125 2023-11-18 11:00:33,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=195566.66666666666, ans=0.0 2023-11-18 11:00:38,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.72 vs. limit=10.0 2023-11-18 11:00:38,726 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5300, loss[loss=0.1003, simple_loss=0.1102, pruned_loss=0.03288, audio_tagging_loss=0.01226, over 14094.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1304, pruned_loss=0.04244, audio_tagging_loss=0.01188, over 3043485.25 frames. ], batch size: 55, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 11:00:39,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=195633.33333333334, ans=0.0 2023-11-18 11:00:59,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=195766.66666666666, ans=0.02 2023-11-18 11:01:24,372 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=12.0 2023-11-18 11:01:28,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=195900.0, ans=0.125 2023-11-18 11:01:29,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=195900.0, ans=0.125 2023-11-18 11:01:33,819 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5350, loss[loss=0.1682, simple_loss=0.1924, pruned_loss=0.06392, audio_tagging_loss=0.008066, over 15314.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1302, pruned_loss=0.04218, audio_tagging_loss=0.01182, over 3042238.72 frames. ], batch size: 54, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:01:58,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.71 vs. 
limit=15.0 2023-11-18 11:02:00,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.740e+01 1.103e+02 1.236e+02 1.942e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:02:04,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=196100.0, ans=0.125 2023-11-18 11:02:13,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=196166.66666666666, ans=0.125 2023-11-18 11:02:16,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=196166.66666666666, ans=0.0 2023-11-18 11:02:17,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=196233.33333333334, ans=0.125 2023-11-18 11:02:30,251 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5400, loss[loss=0.1323, simple_loss=0.1539, pruned_loss=0.04673, audio_tagging_loss=0.008591, over 16394.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1307, pruned_loss=0.04238, audio_tagging_loss=0.0118, over 3049199.14 frames. ], batch size: 58, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:02:31,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=196300.0, ans=0.2 2023-11-18 11:02:34,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=196300.0, ans=0.2 2023-11-18 11:03:23,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=196566.66666666666, ans=0.125 2023-11-18 11:03:24,823 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5450, loss[loss=0.1109, simple_loss=0.1141, pruned_loss=0.04019, audio_tagging_loss=0.01364, over 15179.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1312, pruned_loss=0.04254, audio_tagging_loss=0.01185, over 3051619.29 frames. ], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:03:33,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-11-18 11:03:39,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=196700.0, ans=0.125 2023-11-18 11:03:41,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.15 vs. limit=15.0 2023-11-18 11:03:46,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=196766.66666666666, ans=0.025 2023-11-18 11:03:51,182 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.475e+01 1.043e+02 1.232e+02 1.692e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-18 11:04:07,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.32 vs. limit=15.0 2023-11-18 11:04:19,144 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5500, loss[loss=0.09006, simple_loss=0.09256, pruned_loss=0.0287, audio_tagging_loss=0.01508, over 15475.00 frames. ], tot_loss[loss=0.1198, simple_loss=0.1307, pruned_loss=0.04247, audio_tagging_loss=0.01201, over 3056774.78 frames. 
], batch size: 61, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:04:36,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=197033.33333333334, ans=0.2 2023-11-18 11:04:58,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=197166.66666666666, ans=0.0 2023-11-18 11:05:03,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.68 vs. limit=22.5 2023-11-18 11:05:15,115 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5550, loss[loss=0.1455, simple_loss=0.1495, pruned_loss=0.05942, audio_tagging_loss=0.01138, over 16236.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1307, pruned_loss=0.04252, audio_tagging_loss=0.01216, over 3057984.53 frames. ], batch size: 60, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:05:15,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=197300.0, ans=0.07 2023-11-18 11:05:19,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=197300.0, ans=0.125 2023-11-18 11:05:24,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=15.0 2023-11-18 11:05:24,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=197300.0, ans=0.125 2023-11-18 11:05:33,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=197366.66666666666, ans=0.125 2023-11-18 11:05:35,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.50 vs. limit=6.0 2023-11-18 11:05:41,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 9.422e+01 1.021e+02 1.116e+02 1.524e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 11:05:42,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=197433.33333333334, ans=0.125 2023-11-18 11:05:49,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.24 vs. limit=15.0 2023-11-18 11:05:56,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197500.0, ans=0.1 2023-11-18 11:06:01,525 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:06:03,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=197566.66666666666, ans=0.125 2023-11-18 11:06:10,798 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5600, loss[loss=0.1356, simple_loss=0.1453, pruned_loss=0.0513, audio_tagging_loss=0.0116, over 14903.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1304, pruned_loss=0.04207, audio_tagging_loss=0.01229, over 3066308.04 frames. 
], batch size: 53, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:06:12,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=197633.33333333334, ans=0.5 2023-11-18 11:06:16,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-18 11:06:19,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=197633.33333333334, ans=15.0 2023-11-18 11:06:31,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-18 11:06:49,878 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:07:05,550 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5650, loss[loss=0.148, simple_loss=0.1565, pruned_loss=0.05607, audio_tagging_loss=0.01364, over 15262.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.13, pruned_loss=0.04198, audio_tagging_loss=0.01232, over 3064503.32 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:07:12,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=197966.66666666666, ans=0.125 2023-11-18 11:07:28,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=198100.0, ans=0.0 2023-11-18 11:07:29,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198100.0, ans=0.0 2023-11-18 11:07:30,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=198100.0, ans=0.125 2023-11-18 11:07:32,366 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.472e+01 1.054e+02 1.179e+02 1.784e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:07:42,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.69 vs. limit=22.5 2023-11-18 11:07:47,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=198166.66666666666, ans=0.125 2023-11-18 11:08:01,316 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5700, loss[loss=0.1297, simple_loss=0.1471, pruned_loss=0.04663, audio_tagging_loss=0.009492, over 15854.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1305, pruned_loss=0.04213, audio_tagging_loss=0.01219, over 3062835.10 frames. ], batch size: 58, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:08:07,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.86 vs. 
limit=15.0 2023-11-18 11:08:26,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=198433.33333333334, ans=0.125 2023-11-18 11:08:36,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=198500.0, ans=0.0 2023-11-18 11:08:46,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=198566.66666666666, ans=0.1 2023-11-18 11:08:55,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=198633.33333333334, ans=0.125 2023-11-18 11:08:56,301 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5750, loss[loss=0.09897, simple_loss=0.1099, pruned_loss=0.03131, audio_tagging_loss=0.01269, over 16754.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1288, pruned_loss=0.04167, audio_tagging_loss=0.01223, over 3061778.55 frames. ], batch size: 62, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:06,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=198700.0, ans=0.0 2023-11-18 11:09:20,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=198766.66666666666, ans=0.2 2023-11-18 11:09:22,454 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.937e+01 1.145e+02 1.295e+02 2.386e+02, threshold=2.290e+02, percent-clipped=2.0 2023-11-18 11:09:31,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=198833.33333333334, ans=0.05 2023-11-18 11:09:36,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=198833.33333333334, ans=0.1 2023-11-18 11:09:48,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2023-11-18 11:09:50,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=198966.66666666666, ans=0.0 2023-11-18 11:09:50,851 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5800, loss[loss=0.1358, simple_loss=0.1623, pruned_loss=0.04674, audio_tagging_loss=0.007933, over 15694.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1288, pruned_loss=0.04186, audio_tagging_loss=0.0121, over 3054061.71 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:03,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=199033.33333333334, ans=0.0 2023-11-18 11:10:20,952 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.55 vs. limit=15.0 2023-11-18 11:10:29,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-18 11:10:30,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.52 vs. 
limit=12.0 2023-11-18 11:10:40,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=199233.33333333334, ans=0.125 2023-11-18 11:10:45,923 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5850, loss[loss=0.1116, simple_loss=0.1312, pruned_loss=0.03518, audio_tagging_loss=0.01083, over 15527.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1292, pruned_loss=0.0419, audio_tagging_loss=0.01194, over 3056898.05 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:57,262 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:11:00,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199366.66666666666, ans=0.1 2023-11-18 11:11:12,884 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.939e+01 1.135e+02 1.295e+02 1.954e+02, threshold=2.270e+02, percent-clipped=0.0 2023-11-18 11:11:18,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=199500.0, ans=0.0 2023-11-18 11:11:28,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=199500.0, ans=0.125 2023-11-18 11:11:42,593 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5900, loss[loss=0.07609, simple_loss=0.07332, pruned_loss=0.02263, audio_tagging_loss=0.0168, over 14266.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1292, pruned_loss=0.04167, audio_tagging_loss=0.01188, over 3050597.58 frames. ], batch size: 55, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:11:50,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 11:11:51,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=199633.33333333334, ans=0.0 2023-11-18 11:11:59,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=199700.0, ans=0.125 2023-11-18 11:12:00,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. 
limit=22.5 2023-11-18 11:12:14,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=199833.33333333334, ans=0.1 2023-11-18 11:12:19,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=199833.33333333334, ans=0.025 2023-11-18 11:12:28,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=199900.0, ans=0.0 2023-11-18 11:12:30,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=199900.0, ans=0.125 2023-11-18 11:12:36,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:36,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:12:37,046 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 5950, loss[loss=0.1038, simple_loss=0.1197, pruned_loss=0.03592, audio_tagging_loss=0.007991, over 14748.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1287, pruned_loss=0.04149, audio_tagging_loss=0.01202, over 3053802.66 frames. ], batch size: 55, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:12:39,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=199966.66666666666, ans=0.125 2023-11-18 11:13:03,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=200100.0, ans=0.0 2023-11-18 11:13:04,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.491e+01 1.040e+02 1.180e+02 1.802e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:13:25,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=200233.33333333334, ans=0.125 2023-11-18 11:13:25,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=200233.33333333334, ans=0.0 2023-11-18 11:13:32,553 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6000, loss[loss=0.1118, simple_loss=0.1332, pruned_loss=0.037, audio_tagging_loss=0.008254, over 14344.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1299, pruned_loss=0.04196, audio_tagging_loss=0.01201, over 3054331.44 frames. ], batch size: 55, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:13:32,554 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 11:13:56,927 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4889, 4.0005, 3.5940, 2.7524], device='cuda:2') 2023-11-18 11:14:04,241 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([3.9604, 3.7184, 3.4083, 3.6821], device='cuda:2') 2023-11-18 11:14:05,630 INFO [train_asr.py:1147] (2/4) Epoch 3, validation: loss=0.08054, simple_loss=0.06533, pruned_loss=0.01225, audio_tagging_loss=0.03562, over 4681554.00 frames. 2023-11-18 11:14:05,630 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 11:14:09,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.99 vs. 
limit=15.0 2023-11-18 11:14:09,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.78 vs. limit=12.0 2023-11-18 11:14:12,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.51 vs. limit=10.0 2023-11-18 11:14:26,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=200433.33333333334, ans=0.0 2023-11-18 11:14:39,281 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.81 vs. limit=15.0 2023-11-18 11:14:45,157 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:14:48,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=200500.0, ans=0.0 2023-11-18 11:15:00,397 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6050, loss[loss=0.09805, simple_loss=0.1084, pruned_loss=0.02728, audio_tagging_loss=0.01656, over 15298.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1276, pruned_loss=0.04115, audio_tagging_loss=0.01215, over 3044238.89 frames. ], batch size: 58, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:22,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200766.66666666666, ans=0.125 2023-11-18 11:15:26,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.04 vs. limit=22.5 2023-11-18 11:15:27,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.505e+01 1.054e+02 1.196e+02 1.657e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:15:31,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=200766.66666666666, ans=0.0 2023-11-18 11:15:41,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=200833.33333333334, ans=0.125 2023-11-18 11:15:45,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=200900.0, ans=0.2 2023-11-18 11:15:53,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=200900.0, ans=0.2 2023-11-18 11:15:55,688 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6100, loss[loss=0.1392, simple_loss=0.1587, pruned_loss=0.05098, audio_tagging_loss=0.008814, over 14323.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1273, pruned_loss=0.0412, audio_tagging_loss=0.0122, over 3039647.89 frames. ], batch size: 54, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:16:02,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.73 vs. 
limit=22.5 2023-11-18 11:16:21,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=201100.0, ans=0.125 2023-11-18 11:16:35,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=201166.66666666666, ans=0.0 2023-11-18 11:16:47,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=201233.33333333334, ans=0.025 2023-11-18 11:16:51,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=201300.0, ans=0.07 2023-11-18 11:16:51,859 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6150, loss[loss=0.1733, simple_loss=0.1783, pruned_loss=0.07348, audio_tagging_loss=0.01071, over 15354.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1271, pruned_loss=0.04122, audio_tagging_loss=0.01222, over 3042671.23 frames. ], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:17:13,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=201433.33333333334, ans=0.125 2023-11-18 11:17:14,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=201433.33333333334, ans=0.125 2023-11-18 11:17:17,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=201433.33333333334, ans=0.125 2023-11-18 11:17:18,532 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.824e+01 1.100e+02 1.227e+02 1.879e+02, threshold=2.200e+02, percent-clipped=0.0 2023-11-18 11:17:18,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=201433.33333333334, ans=0.125 2023-11-18 11:17:23,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201433.33333333334, ans=0.1 2023-11-18 11:17:25,026 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-11-18 11:17:40,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=201566.66666666666, ans=0.125 2023-11-18 11:17:47,682 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6200, loss[loss=0.09757, simple_loss=0.1048, pruned_loss=0.03135, audio_tagging_loss=0.01384, over 15517.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1271, pruned_loss=0.04114, audio_tagging_loss=0.01223, over 3047266.57 frames. ], batch size: 58, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:00,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=201700.0, ans=0.0 2023-11-18 11:18:05,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=201700.0, ans=0.125 2023-11-18 11:18:06,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.71 vs. limit=15.0 2023-11-18 11:18:13,091 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.77 vs. 
limit=15.0 2023-11-18 11:18:16,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=201766.66666666666, ans=0.125 2023-11-18 11:18:28,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=201833.33333333334, ans=0.0 2023-11-18 11:18:29,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=201833.33333333334, ans=0.2 2023-11-18 11:18:34,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=201900.0, ans=0.125 2023-11-18 11:18:43,323 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6250, loss[loss=0.108, simple_loss=0.1185, pruned_loss=0.03699, audio_tagging_loss=0.01176, over 15804.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1261, pruned_loss=0.0408, audio_tagging_loss=0.01235, over 3050535.66 frames. ], batch size: 59, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:50,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=201966.66666666666, ans=0.2 2023-11-18 11:18:55,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=202033.33333333334, ans=0.0 2023-11-18 11:18:56,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=12.0 2023-11-18 11:19:00,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202033.33333333334, ans=0.1 2023-11-18 11:19:09,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=202100.0, ans=0.02 2023-11-18 11:19:10,110 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.428e+01 1.017e+02 1.154e+02 1.739e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 11:19:12,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=202100.0, ans=0.07 2023-11-18 11:19:14,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=202100.0, ans=0.125 2023-11-18 11:19:29,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202233.33333333334, ans=0.1 2023-11-18 11:19:38,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=202300.0, ans=0.125 2023-11-18 11:19:39,074 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6300, loss[loss=0.1249, simple_loss=0.142, pruned_loss=0.0419, audio_tagging_loss=0.01197, over 15228.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1265, pruned_loss=0.04112, audio_tagging_loss=0.01235, over 3043402.02 frames. ], batch size: 57, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:19:45,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=202300.0, ans=0.125 2023-11-18 11:19:59,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.58 vs. 
limit=15.0 2023-11-18 11:20:03,591 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.267e+00 2023-11-18 11:20:05,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=202433.33333333334, ans=0.125 2023-11-18 11:20:18,955 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.25 vs. limit=15.0 2023-11-18 11:20:34,518 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6350, loss[loss=0.09441, simple_loss=0.09749, pruned_loss=0.02963, audio_tagging_loss=0.01604, over 14652.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1266, pruned_loss=0.04104, audio_tagging_loss=0.01238, over 3038608.75 frames. ], batch size: 55, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:20:34,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=202633.33333333334, ans=0.5 2023-11-18 11:20:38,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=202633.33333333334, ans=0.2 2023-11-18 11:20:50,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=202700.0, ans=0.125 2023-11-18 11:20:50,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=202700.0, ans=0.125 2023-11-18 11:20:58,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=202766.66666666666, ans=0.125 2023-11-18 11:21:01,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.831e+01 1.084e+02 1.220e+02 1.699e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 11:21:02,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=202766.66666666666, ans=0.1 2023-11-18 11:21:04,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.58 vs. limit=10.0 2023-11-18 11:21:06,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.13 vs. limit=10.0 2023-11-18 11:21:14,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-18 11:21:21,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.32 vs. limit=15.0 2023-11-18 11:21:22,852 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.64 vs. limit=12.0 2023-11-18 11:21:24,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=202900.0, ans=0.125 2023-11-18 11:21:29,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=202966.66666666666, ans=0.125 2023-11-18 11:21:29,926 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6400, loss[loss=0.1569, simple_loss=0.1772, pruned_loss=0.06096, audio_tagging_loss=0.0074, over 15977.00 frames. 
], tot_loss[loss=0.1166, simple_loss=0.1263, pruned_loss=0.04103, audio_tagging_loss=0.01247, over 3037305.82 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:21:41,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=203033.33333333334, ans=0.07 2023-11-18 11:21:52,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=203100.0, ans=0.0 2023-11-18 11:22:26,002 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6450, loss[loss=0.1024, simple_loss=0.1163, pruned_loss=0.03365, audio_tagging_loss=0.01058, over 14374.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1268, pruned_loss=0.0411, audio_tagging_loss=0.01239, over 3042417.17 frames. ], batch size: 55, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:22:41,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=203366.66666666666, ans=0.2 2023-11-18 11:22:52,365 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.763e+01 1.082e+02 1.171e+02 1.453e+02, threshold=2.164e+02, percent-clipped=0.0 2023-11-18 11:23:15,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.71 vs. limit=22.5 2023-11-18 11:23:20,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=203633.33333333334, ans=0.1 2023-11-18 11:23:21,047 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6500, loss[loss=0.1221, simple_loss=0.1303, pruned_loss=0.04531, audio_tagging_loss=0.01162, over 14881.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1257, pruned_loss=0.04046, audio_tagging_loss=0.01236, over 3042402.18 frames. ], batch size: 55, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:23:31,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203700.0, ans=0.125 2023-11-18 11:23:31,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=203700.0, ans=0.0 2023-11-18 11:23:37,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-18 11:23:38,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=203700.0, ans=0.0 2023-11-18 11:23:42,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=203766.66666666666, ans=0.09899494936611666 2023-11-18 11:23:46,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=203766.66666666666, ans=0.125 2023-11-18 11:23:57,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=203833.33333333334, ans=0.125 2023-11-18 11:24:03,874 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-18 11:24:07,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.61 vs. 
limit=22.5 2023-11-18 11:24:08,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=203900.0, ans=0.0 2023-11-18 11:24:17,214 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6550, loss[loss=0.07706, simple_loss=0.09227, pruned_loss=0.02057, audio_tagging_loss=0.01036, over 15178.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1266, pruned_loss=0.04084, audio_tagging_loss=0.01214, over 3041594.67 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 64.0 2023-11-18 11:24:20,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.48 vs. limit=15.0 2023-11-18 11:24:43,887 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.621e+01 1.067e+02 1.227e+02 1.729e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 11:24:49,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=204166.66666666666, ans=0.0 2023-11-18 11:24:51,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=204166.66666666666, ans=0.1 2023-11-18 11:24:56,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=204166.66666666666, ans=0.95 2023-11-18 11:25:13,364 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6600, loss[loss=0.1263, simple_loss=0.1397, pruned_loss=0.04889, audio_tagging_loss=0.00755, over 15268.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1278, pruned_loss=0.04125, audio_tagging_loss=0.01181, over 3051324.27 frames. ], batch size: 57, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:25:16,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=204300.0, ans=0.125 2023-11-18 11:25:18,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=204300.0, ans=0.0 2023-11-18 11:25:18,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=204300.0, ans=0.04949747468305833 2023-11-18 11:25:24,480 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.02 vs. limit=12.0 2023-11-18 11:25:27,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=204366.66666666666, ans=0.125 2023-11-18 11:25:41,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204433.33333333334, ans=0.1 2023-11-18 11:25:50,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=204500.0, ans=0.125 2023-11-18 11:25:51,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.65 vs. 
limit=6.0 2023-11-18 11:25:54,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=204500.0, ans=0.125 2023-11-18 11:26:07,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204633.33333333334, ans=0.1 2023-11-18 11:26:08,406 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6650, loss[loss=0.1387, simple_loss=0.1566, pruned_loss=0.04968, audio_tagging_loss=0.01068, over 14513.00 frames. ], tot_loss[loss=0.1174, simple_loss=0.1284, pruned_loss=0.04142, audio_tagging_loss=0.01171, over 3052100.59 frames. ], batch size: 53, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:26:08,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=204633.33333333334, ans=0.125 2023-11-18 11:26:15,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.51 vs. limit=22.5 2023-11-18 11:26:33,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=15.0 2023-11-18 11:26:35,312 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 9.475e+01 1.025e+02 1.163e+02 1.869e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 11:26:45,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=204833.33333333334, ans=0.0 2023-11-18 11:26:56,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204900.0, ans=0.1 2023-11-18 11:26:58,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=204900.0, ans=0.0 2023-11-18 11:27:00,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=204900.0, ans=0.1 2023-11-18 11:27:00,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=204900.0, ans=0.125 2023-11-18 11:27:03,167 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6700, loss[loss=0.1113, simple_loss=0.129, pruned_loss=0.0354, audio_tagging_loss=0.01134, over 15763.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1284, pruned_loss=0.04149, audio_tagging_loss=0.01181, over 3053380.77 frames. ], batch size: 58, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:27:07,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=204966.66666666666, ans=0.5 2023-11-18 11:27:18,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=205033.33333333334, ans=0.125 2023-11-18 11:27:33,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=205100.0, ans=0.125 2023-11-18 11:27:48,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-18 11:27:59,108 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6750, loss[loss=0.1155, simple_loss=0.1273, pruned_loss=0.04605, audio_tagging_loss=0.005802, over 16618.00 frames. 
], tot_loss[loss=0.1163, simple_loss=0.1268, pruned_loss=0.04101, audio_tagging_loss=0.01189, over 3050339.65 frames. ], batch size: 63, lr: 2.04e-02, grad_scale: 64.0 2023-11-18 11:28:18,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=205366.66666666666, ans=0.035 2023-11-18 11:28:25,171 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.647e+01 1.086e+02 1.295e+02 2.076e+02, threshold=2.172e+02, percent-clipped=1.0 2023-11-18 11:28:38,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205500.0, ans=0.1 2023-11-18 11:28:43,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=205566.66666666666, ans=0.0 2023-11-18 11:28:45,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205566.66666666666, ans=0.1 2023-11-18 11:28:54,655 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6800, loss[loss=0.09955, simple_loss=0.1121, pruned_loss=0.03113, audio_tagging_loss=0.01235, over 15094.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1271, pruned_loss=0.04097, audio_tagging_loss=0.01189, over 3046051.71 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:29:01,502 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.16 vs. limit=15.0 2023-11-18 11:29:06,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=205700.0, ans=0.0 2023-11-18 11:29:24,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=205766.66666666666, ans=0.125 2023-11-18 11:29:25,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=205766.66666666666, ans=0.0 2023-11-18 11:29:35,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.54 vs. limit=6.0 2023-11-18 11:29:40,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=205900.0, ans=0.2 2023-11-18 11:29:42,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=205900.0, ans=0.05 2023-11-18 11:29:47,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=205900.0, ans=0.125 2023-11-18 11:29:49,785 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6850, loss[loss=0.1175, simple_loss=0.1367, pruned_loss=0.03627, audio_tagging_loss=0.01284, over 14226.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1278, pruned_loss=0.04114, audio_tagging_loss=0.01195, over 3048305.93 frames. 
], batch size: 55, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:30:08,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=206033.33333333334, ans=0.1 2023-11-18 11:30:13,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=206100.0, ans=0.125 2023-11-18 11:30:17,590 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 9.332e+01 1.055e+02 1.143e+02 1.752e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 11:30:30,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=206166.66666666666, ans=0.2 2023-11-18 11:30:35,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=206233.33333333334, ans=0.0 2023-11-18 11:30:38,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=206233.33333333334, ans=0.125 2023-11-18 11:30:45,648 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6900, loss[loss=0.1567, simple_loss=0.1711, pruned_loss=0.05761, audio_tagging_loss=0.0135, over 14910.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1285, pruned_loss=0.04134, audio_tagging_loss=0.01193, over 3053921.24 frames. ], batch size: 54, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:31:09,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-18 11:31:19,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=206500.0, ans=0.2 2023-11-18 11:31:26,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206500.0, ans=0.1 2023-11-18 11:31:27,817 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:31:40,917 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 6950, loss[loss=0.1396, simple_loss=0.1522, pruned_loss=0.05252, audio_tagging_loss=0.01094, over 15872.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1282, pruned_loss=0.04119, audio_tagging_loss=0.01193, over 3057487.29 frames. ], batch size: 58, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:31:50,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=206700.0, ans=0.125 2023-11-18 11:32:08,295 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.386e+01 1.046e+02 1.149e+02 1.697e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 11:32:12,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=206766.66666666666, ans=0.125 2023-11-18 11:32:35,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.41 vs. 
limit=12.0 2023-11-18 11:32:35,652 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7000, loss[loss=0.1162, simple_loss=0.1317, pruned_loss=0.03672, audio_tagging_loss=0.01358, over 17255.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.129, pruned_loss=0.04136, audio_tagging_loss=0.012, over 3051920.21 frames. ], batch size: 65, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:32:38,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=206966.66666666666, ans=0.05 2023-11-18 11:32:44,720 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:32:52,036 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-18 11:33:06,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=207100.0, ans=0.125 2023-11-18 11:33:10,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=207166.66666666666, ans=0.07 2023-11-18 11:33:23,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=207233.33333333334, ans=0.2 2023-11-18 11:33:31,931 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7050, loss[loss=0.1012, simple_loss=0.1238, pruned_loss=0.02898, audio_tagging_loss=0.01037, over 14789.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1272, pruned_loss=0.04074, audio_tagging_loss=0.01216, over 3047646.73 frames. ], batch size: 56, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:33:39,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. limit=22.5 2023-11-18 11:33:43,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=207366.66666666666, ans=0.1 2023-11-18 11:33:44,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=207366.66666666666, ans=0.125 2023-11-18 11:33:45,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=207366.66666666666, ans=0.125 2023-11-18 11:33:58,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 9.545e+01 1.049e+02 1.197e+02 1.734e+02, threshold=2.097e+02, percent-clipped=0.0 2023-11-18 11:34:08,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.39 vs. limit=15.0 2023-11-18 11:34:27,344 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7100, loss[loss=0.1352, simple_loss=0.1432, pruned_loss=0.05071, audio_tagging_loss=0.01294, over 14503.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1262, pruned_loss=0.04051, audio_tagging_loss=0.01234, over 3056722.06 frames. ], batch size: 55, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:34:40,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.07 vs. 
limit=15.0 2023-11-18 11:34:42,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=207700.0, ans=0.0 2023-11-18 11:34:58,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-18 11:35:09,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=207833.33333333334, ans=0.125 2023-11-18 11:35:11,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.78 vs. limit=22.5 2023-11-18 11:35:22,614 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7150, loss[loss=0.1047, simple_loss=0.1072, pruned_loss=0.03817, audio_tagging_loss=0.01292, over 15600.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1268, pruned_loss=0.04098, audio_tagging_loss=0.01235, over 3047097.88 frames. ], batch size: 60, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:35:36,788 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:35:38,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.90 vs. limit=15.0 2023-11-18 11:35:51,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.617e+01 1.079e+02 1.252e+02 1.872e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 11:35:52,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=208100.0, ans=0.125 2023-11-18 11:35:52,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=208100.0, ans=10.0 2023-11-18 11:36:12,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=208233.33333333334, ans=0.0 2023-11-18 11:36:16,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=208233.33333333334, ans=0.125 2023-11-18 11:36:19,131 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7200, loss[loss=0.1051, simple_loss=0.1164, pruned_loss=0.03815, audio_tagging_loss=0.008703, over 14678.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1258, pruned_loss=0.04054, audio_tagging_loss=0.0125, over 3040296.47 frames. ], batch size: 55, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:36:43,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=208433.33333333334, ans=0.0 2023-11-18 11:37:07,290 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.60 vs. limit=6.0 2023-11-18 11:37:15,130 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7250, loss[loss=0.1162, simple_loss=0.1297, pruned_loss=0.03995, audio_tagging_loss=0.01138, over 15283.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1259, pruned_loss=0.0405, audio_tagging_loss=0.01258, over 3035511.83 frames. 
], batch size: 58, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:37:18,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=208633.33333333334, ans=0.125 2023-11-18 11:37:26,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.99 vs. limit=6.0 2023-11-18 11:37:28,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=208700.0, ans=0.2 2023-11-18 11:37:41,783 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 9.494e+01 1.040e+02 1.201e+02 1.952e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:37:42,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=208766.66666666666, ans=0.1 2023-11-18 11:38:06,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=208900.0, ans=0.95 2023-11-18 11:38:09,852 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7300, loss[loss=0.1269, simple_loss=0.1357, pruned_loss=0.04681, audio_tagging_loss=0.01226, over 15486.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1282, pruned_loss=0.04134, audio_tagging_loss=0.0124, over 3045723.44 frames. ], batch size: 59, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:38:13,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=208966.66666666666, ans=0.1 2023-11-18 11:38:20,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=209033.33333333334, ans=0.125 2023-11-18 11:38:23,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2023-11-18 11:38:54,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=209233.33333333334, ans=0.2 2023-11-18 11:38:56,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=209233.33333333334, ans=0.035 2023-11-18 11:39:05,514 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7350, loss[loss=0.1066, simple_loss=0.1144, pruned_loss=0.03671, audio_tagging_loss=0.01274, over 15223.00 frames. ], tot_loss[loss=0.1171, simple_loss=0.1275, pruned_loss=0.04103, audio_tagging_loss=0.01225, over 3044478.75 frames. 
], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:39:16,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=209366.66666666666, ans=0.0 2023-11-18 11:39:19,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=209366.66666666666, ans=0.125 2023-11-18 11:39:19,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=209366.66666666666, ans=0.0 2023-11-18 11:39:33,663 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.639e+01 1.055e+02 1.233e+02 1.941e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 11:39:41,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=209500.0, ans=0.125 2023-11-18 11:39:43,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=209500.0, ans=0.04949747468305833 2023-11-18 11:39:50,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.03 vs. limit=10.0 2023-11-18 11:39:53,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=209566.66666666666, ans=0.125 2023-11-18 11:39:54,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=209566.66666666666, ans=0.125 2023-11-18 11:39:58,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=209566.66666666666, ans=0.2 2023-11-18 11:40:01,551 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7400, loss[loss=0.114, simple_loss=0.1366, pruned_loss=0.03775, audio_tagging_loss=0.007978, over 15284.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1271, pruned_loss=0.04058, audio_tagging_loss=0.01218, over 3047567.64 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:40:06,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=209633.33333333334, ans=0.125 2023-11-18 11:40:22,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=209766.66666666666, ans=0.125 2023-11-18 11:40:27,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-18 11:40:38,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=209833.33333333334, ans=0.125 2023-11-18 11:40:56,854 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7450, loss[loss=0.129, simple_loss=0.1414, pruned_loss=0.04798, audio_tagging_loss=0.01036, over 14910.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1272, pruned_loss=0.04052, audio_tagging_loss=0.01213, over 3048194.81 frames. 
2023-11-18 11:40:56,854 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7450, loss[loss=0.129, simple_loss=0.1414, pruned_loss=0.04798, audio_tagging_loss=0.01036, over 14910.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1272, pruned_loss=0.04052, audio_tagging_loss=0.01213, over 3048194.81 frames. ], batch size: 55, lr: 2.02e-02, grad_scale: 32.0
2023-11-18 11:40:57,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=209966.66666666666, ans=0.0
2023-11-18 11:41:24,895 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 9.734e+01 1.062e+02 1.217e+02 1.649e+02, threshold=2.124e+02, percent-clipped=0.0
2023-11-18 11:41:28,871 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.29 vs. limit=6.0
2023-11-18 11:41:45,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=210233.33333333334, ans=0.2
2023-11-18 11:41:45,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=210233.33333333334, ans=0.125
2023-11-18 11:41:45,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=210233.33333333334, ans=0.125
2023-11-18 11:41:52,347 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7500, loss[loss=0.1072, simple_loss=0.1272, pruned_loss=0.03576, audio_tagging_loss=0.007842, over 15357.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1268, pruned_loss=0.04057, audio_tagging_loss=0.01203, over 3040337.47 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0
2023-11-18 11:41:52,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=210300.0, ans=0.125
2023-11-18 11:42:05,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=210366.66666666666, ans=0.02
2023-11-18 11:42:05,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.27 vs. limit=15.0
2023-11-18 11:42:10,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=210366.66666666666, ans=0.125
2023-11-18 11:42:21,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=210433.33333333334, ans=0.1
2023-11-18 11:42:25,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=210500.0, ans=0.0
2023-11-18 11:42:29,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=210500.0, ans=0.5
2023-11-18 11:42:45,801 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.09 vs. limit=22.5
2023-11-18 11:42:48,181 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7550, loss[loss=0.111, simple_loss=0.1256, pruned_loss=0.0397, audio_tagging_loss=0.008503, over 15145.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1272, pruned_loss=0.04098, audio_tagging_loss=0.01201, over 3039931.37 frames. ], batch size: 58, lr: 2.02e-02, grad_scale: 32.0
2023-11-18 11:42:57,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=210633.33333333334, ans=0.1
2023-11-18 11:42:59,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.84 vs. limit=15.0
2023-11-18 11:43:13,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=210766.66666666666, ans=0.0
2023-11-18 11:43:15,904 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 1.017e+02 1.103e+02 1.286e+02 2.062e+02, threshold=2.206e+02, percent-clipped=0.0
2023-11-18 11:43:35,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=210900.0, ans=0.0
2023-11-18 11:43:36,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=210900.0, ans=0.125
2023-11-18 11:43:36,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.74 vs. limit=15.0
2023-11-18 11:43:37,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.52 vs. limit=6.0
2023-11-18 11:43:43,333 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7600, loss[loss=0.1027, simple_loss=0.1037, pruned_loss=0.03785, audio_tagging_loss=0.01296, over 15641.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1262, pruned_loss=0.04065, audio_tagging_loss=0.01205, over 3047798.81 frames. ], batch size: 61, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:43:47,507 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.89 vs. limit=12.0
2023-11-18 11:44:07,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=211100.0, ans=0.125
2023-11-18 11:44:11,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=211100.0, ans=0.0
2023-11-18 11:44:12,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=211100.0, ans=0.125
2023-11-18 11:44:14,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=211100.0, ans=0.125
2023-11-18 11:44:23,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=211166.66666666666, ans=0.125
2023-11-18 11:44:26,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=211166.66666666666, ans=0.0
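The learning rate eases from 2.02e-02 to 2.01e-02 across the tot_loss lines above. These values look consistent with icefall's Eden schedule given the configured base_lr=0.045, lr_batches=7500, lr_epochs=3.5; a sketch of the rule as we understand it (the exact global batch index behind each logged lr is not shown in this log, so treat the example as approximate):

    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Polynomial decay in both the global batch index and the (fractional) epoch.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor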
2023-11-18 11:44:39,625 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7650, loss[loss=0.143, simple_loss=0.1649, pruned_loss=0.05076, audio_tagging_loss=0.009754, over 16281.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1269, pruned_loss=0.04109, audio_tagging_loss=0.01195, over 3050422.70 frames. ], batch size: 58, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:44:39,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=211300.0, ans=0.125
2023-11-18 11:44:42,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=211300.0, ans=0.125
2023-11-18 11:44:54,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5
2023-11-18 11:45:05,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.16 vs. limit=10.0
2023-11-18 11:45:07,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.868e+01 1.071e+02 1.213e+02 1.962e+02, threshold=2.142e+02, percent-clipped=0.0
2023-11-18 11:45:24,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211566.66666666666, ans=0.1
2023-11-18 11:45:25,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=211566.66666666666, ans=0.0
2023-11-18 11:45:35,751 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7700, loss[loss=0.1212, simple_loss=0.1279, pruned_loss=0.04386, audio_tagging_loss=0.01336, over 14515.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1272, pruned_loss=0.04093, audio_tagging_loss=0.012, over 3049685.87 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:45:37,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.24 vs. limit=15.0
2023-11-18 11:45:48,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=211700.0, ans=0.0
2023-11-18 11:45:50,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=211700.0, ans=0.1
2023-11-18 11:45:55,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=211700.0, ans=0.125
2023-11-18 11:46:07,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=211833.33333333334, ans=0.125
2023-11-18 11:46:16,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=211833.33333333334, ans=0.125
2023-11-18 11:46:21,547 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.48 vs. limit=6.0
2023-11-18 11:46:24,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=211900.0, ans=0.0
2023-11-18 11:46:25,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=211900.0, ans=0.0
2023-11-18 11:46:30,612 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7750, loss[loss=0.08641, simple_loss=0.09097, pruned_loss=0.02703, audio_tagging_loss=0.0139, over 13521.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1268, pruned_loss=0.04085, audio_tagging_loss=0.01198, over 3054479.12 frames. ], batch size: 52, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:46:53,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=212100.0, ans=0.95
2023-11-18 11:46:53,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=212100.0, ans=0.0
2023-11-18 11:46:59,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 9.456e+01 1.068e+02 1.204e+02 1.685e+02, threshold=2.136e+02, percent-clipped=0.0
2023-11-18 11:47:07,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212166.66666666666, ans=0.1
2023-11-18 11:47:11,665 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.18 vs. limit=15.0
2023-11-18 11:47:15,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=212233.33333333334, ans=0.125
2023-11-18 11:47:19,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=212233.33333333334, ans=0.125
2023-11-18 11:47:26,653 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7800, loss[loss=0.1345, simple_loss=0.149, pruned_loss=0.04936, audio_tagging_loss=0.01068, over 15343.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1273, pruned_loss=0.04079, audio_tagging_loss=0.01207, over 3047228.38 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:47:32,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=212300.0, ans=0.125
2023-11-18 11:47:36,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=212366.66666666666, ans=0.0
2023-11-18 11:47:36,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2023-11-18 11:47:55,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=212433.33333333334, ans=22.5
2023-11-18 11:48:06,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=212500.0, ans=0.1
2023-11-18 11:48:22,977 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7850, loss[loss=0.09576, simple_loss=0.1053, pruned_loss=0.03171, audio_tagging_loss=0.0114, over 15268.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1264, pruned_loss=0.04052, audio_tagging_loss=0.01225, over 3043589.57 frames. ], batch size: 58, lr: 2.01e-02, grad_scale: 32.0
2023-11-18 11:48:46,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0
2023-11-18 11:48:47,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=212766.66666666666, ans=10.0
2023-11-18 11:48:49,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 1.011e+02 1.139e+02 1.309e+02 2.076e+02, threshold=2.278e+02, percent-clipped=0.0
2023-11-18 11:48:57,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=212833.33333333334, ans=0.125
2023-11-18 11:48:59,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=212833.33333333334, ans=0.125
2023-11-18 11:48:59,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=212833.33333333334, ans=0.2
2023-11-18 11:49:17,814 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7900, loss[loss=0.1189, simple_loss=0.1288, pruned_loss=0.04471, audio_tagging_loss=0.009773, over 15371.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1261, pruned_loss=0.04027, audio_tagging_loss=0.01233, over 3039651.24 frames. ], batch size: 59, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:49:19,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=212966.66666666666, ans=0.0
2023-11-18 11:49:51,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.68 vs. limit=22.5
2023-11-18 11:49:52,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=213166.66666666666, ans=0.0
2023-11-18 11:49:57,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=213166.66666666666, ans=0.125
2023-11-18 11:49:58,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=213166.66666666666, ans=0.125
2023-11-18 11:50:05,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=213233.33333333334, ans=0.125
2023-11-18 11:50:11,972 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 7950, loss[loss=0.1251, simple_loss=0.1315, pruned_loss=0.04749, audio_tagging_loss=0.01188, over 15108.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1269, pruned_loss=0.04061, audio_tagging_loss=0.01238, over 3044034.93 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:50:23,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=213300.0, ans=0.0
2023-11-18 11:50:28,588 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 11:50:34,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=213366.66666666666, ans=0.0
2023-11-18 11:50:42,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 9.499e+01 1.075e+02 1.220e+02 1.746e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 11:51:03,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213566.66666666666, ans=0.1
2023-11-18 11:51:11,123 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8000, loss[loss=0.1207, simple_loss=0.1312, pruned_loss=0.04064, audio_tagging_loss=0.01446, over 14941.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1263, pruned_loss=0.04046, audio_tagging_loss=0.01255, over 3040262.12 frames. ], batch size: 54, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:51:40,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=213766.66666666666, ans=0.0
2023-11-18 11:51:54,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=213900.0, ans=0.2
2023-11-18 11:52:05,881 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8050, loss[loss=0.1206, simple_loss=0.1252, pruned_loss=0.04574, audio_tagging_loss=0.01224, over 15815.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1264, pruned_loss=0.04066, audio_tagging_loss=0.0125, over 3041579.25 frames. ], batch size: 59, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:52:08,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=213966.66666666666, ans=0.2
2023-11-18 11:52:19,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214033.33333333334, ans=0.1
2023-11-18 11:52:25,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=214033.33333333334, ans=0.125
2023-11-18 11:52:33,824 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.594e+01 1.075e+02 1.227e+02 1.823e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 11:52:39,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214166.66666666666, ans=0.0
2023-11-18 11:53:00,857 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8100, loss[loss=0.0951, simple_loss=0.1059, pruned_loss=0.03021, audio_tagging_loss=0.01196, over 17112.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1261, pruned_loss=0.04059, audio_tagging_loss=0.01233, over 3048343.19 frames. ], batch size: 65, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:53:24,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=214433.33333333334, ans=0.0
2023-11-18 11:53:29,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=214433.33333333334, ans=0.0
2023-11-18 11:53:34,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=214500.0, ans=0.2
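The WARNING above drops an AudioSet cut whose 100 input frames survive as only 23 frames after the factor-4 convolutional subsampling, fewer than its 24 BPE tokens, so the transducer loss would have no valid alignment. The 100 -> 23 count matches the usual icefall frontend arithmetic (two stride-2 stages; the exact kernel sizes live in the encoder_embed module, so treat this as a sketch):

    def frames_after_subsampling(t: int) -> int:
        # Two stride-2 convolution stages: T' = ((T - 7) // 2 + 1) // 2.
        return ((t - 7) // 2 + 1) // 2

    assert frames_after_subsampling(100) == 23  # as in the WARNING; 23 < 24 tokens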
2023-11-18 11:53:56,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.99 vs. limit=15.0
2023-11-18 11:53:56,997 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8150, loss[loss=0.1028, simple_loss=0.1226, pruned_loss=0.03001, audio_tagging_loss=0.01146, over 15321.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1277, pruned_loss=0.04103, audio_tagging_loss=0.01199, over 3051996.35 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:53:59,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214633.33333333334, ans=0.1
2023-11-18 11:54:04,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214633.33333333334, ans=0.0
2023-11-18 11:54:24,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.659e+01 1.081e+02 1.221e+02 1.815e+02, threshold=2.163e+02, percent-clipped=0.0
2023-11-18 11:54:32,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=214833.33333333334, ans=0.0
2023-11-18 11:54:42,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=214900.0, ans=0.125
2023-11-18 11:54:53,066 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8200, loss[loss=0.09886, simple_loss=0.1123, pruned_loss=0.03167, audio_tagging_loss=0.01104, over 14132.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1265, pruned_loss=0.04065, audio_tagging_loss=0.01195, over 3044345.69 frames. ], batch size: 54, lr: 2.00e-02, grad_scale: 32.0
2023-11-18 11:54:53,090 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 11:54:55,340 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 11:55:01,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0
2023-11-18 11:55:32,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=215166.66666666666, ans=15.0
2023-11-18 11:55:37,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=215233.33333333334, ans=0.0
2023-11-18 11:55:39,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=215233.33333333334, ans=0.0
2023-11-18 11:55:42,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=215233.33333333334, ans=0.125
2023-11-18 11:55:43,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=215233.33333333334, ans=0.125
2023-11-18 11:55:47,995 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8250, loss[loss=0.1171, simple_loss=0.131, pruned_loss=0.04174, audio_tagging_loss=0.009853, over 14645.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1267, pruned_loss=0.04071, audio_tagging_loss=0.0119, over 3046774.43 frames. ], batch size: 55, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:56:14,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215433.33333333334, ans=0.0
2023-11-18 11:56:16,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.340e+01 1.070e+02 1.193e+02 1.705e+02, threshold=2.140e+02, percent-clipped=0.0
2023-11-18 11:56:35,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0
2023-11-18 11:56:38,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=215566.66666666666, ans=0.125
2023-11-18 11:56:43,857 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8300, loss[loss=0.144, simple_loss=0.1644, pruned_loss=0.04969, audio_tagging_loss=0.01213, over 14163.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1272, pruned_loss=0.04076, audio_tagging_loss=0.01187, over 3051207.59 frames. ], batch size: 53, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:56:55,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=215700.0, ans=0.125
2023-11-18 11:57:05,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=215766.66666666666, ans=0.125
2023-11-18 11:57:12,498 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 11:57:22,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=215833.33333333334, ans=0.1
2023-11-18 11:57:39,800 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8350, loss[loss=0.1401, simple_loss=0.1486, pruned_loss=0.05272, audio_tagging_loss=0.01313, over 14645.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1278, pruned_loss=0.04112, audio_tagging_loss=0.01175, over 3051110.02 frames. ], batch size: 53, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:57:47,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=215966.66666666666, ans=0.125
2023-11-18 11:58:07,653 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.952e+01 1.113e+02 1.251e+02 3.254e+02, threshold=2.227e+02, percent-clipped=1.0
2023-11-18 11:58:07,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=216100.0, ans=0.125
2023-11-18 11:58:10,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=216100.0, ans=0.0
2023-11-18 11:58:17,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=216166.66666666666, ans=0.125
2023-11-18 11:58:19,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0
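The ScheduledFloat lines record the current value (ans) of a named, schedule-driven hyperparameter at the given batch_count; values such as dropout_p=0.1 or a skip_rate of 0.0 here are points read off piecewise-linear schedules over training progress. A minimal reimplementation of such a lookup (illustrative only; the real class lives in scaling.py):

    def scheduled_float(batch_count: float, points: list) -> float:
        # points: (batch_count, value) pairs, sorted by batch_count;
        # linear interpolation between points, flat outside the range.
        b0, v0 = points[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in points[1:]:
            if batch_count <= b1:
                return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
            b0, v0 = b1, v1
        return v0

    # Illustrative schedule (numbers invented, not from this run): a dropout
    # annealed from 0.3 to 0.1 over the first 20000 counts, then held flat.
    assert scheduled_float(215000.0, [(0.0, 0.3), (20000.0, 0.1)]) == 0.1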
2023-11-18 11:58:34,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0
2023-11-18 11:58:35,202 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8400, loss[loss=0.09095, simple_loss=0.09273, pruned_loss=0.02721, audio_tagging_loss=0.01737, over 16079.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1267, pruned_loss=0.04069, audio_tagging_loss=0.01183, over 3054684.45 frames. ], batch size: 62, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:58:53,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216366.66666666666, ans=0.1
2023-11-18 11:58:54,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=216366.66666666666, ans=0.0
2023-11-18 11:58:55,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216366.66666666666, ans=0.1
2023-11-18 11:58:59,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=216433.33333333334, ans=0.1
2023-11-18 11:59:01,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=216433.33333333334, ans=0.2
2023-11-18 11:59:30,808 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8450, loss[loss=0.09838, simple_loss=0.1038, pruned_loss=0.02923, audio_tagging_loss=0.01723, over 14454.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1257, pruned_loss=0.04036, audio_tagging_loss=0.01195, over 3055021.05 frames. ], batch size: 56, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 11:59:47,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=216700.0, ans=0.5
2023-11-18 11:59:58,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.450e+01 1.064e+02 1.181e+02 2.171e+02, threshold=2.129e+02, percent-clipped=0.0
2023-11-18 12:00:07,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=216833.33333333334, ans=0.125
2023-11-18 12:00:20,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=216900.0, ans=0.0
2023-11-18 12:00:20,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0
2023-11-18 12:00:23,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=216900.0, ans=0.1
2023-11-18 12:00:26,218 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8500, loss[loss=0.119, simple_loss=0.1318, pruned_loss=0.04169, audio_tagging_loss=0.0114, over 15244.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1257, pruned_loss=0.04047, audio_tagging_loss=0.01208, over 3050161.41 frames. ], batch size: 57, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 12:00:45,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=217033.33333333334, ans=0.125
2023-11-18 12:00:46,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.00 vs. limit=22.5
2023-11-18 12:00:51,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=217100.0, ans=0.125
2023-11-18 12:01:04,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=217166.66666666666, ans=0.125
2023-11-18 12:01:07,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.24 vs. limit=15.0
2023-11-18 12:01:18,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.90 vs. limit=15.0
2023-11-18 12:01:21,578 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8550, loss[loss=0.1162, simple_loss=0.1261, pruned_loss=0.04302, audio_tagging_loss=0.01009, over 15717.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1274, pruned_loss=0.04104, audio_tagging_loss=0.01192, over 3047164.74 frames. ], batch size: 60, lr: 1.99e-02, grad_scale: 32.0
2023-11-18 12:01:21,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=217300.0, ans=0.0
2023-11-18 12:01:24,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=217300.0, ans=0.09899494936611666
2023-11-18 12:01:26,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.85 vs. limit=6.0
2023-11-18 12:01:29,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=217300.0, ans=0.0
2023-11-18 12:01:31,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=217300.0, ans=0.125
2023-11-18 12:01:49,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 1.000e+02 1.079e+02 1.274e+02 1.597e+02, threshold=2.158e+02, percent-clipped=0.0
2023-11-18 12:01:49,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=217433.33333333334, ans=0.2
2023-11-18 12:01:54,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=217500.0, ans=0.125
2023-11-18 12:01:55,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=217500.0, ans=0.0
2023-11-18 12:02:17,139 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8600, loss[loss=0.1196, simple_loss=0.1396, pruned_loss=0.03956, audio_tagging_loss=0.01025, over 15379.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1269, pruned_loss=0.04084, audio_tagging_loss=0.01205, over 3049357.31 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0
2023-11-18 12:03:00,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=217900.0, ans=0.0
2023-11-18 12:03:02,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=15.0
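The Whitening lines compare a measured statistic against a (possibly scheduled) limit; the metric gauges how far a module's output covariance is from a multiple of the identity, so it is 1.0 for perfectly "white" features and grows with anisotropy, and the module only intervenes when metric exceeds limit. One way to compute such a metric (our formulation, not necessarily the scaling.py one):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Per channel group, with covariance
        # eigenvalues lambda_i, return E[lambda^2] / E[lambda]^2, which is
        # >= 1 with equality iff the covariance is a multiple of the identity.
        metrics = []
        for c in x.chunk(num_groups, dim=1):
            cov = (c.T @ c) / c.shape[0]
            n = cov.shape[0]
            mean_eig = torch.diagonal(cov).mean()   # E[lambda] = trace / n
            mean_eig_sq = (cov * cov).sum() / n     # E[lambda^2] = ||cov||_F^2 / n
            metrics.append(mean_eig_sq / mean_eig ** 2)
        return torch.stack(metrics).mean()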
], tot_loss[loss=0.1174, simple_loss=0.1281, pruned_loss=0.04122, audio_tagging_loss=0.01209, over 3057220.02 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:03:40,755 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.473e+01 1.061e+02 1.180e+02 2.111e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 12:04:00,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=218233.33333333334, ans=0.2 2023-11-18 12:04:08,811 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8700, loss[loss=0.09581, simple_loss=0.1068, pruned_loss=0.03239, audio_tagging_loss=0.01001, over 15132.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1271, pruned_loss=0.04072, audio_tagging_loss=0.01229, over 3054426.82 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:04:30,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.60 vs. limit=15.0 2023-11-18 12:04:42,387 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=7.830e-03 2023-11-18 12:04:50,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=218500.0, ans=0.125 2023-11-18 12:04:52,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=218566.66666666666, ans=0.0 2023-11-18 12:04:57,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=218566.66666666666, ans=0.035 2023-11-18 12:04:58,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.64 vs. limit=15.0 2023-11-18 12:05:04,350 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8750, loss[loss=0.1429, simple_loss=0.1487, pruned_loss=0.05552, audio_tagging_loss=0.01307, over 15166.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1287, pruned_loss=0.04142, audio_tagging_loss=0.01218, over 3049713.33 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:05:06,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=218633.33333333334, ans=0.0 2023-11-18 12:05:32,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.349e+01 1.069e+02 1.188e+02 1.662e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 12:06:00,708 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8800, loss[loss=0.1189, simple_loss=0.1268, pruned_loss=0.04117, audio_tagging_loss=0.01435, over 14883.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1296, pruned_loss=0.04162, audio_tagging_loss=0.01231, over 3056108.28 frames. ], batch size: 56, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:06:08,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=218966.66666666666, ans=0.2 2023-11-18 12:06:18,063 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=22.5 2023-11-18 12:06:18,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.44 vs. 
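grad_scale jumps from 32.0 to 64.0 at batch 8800 above. With use_fp16=True this is dynamic loss scaling: the scaler doubles the scale after a sufficiently long run of overflow-free steps and halves it when a step overflows. Schematically (the interval value is illustrative, not read from this run):

    def update_grad_scale(grad_scale: float, found_overflow: bool,
                          steps_since_overflow: int,
                          growth_interval: int = 2000) -> float:
        # Halve on overflow, double after a long overflow-free stretch.
        if found_overflow:
            return grad_scale * 0.5
        if steps_since_overflow >= growth_interval:
            return grad_scale * 2.0  # e.g. the 32.0 -> 64.0 jump above
        return grad_scale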
2023-11-18 12:06:18,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=12.0
2023-11-18 12:06:26,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=219100.0, ans=0.125
2023-11-18 12:06:28,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0
2023-11-18 12:06:37,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=219166.66666666666, ans=0.125
2023-11-18 12:06:37,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=219166.66666666666, ans=0.0
2023-11-18 12:06:45,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=219233.33333333334, ans=0.125
2023-11-18 12:06:48,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0
2023-11-18 12:06:55,584 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8850, loss[loss=0.1009, simple_loss=0.0984, pruned_loss=0.0335, audio_tagging_loss=0.01816, over 15153.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1302, pruned_loss=0.04163, audio_tagging_loss=0.01228, over 3058581.10 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 64.0
2023-11-18 12:07:05,662 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:07:24,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.378e+01 1.055e+02 1.190e+02 1.653e+02, threshold=2.110e+02, percent-clipped=0.0
2023-11-18 12:07:36,261 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0
2023-11-18 12:07:36,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=219500.0, ans=0.0
2023-11-18 12:07:44,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=219566.66666666666, ans=0.2
2023-11-18 12:07:50,927 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8900, loss[loss=0.1107, simple_loss=0.122, pruned_loss=0.03617, audio_tagging_loss=0.01356, over 14761.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1299, pruned_loss=0.04145, audio_tagging_loss=0.01206, over 3052367.50 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 64.0
2023-11-18 12:08:05,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219700.0, ans=0.1
2023-11-18 12:08:13,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.97 vs. limit=15.0
2023-11-18 12:08:18,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=219766.66666666666, ans=0.0
2023-11-18 12:08:19,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=219766.66666666666, ans=0.0
2023-11-18 12:08:24,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=219833.33333333334, ans=0.125
2023-11-18 12:08:34,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.42 vs. limit=22.5
2023-11-18 12:08:47,587 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 8950, loss[loss=0.0843, simple_loss=0.08376, pruned_loss=0.02444, audio_tagging_loss=0.01798, over 14700.00 frames. ], tot_loss[loss=0.1175, simple_loss=0.1289, pruned_loss=0.04122, audio_tagging_loss=0.01185, over 3052372.97 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:09:00,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=220033.33333333334, ans=0.0
2023-11-18 12:09:06,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.70 vs. limit=6.0
2023-11-18 12:09:14,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 9.669e+01 1.057e+02 1.154e+02 1.635e+02, threshold=2.114e+02, percent-clipped=0.0
2023-11-18 12:09:17,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220100.0, ans=0.125
2023-11-18 12:09:20,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=220166.66666666666, ans=0.125
2023-11-18 12:09:22,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=220166.66666666666, ans=0.125
2023-11-18 12:09:22,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=220166.66666666666, ans=0.2
2023-11-18 12:09:41,990 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9000, loss[loss=0.1366, simple_loss=0.1574, pruned_loss=0.04571, audio_tagging_loss=0.01214, over 15651.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.129, pruned_loss=0.04131, audio_tagging_loss=0.01175, over 3050752.35 frames. ], batch size: 55, lr: 1.97e-02, grad_scale: 64.0
2023-11-18 12:09:41,991 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 12:10:14,815 INFO [train_asr.py:1147] (2/4) Epoch 3, validation: loss=0.07901, simple_loss=0.06429, pruned_loss=0.01152, audio_tagging_loss=0.03534, over 4681554.00 frames.
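The same decomposition holds for this validation line: 0.5*0.06429 + 0.01152 + 0.03534 = 0.07901, so at batch 9000 the audio-tagging (distillation) term contributes nearly half of the validation total, while the pruned transducer loss has shrunk to 0.01152.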
2023-11-18 12:10:14,816 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 12:10:17,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=220300.0, ans=0.2 2023-11-18 12:10:32,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=220366.66666666666, ans=0.125 2023-11-18 12:10:34,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=220366.66666666666, ans=0.125 2023-11-18 12:11:01,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=220566.66666666666, ans=0.0 2023-11-18 12:11:01,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=220566.66666666666, ans=0.2 2023-11-18 12:11:06,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=220566.66666666666, ans=0.025 2023-11-18 12:11:06,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0 2023-11-18 12:11:07,923 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.58 vs. limit=10.0 2023-11-18 12:11:09,373 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9050, loss[loss=0.1044, simple_loss=0.1108, pruned_loss=0.03432, audio_tagging_loss=0.01471, over 15606.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.128, pruned_loss=0.04094, audio_tagging_loss=0.01175, over 3049168.97 frames. ], batch size: 59, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:11:16,218 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.49 vs. 
limit=15.0 2023-11-18 12:11:21,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=220700.0, ans=0.05 2023-11-18 12:11:23,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220700.0, ans=0.1 2023-11-18 12:11:26,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220700.0, ans=0.1 2023-11-18 12:11:36,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.522e+01 1.061e+02 1.198e+02 2.427e+02, threshold=2.123e+02, percent-clipped=1.0 2023-11-18 12:11:38,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220766.66666666666, ans=0.125 2023-11-18 12:11:47,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=220833.33333333334, ans=0.07 2023-11-18 12:11:48,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=220833.33333333334, ans=0.125 2023-11-18 12:11:55,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=220900.0, ans=0.0 2023-11-18 12:12:04,410 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9100, loss[loss=0.1167, simple_loss=0.1267, pruned_loss=0.04016, audio_tagging_loss=0.01318, over 14342.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.128, pruned_loss=0.04085, audio_tagging_loss=0.01178, over 3045834.67 frames. ], batch size: 55, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:12:14,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=221033.33333333334, ans=0.125 2023-11-18 12:12:17,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=221033.33333333334, ans=0.0 2023-11-18 12:12:18,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-18 12:12:33,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=221100.0, ans=0.2 2023-11-18 12:12:42,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221166.66666666666, ans=0.1 2023-11-18 12:13:00,116 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9150, loss[loss=0.1061, simple_loss=0.1137, pruned_loss=0.03625, audio_tagging_loss=0.01294, over 15184.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1268, pruned_loss=0.04012, audio_tagging_loss=0.0118, over 3053523.14 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:13:11,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=221366.66666666666, ans=0.2 2023-11-18 12:13:22,653 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.38 vs. 
limit=15.0 2023-11-18 12:13:28,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 9.509e+01 1.025e+02 1.123e+02 1.698e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 12:13:57,067 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9200, loss[loss=0.1159, simple_loss=0.1239, pruned_loss=0.04087, audio_tagging_loss=0.01307, over 15223.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1275, pruned_loss=0.0405, audio_tagging_loss=0.01169, over 3048373.21 frames. ], batch size: 58, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:14:01,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=221633.33333333334, ans=0.0 2023-11-18 12:14:04,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=221633.33333333334, ans=0.125 2023-11-18 12:14:12,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=221700.0, ans=0.125 2023-11-18 12:14:15,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221700.0, ans=0.1 2023-11-18 12:14:20,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=221766.66666666666, ans=0.5 2023-11-18 12:14:30,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=221833.33333333334, ans=0.1 2023-11-18 12:14:41,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=221900.0, ans=0.125 2023-11-18 12:14:44,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=221900.0, ans=0.1 2023-11-18 12:14:47,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=221900.0, ans=0.125 2023-11-18 12:14:51,677 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9250, loss[loss=0.09352, simple_loss=0.1003, pruned_loss=0.03165, audio_tagging_loss=0.01172, over 14487.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1277, pruned_loss=0.04043, audio_tagging_loss=0.01168, over 3054355.48 frames. ], batch size: 53, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:15:20,387 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.718e+01 1.095e+02 1.245e+02 2.428e+02, threshold=2.190e+02, percent-clipped=1.0 2023-11-18 12:15:33,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=222166.66666666666, ans=0.0 2023-11-18 12:15:38,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=222233.33333333334, ans=0.0 2023-11-18 12:15:39,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=222233.33333333334, ans=0.1 2023-11-18 12:15:46,837 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9300, loss[loss=0.1103, simple_loss=0.1113, pruned_loss=0.04213, audio_tagging_loss=0.01251, over 14782.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1278, pruned_loss=0.04055, audio_tagging_loss=0.01184, over 3056492.47 frames. 
], batch size: 55, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:15:49,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=222300.0, ans=0.0 2023-11-18 12:15:59,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=222366.66666666666, ans=0.125 2023-11-18 12:15:59,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=222366.66666666666, ans=0.125 2023-11-18 12:16:05,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=222366.66666666666, ans=0.0 2023-11-18 12:16:23,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222500.0, ans=0.1 2023-11-18 12:16:43,427 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9350, loss[loss=0.1468, simple_loss=0.1669, pruned_loss=0.05367, audio_tagging_loss=0.009617, over 15770.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1277, pruned_loss=0.04049, audio_tagging_loss=0.01195, over 3049489.36 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:16:59,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.38 vs. limit=15.0 2023-11-18 12:17:10,516 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.296e+01 9.866e+01 1.128e+02 1.276e+02 1.788e+02, threshold=2.257e+02, percent-clipped=0.0 2023-11-18 12:17:31,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=222900.0, ans=0.125 2023-11-18 12:17:39,425 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9400, loss[loss=0.09959, simple_loss=0.1129, pruned_loss=0.03185, audio_tagging_loss=0.01129, over 14514.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1263, pruned_loss=0.04018, audio_tagging_loss=0.01202, over 3046465.36 frames. ], batch size: 54, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:17:41,803 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:17:49,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=223033.33333333334, ans=0.0 2023-11-18 12:17:49,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=223033.33333333334, ans=0.0 2023-11-18 12:18:13,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=223166.66666666666, ans=0.0 2023-11-18 12:18:32,265 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 12:18:32,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=223233.33333333334, ans=0.125 2023-11-18 12:18:33,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.14 vs. limit=10.0 2023-11-18 12:18:34,372 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9450, loss[loss=0.1251, simple_loss=0.1367, pruned_loss=0.04499, audio_tagging_loss=0.0118, over 15590.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1254, pruned_loss=0.03996, audio_tagging_loss=0.01223, over 3043012.00 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:18:50,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=223366.66666666666, ans=0.1 2023-11-18 12:18:55,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=15.0 2023-11-18 12:19:03,041 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.898e+01 1.046e+02 1.175e+02 1.737e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 12:19:17,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.83 vs. limit=22.5 2023-11-18 12:19:28,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.33 vs. limit=22.5 2023-11-18 12:19:31,343 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9500, loss[loss=0.1191, simple_loss=0.1195, pruned_loss=0.04311, audio_tagging_loss=0.01628, over 14589.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.125, pruned_loss=0.03991, audio_tagging_loss=0.0125, over 3040872.80 frames. ], batch size: 54, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:19:42,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=223700.0, ans=0.09899494936611666 2023-11-18 12:19:50,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=223700.0, ans=0.125 2023-11-18 12:19:52,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=223766.66666666666, ans=0.0 2023-11-18 12:20:06,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=223833.33333333334, ans=0.125 2023-11-18 12:20:14,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.17 vs. limit=22.5 2023-11-18 12:20:27,293 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9550, loss[loss=0.1203, simple_loss=0.1412, pruned_loss=0.04347, audio_tagging_loss=0.006209, over 15021.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1256, pruned_loss=0.03997, audio_tagging_loss=0.01254, over 3038516.82 frames. 
2023-11-18 12:20:38,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224033.33333333334, ans=0.125
2023-11-18 12:20:41,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=224033.33333333334, ans=0.0
2023-11-18 12:20:55,688 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 1.006e+02 1.117e+02 1.248e+02 1.898e+02, threshold=2.233e+02, percent-clipped=0.0
2023-11-18 12:21:05,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=224166.66666666666, ans=0.0
2023-11-18 12:21:22,533 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9600, loss[loss=0.1244, simple_loss=0.1353, pruned_loss=0.04491, audio_tagging_loss=0.01187, over 14927.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1273, pruned_loss=0.04057, audio_tagging_loss=0.01263, over 3044468.97 frames. ], batch size: 54, lr: 1.96e-02, grad_scale: 64.0
2023-11-18 12:21:28,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=224300.0, ans=0.0
2023-11-18 12:21:32,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=224300.0, ans=0.125
2023-11-18 12:21:36,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=224366.66666666666, ans=0.0
2023-11-18 12:21:36,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=224366.66666666666, ans=0.125
2023-11-18 12:21:42,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=224366.66666666666, ans=0.2
2023-11-18 12:22:12,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=10.48 vs. limit=12.0
2023-11-18 12:22:16,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=224566.66666666666, ans=0.1
2023-11-18 12:22:18,507 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9650, loss[loss=0.1113, simple_loss=0.1298, pruned_loss=0.03444, audio_tagging_loss=0.01195, over 15049.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.127, pruned_loss=0.04046, audio_tagging_loss=0.01245, over 3048634.26 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:22:29,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224700.0, ans=0.125
2023-11-18 12:22:35,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=224700.0, ans=0.2
2023-11-18 12:22:45,985 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.215e+01 1.048e+02 1.170e+02 1.955e+02, threshold=2.095e+02, percent-clipped=0.0
2023-11-18 12:23:01,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=224833.33333333334, ans=0.125
2023-11-18 12:23:03,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=224900.0, ans=0.125
2023-11-18 12:23:14,150 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9700, loss[loss=0.114, simple_loss=0.1366, pruned_loss=0.03565, audio_tagging_loss=0.01007, over 14969.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1277, pruned_loss=0.04074, audio_tagging_loss=0.01216, over 3047180.19 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:23:33,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.91 vs. limit=15.0
2023-11-18 12:24:09,683 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9750, loss[loss=0.108, simple_loss=0.1206, pruned_loss=0.03509, audio_tagging_loss=0.01256, over 14928.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1273, pruned_loss=0.04062, audio_tagging_loss=0.01211, over 3044522.19 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:24:12,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.86 vs. limit=6.0
2023-11-18 12:24:19,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=225300.0, ans=0.0
2023-11-18 12:24:29,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=225366.66666666666, ans=0.0
2023-11-18 12:24:38,100 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.876e+01 1.096e+02 1.307e+02 1.863e+02, threshold=2.192e+02, percent-clipped=0.0
2023-11-18 12:24:39,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=225433.33333333334, ans=0.125
2023-11-18 12:25:06,225 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9800, loss[loss=0.1194, simple_loss=0.1298, pruned_loss=0.04281, audio_tagging_loss=0.01171, over 15262.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1269, pruned_loss=0.04057, audio_tagging_loss=0.01203, over 3046856.92 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:25:11,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=12.0
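
A pattern worth noting in the optim.py lines: with Clipping_scale=2.0 the printed threshold is consistently twice the median of the grad-norm quartiles (just above, 2.0 x 1.096e+02 = 2.192e+02). A sketch of that bookkeeping, assuming the optimizer keeps a window of recent per-batch gradient norms and clips against a median-based threshold; the names here are illustrative, not optim.py's internals:

    import torch

    def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quartiles (min, 25%, median, 75%, max) of recent per-batch grad norms.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]  # 2.0 x median, matching the log
        pct_clipped = (recent_norms > threshold).float().mean() * 100.0
        return q, threshold, pct_clipped

    norms = torch.tensor([76.39, 98.76, 109.6, 130.7, 186.3])
    q, threshold, pct = clipping_report(norms)
    print("grad-norm quartiles", q.tolist(),
          f"threshold={threshold:.4g}, percent-clipped={pct:.1f}")
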
2023-11-18 12:25:12,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=225633.33333333334, ans=0.2
2023-11-18 12:25:34,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=225766.66666666666, ans=0.2
2023-11-18 12:25:55,526 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:26:01,872 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9850, loss[loss=0.116, simple_loss=0.1295, pruned_loss=0.0383, audio_tagging_loss=0.01289, over 15538.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1263, pruned_loss=0.04046, audio_tagging_loss=0.01196, over 3043865.06 frames. ], batch size: 55, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:26:10,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.07 vs. limit=22.5
2023-11-18 12:26:14,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=226033.33333333334, ans=0.125
2023-11-18 12:26:15,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=226033.33333333334, ans=0.125
2023-11-18 12:26:24,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=226100.0, ans=0.125
2023-11-18 12:26:25,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=226100.0, ans=0.0
2023-11-18 12:26:26,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=226100.0, ans=0.0
2023-11-18 12:26:29,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=226100.0, ans=0.1
2023-11-18 12:26:30,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.358e+01 1.054e+02 1.143e+02 1.553e+02, threshold=2.108e+02, percent-clipped=0.0
2023-11-18 12:26:31,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=226100.0, ans=0.125
2023-11-18 12:26:31,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=226100.0, ans=0.125
2023-11-18 12:26:50,878 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:26:53,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=226233.33333333334, ans=0.125
2023-11-18 12:26:57,546 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9900, loss[loss=0.1344, simple_loss=0.15, pruned_loss=0.0488, audio_tagging_loss=0.01058, over 15447.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1256, pruned_loss=0.03989, audio_tagging_loss=0.01193, over 3045769.15 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:27:09,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=226366.66666666666, ans=0.05
2023-11-18 12:27:28,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. limit=15.0
2023-11-18 12:27:47,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=226566.66666666666, ans=0.0
2023-11-18 12:27:53,591 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 9950, loss[loss=0.09346, simple_loss=0.1038, pruned_loss=0.02899, audio_tagging_loss=0.01258, over 13461.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.126, pruned_loss=0.03996, audio_tagging_loss=0.01192, over 3043929.76 frames. ], batch size: 52, lr: 1.95e-02, grad_scale: 64.0
2023-11-18 12:28:02,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=226633.33333333334, ans=0.125
2023-11-18 12:28:14,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=226766.66666666666, ans=0.0
2023-11-18 12:28:17,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=226766.66666666666, ans=0.2
2023-11-18 12:28:20,833 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 9.736e+01 1.077e+02 1.172e+02 1.969e+02, threshold=2.153e+02, percent-clipped=0.0
2023-11-18 12:28:49,502 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10000, loss[loss=0.1058, simple_loss=0.1149, pruned_loss=0.03598, audio_tagging_loss=0.01237, over 14580.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1245, pruned_loss=0.03939, audio_tagging_loss=0.01189, over 3046526.92 frames. ], batch size: 54, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:28:50,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0
2023-11-18 12:28:59,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.15 vs. limit=22.5
2023-11-18 12:29:44,577 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10050, loss[loss=0.1138, simple_loss=0.1277, pruned_loss=0.03967, audio_tagging_loss=0.01027, over 15065.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.124, pruned_loss=0.03921, audio_tagging_loss=0.01205, over 3052455.81 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:30:01,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=227366.66666666666, ans=0.125
2023-11-18 12:30:02,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=227366.66666666666, ans=0.0
2023-11-18 12:30:13,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.408e+01 1.046e+02 1.122e+02 1.934e+02, threshold=2.091e+02, percent-clipped=0.0
2023-11-18 12:30:19,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=227500.0, ans=0.125
2023-11-18 12:30:23,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0
2023-11-18 12:30:24,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=227500.0, ans=0.1
2023-11-18 12:30:24,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=227500.0, ans=0.125
2023-11-18 12:30:41,429 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10100, loss[loss=0.1008, simple_loss=0.1075, pruned_loss=0.03579, audio_tagging_loss=0.01124, over 15617.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1244, pruned_loss=0.03918, audio_tagging_loss=0.01214, over 3055910.01 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:31:25,789 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:31:36,815 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10150, loss[loss=0.1017, simple_loss=0.11, pruned_loss=0.03256, audio_tagging_loss=0.01415, over 14690.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1245, pruned_loss=0.0393, audio_tagging_loss=0.01214, over 3054462.48 frames. ], batch size: 55, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:31:50,955 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:31:58,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=228100.0, ans=0.2
2023-11-18 12:32:01,843 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:32:04,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 9.918e+01 1.075e+02 1.229e+02 2.012e+02, threshold=2.150e+02, percent-clipped=0.0
2023-11-18 12:32:32,042 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10200, loss[loss=0.1395, simple_loss=0.1682, pruned_loss=0.04466, audio_tagging_loss=0.01072, over 16414.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.1244, pruned_loss=0.03894, audio_tagging_loss=0.01208, over 3057114.09 frames. ], batch size: 58, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:32:33,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.85 vs. limit=5.0
2023-11-18 12:32:48,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=228366.66666666666, ans=0.2
2023-11-18 12:32:50,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=228366.66666666666, ans=0.125
2023-11-18 12:32:52,185 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:33:27,630 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10250, loss[loss=0.09214, simple_loss=0.08824, pruned_loss=0.02986, audio_tagging_loss=0.01816, over 14411.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1241, pruned_loss=0.03905, audio_tagging_loss=0.01217, over 3060626.87 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:33:38,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=228700.0, ans=0.2
2023-11-18 12:33:41,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=228700.0, ans=0.0
2023-11-18 12:33:55,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 9.506e+01 1.026e+02 1.188e+02 1.534e+02, threshold=2.052e+02, percent-clipped=0.0
2023-11-18 12:34:22,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=228966.66666666666, ans=10.0
2023-11-18 12:34:23,494 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10300, loss[loss=0.1169, simple_loss=0.136, pruned_loss=0.03775, audio_tagging_loss=0.01117, over 16133.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1245, pruned_loss=0.03924, audio_tagging_loss=0.01219, over 3058248.63 frames. ], batch size: 58, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:34:29,502 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.51 vs. limit=10.0
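
Most of the scaling.py:213 traffic consists of ScheduledFloat values: per-module hyperparameters (dropout_p, skip rates, balancer probs, scale_min targets) that are not constants but functions of how many batches the module has processed, with ans= the value currently in effect. A minimal sketch of the idea, assuming plain linear interpolation between (batch_count, value) breakpoints; this mirrors the concept, not the exact class in scaling.py:

    class ScheduledFloatSketch:
        """A float that follows a piecewise-linear schedule in batch_count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0, 0.3), (20000, 0.1)
            self.points = sorted(points)
            self.batch_count = 0.0

        def __float__(self):
            counts = [b for b, _ in self.points]
            values = [v for _, v in self.points]
            if self.batch_count <= counts[0]:
                return float(values[0])
            if self.batch_count >= counts[-1]:
                return float(values[-1])
            for (b0, v0), (b1, v1) in zip(self.points, self.points[1:]):
                if b0 <= self.batch_count <= b1:
                    t = (self.batch_count - b0) / (b1 - b0)
                    return float(v0 + t * (v1 - v0))

    dropout_p = ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
    dropout_p.batch_count = 222500.0
    print(float(dropout_p))  # 0.1: past the final breakpoint, as logged above
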
2023-11-18 12:34:34,501 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=5.323e-03
2023-11-18 12:34:39,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=229033.33333333334, ans=0.125
2023-11-18 12:35:11,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=229233.33333333334, ans=0.0
2023-11-18 12:35:15,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=229233.33333333334, ans=0.0
2023-11-18 12:35:18,584 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10350, loss[loss=0.1043, simple_loss=0.1107, pruned_loss=0.03657, audio_tagging_loss=0.01238, over 14579.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1253, pruned_loss=0.03953, audio_tagging_loss=0.01228, over 3054314.70 frames. ], batch size: 54, lr: 1.94e-02, grad_scale: 64.0
2023-11-18 12:35:28,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.02 vs. limit=15.0
2023-11-18 12:35:30,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=229366.66666666666, ans=0.125
2023-11-18 12:35:47,377 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.749e+01 1.144e+02 1.287e+02 1.806e+02, threshold=2.288e+02, percent-clipped=0.0
2023-11-18 12:35:58,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.41 vs. limit=22.5
2023-11-18 12:36:08,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229566.66666666666, ans=0.1
2023-11-18 12:36:14,274 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10400, loss[loss=0.1122, simple_loss=0.1303, pruned_loss=0.03649, audio_tagging_loss=0.01055, over 16514.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1243, pruned_loss=0.03933, audio_tagging_loss=0.01247, over 3053572.41 frames. ], batch size: 61, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:36:22,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=229633.33333333334, ans=0.125
2023-11-18 12:36:30,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=229700.0, ans=0.125
2023-11-18 12:37:09,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=229966.66666666666, ans=0.125
2023-11-18 12:37:10,527 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10450, loss[loss=0.08332, simple_loss=0.0971, pruned_loss=0.02326, audio_tagging_loss=0.01151, over 14271.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1238, pruned_loss=0.03908, audio_tagging_loss=0.0123, over 3052519.85 frames. ], batch size: 57, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:37:23,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=230033.33333333334, ans=0.125
2023-11-18 12:37:23,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=230033.33333333334, ans=0.0
2023-11-18 12:37:25,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=230033.33333333334, ans=0.0
2023-11-18 12:37:37,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.254e+01 9.863e+01 1.141e+02 1.786e+02, threshold=1.973e+02, percent-clipped=0.0
2023-11-18 12:37:52,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=230166.66666666666, ans=0.035
2023-11-18 12:37:53,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.68 vs. limit=22.5
2023-11-18 12:38:02,623 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5
2023-11-18 12:38:05,266 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10500, loss[loss=0.1028, simple_loss=0.1161, pruned_loss=0.03371, audio_tagging_loss=0.01103, over 15424.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1235, pruned_loss=0.03885, audio_tagging_loss=0.01207, over 3050639.33 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:38:09,241 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0
2023-11-18 12:38:12,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=230300.0, ans=0.125
2023-11-18 12:38:13,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=230300.0, ans=0.1
2023-11-18 12:38:51,045 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:38:53,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=230566.66666666666, ans=0.0
2023-11-18 12:39:00,312 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10550, loss[loss=0.1325, simple_loss=0.1498, pruned_loss=0.04748, audio_tagging_loss=0.01007, over 15651.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1242, pruned_loss=0.03908, audio_tagging_loss=0.01194, over 3049771.23 frames. ], batch size: 58, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:39:02,090 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:39:09,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=230633.33333333334, ans=0.125
2023-11-18 12:39:09,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=230633.33333333334, ans=0.0
2023-11-18 12:39:09,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0
2023-11-18 12:39:14,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230700.0, ans=0.1
2023-11-18 12:39:29,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.555e+01 1.067e+02 1.229e+02 1.948e+02, threshold=2.135e+02, percent-clipped=0.0
2023-11-18 12:39:35,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0
2023-11-18 12:39:56,907 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10600, loss[loss=0.06878, simple_loss=0.08118, pruned_loss=0.01934, audio_tagging_loss=0.008846, over 14818.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1245, pruned_loss=0.03916, audio_tagging_loss=0.01184, over 3044283.22 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:40:01,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=230966.66666666666, ans=0.125
2023-11-18 12:40:21,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=231100.0, ans=0.0
2023-11-18 12:40:28,399 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0
2023-11-18 12:40:48,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=231233.33333333334, ans=0.1
2023-11-18 12:40:52,909 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10650, loss[loss=0.1262, simple_loss=0.1421, pruned_loss=0.04509, audio_tagging_loss=0.01012, over 14411.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.1251, pruned_loss=0.03933, audio_tagging_loss=0.01185, over 3047133.99 frames. ], batch size: 55, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:41:10,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=231366.66666666666, ans=0.125
2023-11-18 12:41:20,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 9.562e+01 1.044e+02 1.196e+02 1.427e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 12:41:32,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=231500.0, ans=0.125
2023-11-18 12:41:39,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231566.66666666666, ans=0.125
2023-11-18 12:41:48,147 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10700, loss[loss=0.09384, simple_loss=0.1087, pruned_loss=0.02882, audio_tagging_loss=0.01069, over 15682.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.125, pruned_loss=0.0394, audio_tagging_loss=0.0119, over 3052290.88 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0
2023-11-18 12:41:50,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=231633.33333333334, ans=0.125
2023-11-18 12:42:08,033 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 12:42:17,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=231766.66666666666, ans=0.0
2023-11-18 12:42:27,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=231833.33333333334, ans=0.0
2023-11-18 12:42:30,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=231833.33333333334, ans=0.0
2023-11-18 12:42:32,317 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=9.471e-01
2023-11-18 12:42:33,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=231900.0, ans=0.125
2023-11-18 12:42:41,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.69 vs. limit=10.0
2023-11-18 12:42:44,311 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10750, loss[loss=0.1129, simple_loss=0.1253, pruned_loss=0.0383, audio_tagging_loss=0.01198, over 16052.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1241, pruned_loss=0.03866, audio_tagging_loss=0.01196, over 3050902.25 frames. ], batch size: 58, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:42:50,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231966.66666666666, ans=0.1
2023-11-18 12:43:10,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.21 vs. limit=6.0
2023-11-18 12:43:12,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 9.211e+01 1.033e+02 1.162e+02 1.735e+02, threshold=2.066e+02, percent-clipped=0.0
2023-11-18 12:43:13,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=232100.0, ans=0.1
2023-11-18 12:43:34,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=232233.33333333334, ans=0.125
2023-11-18 12:43:38,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=232233.33333333334, ans=0.2
2023-11-18 12:43:40,220 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10800, loss[loss=0.1023, simple_loss=0.1174, pruned_loss=0.03574, audio_tagging_loss=0.007905, over 15106.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1242, pruned_loss=0.03867, audio_tagging_loss=0.01189, over 3049265.46 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 128.0
2023-11-18 12:44:08,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=232433.33333333334, ans=0.1
2023-11-18 12:44:35,717 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10850, loss[loss=0.09492, simple_loss=0.1123, pruned_loss=0.02765, audio_tagging_loss=0.01113, over 16682.00 frames. ], tot_loss[loss=0.113, simple_loss=0.1246, pruned_loss=0.03881, audio_tagging_loss=0.01192, over 3056079.73 frames. ], batch size: 64, lr: 1.92e-02, grad_scale: 64.0
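
grad_scale in the train_asr.py lines is the float16 loss scale of mixed-precision training: it sat at 64.0, doubled to 128.0 at batch 10800, and is back at 64.0 by batch 10850 (later in this log it settles at 32.0), the signature of an AMP GradScaler that grows the scale after a long run of finite gradients and halves it when an inf/nan gradient is detected. A standard torch.cuda.amp loop of the kind presumably running here, with a toy model standing in for the actual encoder:

    import torch

    model = torch.nn.Linear(10, 1).cuda()    # toy stand-in for the encoder
    optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
    scaler = torch.cuda.amp.GradScaler()     # owns the logged grad_scale

    for _ in range(3):                       # stand-in for the dataloader
        x = torch.randn(8, 10, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # forward in float16 where safe
            loss = model(x).square().mean()
        scaler.scale(loss).backward()        # backward on the scaled loss
        scaler.step(optimizer)               # unscales; skips step on inf/nan
        scaler.update()                      # x2 after a clean run, /2 on overflow

    print(scaler.get_scale())
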
2023-11-18 12:44:43,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=232633.33333333334, ans=0.0
2023-11-18 12:45:04,522 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.821e+01 1.081e+02 1.220e+02 1.822e+02, threshold=2.162e+02, percent-clipped=0.0
2023-11-18 12:45:27,313 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:45:31,519 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10900, loss[loss=0.1185, simple_loss=0.1285, pruned_loss=0.04141, audio_tagging_loss=0.01281, over 14516.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1245, pruned_loss=0.0389, audio_tagging_loss=0.01201, over 3057248.95 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:45:43,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=233033.33333333334, ans=0.0
2023-11-18 12:45:47,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=233033.33333333334, ans=0.125
2023-11-18 12:45:50,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=233033.33333333334, ans=0.2
2023-11-18 12:46:14,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233166.66666666666, ans=0.1
2023-11-18 12:46:27,305 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 10950, loss[loss=0.118, simple_loss=0.1281, pruned_loss=0.04095, audio_tagging_loss=0.01295, over 16423.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.1241, pruned_loss=0.03867, audio_tagging_loss=0.01203, over 3058485.31 frames. ], batch size: 61, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:46:27,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=233300.0, ans=0.125
2023-11-18 12:46:39,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=233366.66666666666, ans=0.125
2023-11-18 12:46:42,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=233366.66666666666, ans=0.0
2023-11-18 12:46:54,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=233433.33333333334, ans=0.2
2023-11-18 12:46:56,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.530e+01 1.056e+02 1.170e+02 1.707e+02, threshold=2.112e+02, percent-clipped=0.0
2023-11-18 12:46:58,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233433.33333333334, ans=0.125
2023-11-18 12:46:59,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0
2023-11-18 12:47:14,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=233566.66666666666, ans=0.125
2023-11-18 12:47:14,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.85 vs. limit=12.0
2023-11-18 12:47:22,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=233633.33333333334, ans=0.125
2023-11-18 12:47:23,192 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11000, loss[loss=0.1322, simple_loss=0.138, pruned_loss=0.04691, audio_tagging_loss=0.01626, over 15376.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1233, pruned_loss=0.03846, audio_tagging_loss=0.01219, over 3052174.50 frames. ], batch size: 57, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:47:30,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0
2023-11-18 12:47:32,176 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:48:18,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.20 vs. limit=10.0
2023-11-18 12:48:19,017 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11050, loss[loss=0.09842, simple_loss=0.1045, pruned_loss=0.03433, audio_tagging_loss=0.01185, over 14544.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1226, pruned_loss=0.03866, audio_tagging_loss=0.01228, over 3050472.84 frames. ], batch size: 57, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:48:32,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=234033.33333333334, ans=0.1
2023-11-18 12:48:32,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=234033.33333333334, ans=0.04949747468305833
2023-11-18 12:48:38,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=234033.33333333334, ans=0.125
2023-11-18 12:48:47,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 9.699e+01 1.056e+02 1.206e+02 1.867e+02, threshold=2.111e+02, percent-clipped=0.0
2023-11-18 12:48:48,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=234100.0, ans=0.125
2023-11-18 12:49:00,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=234166.66666666666, ans=0.0
2023-11-18 12:49:05,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=234233.33333333334, ans=0.0
2023-11-18 12:49:14,684 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11100, loss[loss=0.1187, simple_loss=0.1319, pruned_loss=0.03952, audio_tagging_loss=0.01322, over 14912.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1247, pruned_loss=0.03956, audio_tagging_loss=0.01233, over 3050235.85 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0
2023-11-18 12:49:28,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0
2023-11-18 12:49:39,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=234433.33333333334, ans=0.0
2023-11-18 12:49:57,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.12 vs. limit=10.0
2023-11-18 12:50:09,601 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11150, loss[loss=0.07807, simple_loss=0.08294, pruned_loss=0.02154, audio_tagging_loss=0.01506, over 15618.00 frames. ], tot_loss[loss=0.1137, simple_loss=0.124, pruned_loss=0.03923, audio_tagging_loss=0.01243, over 3055747.30 frames. ], batch size: 60, lr: 1.91e-02, grad_scale: 64.0
2023-11-18 12:50:13,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=234633.33333333334, ans=0.0
2023-11-18 12:50:39,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.879e+01 1.104e+02 1.293e+02 2.710e+02, threshold=2.209e+02, percent-clipped=1.0
2023-11-18 12:50:43,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=234833.33333333334, ans=0.2
2023-11-18 12:50:46,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=234833.33333333334, ans=0.0
2023-11-18 12:50:55,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=234900.0, ans=0.0
2023-11-18 12:51:06,384 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11200, loss[loss=0.1497, simple_loss=0.1606, pruned_loss=0.05819, audio_tagging_loss=0.01122, over 14941.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1249, pruned_loss=0.03931, audio_tagging_loss=0.01246, over 3056796.87 frames. ], batch size: 57, lr: 1.91e-02, grad_scale: 64.0
2023-11-18 12:51:10,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.61 vs. limit=8.0
2023-11-18 12:51:16,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235033.33333333334, ans=0.1
2023-11-18 12:51:25,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=235033.33333333334, ans=0.2
2023-11-18 12:51:36,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235100.0, ans=0.1
2023-11-18 12:51:42,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=235166.66666666666, ans=0.1
2023-11-18 12:51:57,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=235233.33333333334, ans=0.05
2023-11-18 12:52:01,480 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11250, loss[loss=0.09245, simple_loss=0.1069, pruned_loss=0.02872, audio_tagging_loss=0.01028, over 15913.00 frames. ], tot_loss[loss=0.1139, simple_loss=0.1243, pruned_loss=0.03937, audio_tagging_loss=0.01238, over 3059624.86 frames. ], batch size: 59, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:52:09,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=235300.0, ans=0.0
2023-11-18 12:52:21,002 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.52 vs. limit=15.0
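
The lr column decays smoothly across this stretch (1.96e-02 near batch 9350 down toward 1.89e-02 by batch 12000), consistent with the Eden-style schedule configured at the start of this run, which discounts the base rate by both the global batch index and the epoch. A sketch of that formula; the exponents follow icefall's Eden scheduler as commonly published, so treat the exact form and the example batch index as assumptions:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Each factor is ~1.0 early in training and decays like x**-0.5 later.
        batch_factor = ((batch / lr_batches) ** 2 + 1.0) ** -0.25
        epoch_factor = ((epoch / lr_epochs) ** 2 + 1.0) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # A global batch index in the low tens of thousands during epoch 3
    # lands in the logged range:
    print(f"{eden_lr(0.045, batch=32000, epoch=3.0):.2e}")  # ~1.9e-02
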
2023-11-18 12:52:31,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.992e+01 9.651e+01 1.080e+02 1.306e+02 2.369e+02, threshold=2.160e+02, percent-clipped=1.0
2023-11-18 12:52:42,881 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.199e+00
2023-11-18 12:52:42,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235500.0, ans=0.0
2023-11-18 12:52:56,392 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11300, loss[loss=0.04942, simple_loss=0.04178, pruned_loss=0.01184, audio_tagging_loss=0.01669, over 14855.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1248, pruned_loss=0.03911, audio_tagging_loss=0.01223, over 3060460.03 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:52:58,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=235633.33333333334, ans=0.5
2023-11-18 12:53:00,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=235633.33333333334, ans=0.0
2023-11-18 12:53:11,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=235700.0, ans=0.125
2023-11-18 12:53:42,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=235900.0, ans=0.0
2023-11-18 12:53:46,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.42 vs. limit=15.0
2023-11-18 12:53:52,136 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11350, loss[loss=0.1144, simple_loss=0.1188, pruned_loss=0.0403, audio_tagging_loss=0.01475, over 15370.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1254, pruned_loss=0.03933, audio_tagging_loss=0.01199, over 3058651.86 frames. ], batch size: 61, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:53:53,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=235966.66666666666, ans=0.0
2023-11-18 12:54:08,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=236033.33333333334, ans=0.125
2023-11-18 12:54:09,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=236033.33333333334, ans=0.125
2023-11-18 12:54:10,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=236033.33333333334, ans=0.0
2023-11-18 12:54:21,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 9.667e+01 1.093e+02 1.238e+02 1.995e+02, threshold=2.185e+02, percent-clipped=0.0
2023-11-18 12:54:24,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=236166.66666666666, ans=0.1
2023-11-18 12:54:48,556 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11400, loss[loss=0.122, simple_loss=0.1347, pruned_loss=0.04384, audio_tagging_loss=0.01078, over 15582.00 frames. ], tot_loss[loss=0.113, simple_loss=0.1247, pruned_loss=0.03889, audio_tagging_loss=0.01181, over 3059954.99 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:54:55,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.51 vs. limit=22.5
2023-11-18 12:55:02,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=236366.66666666666, ans=0.125
2023-11-18 12:55:15,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.08 vs. limit=15.0
2023-11-18 12:55:43,121 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11450, loss[loss=0.07559, simple_loss=0.08323, pruned_loss=0.0193, audio_tagging_loss=0.01468, over 13260.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1241, pruned_loss=0.0387, audio_tagging_loss=0.01187, over 3062073.41 frames. ], batch size: 52, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:56:09,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=236766.66666666666, ans=0.04949747468305833
2023-11-18 12:56:13,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 9.179e+01 9.957e+01 1.104e+02 1.348e+02, threshold=1.991e+02, percent-clipped=0.0
2023-11-18 12:56:15,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=236833.33333333334, ans=0.05
2023-11-18 12:56:30,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=22.5
2023-11-18 12:56:30,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=236900.0, ans=0.0
2023-11-18 12:56:32,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=236900.0, ans=0.2
2023-11-18 12:56:32,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=236900.0, ans=0.1
2023-11-18 12:56:36,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.18 vs. limit=22.5
2023-11-18 12:56:37,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5
2023-11-18 12:56:38,365 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11500, loss[loss=0.1519, simple_loss=0.1651, pruned_loss=0.05731, audio_tagging_loss=0.01204, over 15939.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1233, pruned_loss=0.03851, audio_tagging_loss=0.0119, over 3062219.29 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 32.0
2023-11-18 12:56:46,476 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0
2023-11-18 12:56:46,516 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0
2023-11-18 12:56:47,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=236966.66666666666, ans=0.125
2023-11-18 12:56:47,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=236966.66666666666, ans=0.0
2023-11-18 12:57:35,030 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11550, loss[loss=0.1107, simple_loss=0.1317, pruned_loss=0.03703, audio_tagging_loss=0.007814, over 14696.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1231, pruned_loss=0.03858, audio_tagging_loss=0.01188, over 3056915.72 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0
2023-11-18 12:57:39,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=237300.0, ans=0.125
2023-11-18 12:57:43,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=237300.0, ans=0.0
2023-11-18 12:57:55,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=237433.33333333334, ans=0.125
2023-11-18 12:57:56,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=237433.33333333334, ans=0.125
2023-11-18 12:58:02,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=237433.33333333334, ans=0.2
2023-11-18 12:58:04,245 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.411e+01 1.015e+02 1.135e+02 1.692e+02, threshold=2.029e+02, percent-clipped=0.0
2023-11-18 12:58:08,066 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 12:58:28,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=237566.66666666666, ans=0.125
2023-11-18 12:58:30,019 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11600, loss[loss=0.1128, simple_loss=0.1293, pruned_loss=0.03856, audio_tagging_loss=0.009571, over 15668.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1247, pruned_loss=0.0391, audio_tagging_loss=0.01192, over 3055326.49 frames. ], batch size: 57, lr: 1.90e-02, grad_scale: 32.0
2023-11-18 12:58:49,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=237700.0, ans=0.0
2023-11-18 12:58:57,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237766.66666666666, ans=0.1
2023-11-18 12:59:04,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=237833.33333333334, ans=0.125
2023-11-18 12:59:11,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=237833.33333333334, ans=0.125
2023-11-18 12:59:13,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=237900.0, ans=0.125
2023-11-18 12:59:25,495 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11650, loss[loss=0.1125, simple_loss=0.1324, pruned_loss=0.03832, audio_tagging_loss=0.007968, over 15984.00 frames. ], tot_loss[loss=0.1127, simple_loss=0.1238, pruned_loss=0.03883, audio_tagging_loss=0.01203, over 3042928.51 frames. ], batch size: 60, lr: 1.90e-02, grad_scale: 32.0
2023-11-18 12:59:36,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=238033.33333333334, ans=0.0
2023-11-18 12:59:53,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=238100.0, ans=0.0
2023-11-18 12:59:54,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=238100.0, ans=0.025
2023-11-18 12:59:55,508 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.411e+01 1.037e+02 1.164e+02 1.752e+02, threshold=2.075e+02, percent-clipped=0.0
2023-11-18 13:00:00,499 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.10 vs. limit=22.5
2023-11-18 13:00:19,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=238233.33333333334, ans=0.1
2023-11-18 13:00:20,934 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11700, loss[loss=0.112, simple_loss=0.1093, pruned_loss=0.03845, audio_tagging_loss=0.01891, over 14946.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1233, pruned_loss=0.0388, audio_tagging_loss=0.01205, over 3049992.36 frames. ], batch size: 59, lr: 1.90e-02, grad_scale: 32.0
2023-11-18 13:00:35,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=238366.66666666666, ans=0.2
2023-11-18 13:00:35,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=238366.66666666666, ans=0.125
2023-11-18 13:00:53,285 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=15.0
2023-11-18 13:01:16,338 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11750, loss[loss=0.09521, simple_loss=0.1113, pruned_loss=0.02979, audio_tagging_loss=0.009776, over 14353.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1232, pruned_loss=0.03866, audio_tagging_loss=0.01203, over 3046934.27 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0
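
Each Whitening line compares a measured statistic against a limit (e.g. metric=14.10 vs. limit=22.5 a few lines above): the metric gauges how far the channel covariance of a module's output is from a multiple of the identity, i.e. how "un-white" the activations are, and the whitening hook only intervenes when the limit is exceeded. One way to build such a metric, equal to 1.0 for perfectly white activations and approaching num_channels as they collapse onto a single direction; this is a sketch of the general idea, not necessarily the exact expression in scaling.py:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns C * tr(cov^2) / tr(cov)^2,
        # which is 1.0 iff all covariance eigenvalues are equal ("white")
        # and grows as the spectrum concentrates in fewer directions.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        c = cov.shape[0]
        return (c * (cov @ cov).trace() / cov.trace() ** 2).item()

    white = torch.randn(10000, 256)
    print(whitening_metric(white))            # ~1.0
    collapsed = white[:, :1].repeat(1, 256)   # rank-1, maximally "un-white"
    print(whitening_metric(collapsed))        # ~256.0
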
], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:01:34,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=238700.0, ans=0.0 2023-11-18 13:01:41,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238766.66666666666, ans=0.1 2023-11-18 13:01:42,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=238766.66666666666, ans=0.125 2023-11-18 13:01:45,344 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-18 13:01:46,323 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 1.047e+02 1.164e+02 1.458e+02 1.909e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 13:01:59,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238900.0, ans=0.1 2023-11-18 13:02:11,169 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11800, loss[loss=0.1212, simple_loss=0.1253, pruned_loss=0.03887, audio_tagging_loss=0.01967, over 16295.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1237, pruned_loss=0.03875, audio_tagging_loss=0.01201, over 3042961.64 frames. ], batch size: 59, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:02:26,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=239033.33333333334, ans=0.0 2023-11-18 13:02:29,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=239033.33333333334, ans=0.0 2023-11-18 13:02:33,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=239100.0, ans=0.125 2023-11-18 13:02:38,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=239100.0, ans=0.025 2023-11-18 13:02:42,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=239100.0, ans=0.125 2023-11-18 13:02:48,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=239166.66666666666, ans=0.1 2023-11-18 13:03:07,105 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11850, loss[loss=0.1328, simple_loss=0.1499, pruned_loss=0.04826, audio_tagging_loss=0.009612, over 15627.00 frames. ], tot_loss[loss=0.1132, simple_loss=0.1245, pruned_loss=0.03889, audio_tagging_loss=0.01211, over 3052802.86 frames. 
], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:03:15,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=239300.0, ans=0.025 2023-11-18 13:03:17,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=239366.66666666666, ans=0.0 2023-11-18 13:03:22,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239366.66666666666, ans=0.1 2023-11-18 13:03:30,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2023-11-18 13:03:33,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=239433.33333333334, ans=0.2 2023-11-18 13:03:36,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 9.657e+01 1.079e+02 1.246e+02 1.721e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 13:03:38,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.68 vs. limit=15.0 2023-11-18 13:03:44,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=239500.0, ans=0.05 2023-11-18 13:03:49,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.06 vs. limit=15.0 2023-11-18 13:03:50,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-18 13:03:51,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=239566.66666666666, ans=0.0 2023-11-18 13:04:02,551 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11900, loss[loss=0.1211, simple_loss=0.1324, pruned_loss=0.04404, audio_tagging_loss=0.01085, over 15830.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1246, pruned_loss=0.03891, audio_tagging_loss=0.0123, over 3050347.10 frames. 
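[annotation] The scaling.py:1022 Whitening entries compare a per-module "metric" against a scheduled limit. A sketch of one plausible such metric, assuming it measures how far the channel covariance is from a multiple of the identity (exactly 1 for perfectly white features, growing as the covariance becomes anisotropic); the real scaling.py may differ in detail:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        """x: (num_frames, num_channels). Returns a scalar >= 1 that equals 1
        iff the covariance within each channel group is proportional to I."""
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / num_frames   # (groups, c, c)
        c = cov.shape[-1]
        trace = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        frob_sq = (cov ** 2).sum(dim=(1, 2))
        # ratio of mean squared eigenvalue to squared mean eigenvalue
        return (frob_sq * c / trace ** 2).mean()

    x = torch.randn(1000, 384)
    print(whitening_metric(x))   # close to 1 for white input, up to sampling noise

On that reading, "metric=90.58 vs. limit=7.5" early in training flags a highly anisotropic activation covariance that the module will push back toward white.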
], batch size: 61, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:04:08,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=239633.33333333334, ans=0.0 2023-11-18 13:04:32,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=239766.66666666666, ans=0.035 2023-11-18 13:04:34,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=239833.33333333334, ans=0.125 2023-11-18 13:04:45,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=239900.0, ans=0.125 2023-11-18 13:04:53,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=239900.0, ans=0.025 2023-11-18 13:04:56,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=239966.66666666666, ans=0.125 2023-11-18 13:04:57,147 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 11950, loss[loss=0.1054, simple_loss=0.1216, pruned_loss=0.03338, audio_tagging_loss=0.01121, over 15357.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.126, pruned_loss=0.0393, audio_tagging_loss=0.01225, over 3053905.15 frames. ], batch size: 56, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:08,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=239966.66666666666, ans=0.0 2023-11-18 13:05:11,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=240033.33333333334, ans=0.1 2023-11-18 13:05:16,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.38 vs. limit=22.5 2023-11-18 13:05:29,707 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.537e+01 1.099e+02 1.271e+02 1.974e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 13:05:38,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=240166.66666666666, ans=0.0 2023-11-18 13:05:40,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=240166.66666666666, ans=0.125 2023-11-18 13:05:48,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=240233.33333333334, ans=0.0 2023-11-18 13:05:53,172 INFO [train_asr.py:1115] (2/4) Epoch 3, batch 12000, loss[loss=0.1175, simple_loss=0.1312, pruned_loss=0.04091, audio_tagging_loss=0.01101, over 15291.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1253, pruned_loss=0.03892, audio_tagging_loss=0.01241, over 3058497.41 frames. 
], batch size: 58, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:53,173 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 13:06:06,224 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3725, 4.0299, 2.0911, 3.9346], device='cuda:2') 2023-11-18 13:06:11,037 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1998, 2.2282, 5.1882, 2.1206], device='cuda:2') 2023-11-18 13:06:26,328 INFO [train_asr.py:1147] (2/4) Epoch 3, validation: loss=0.07855, simple_loss=0.06384, pruned_loss=0.01132, audio_tagging_loss=0.03531, over 4681554.00 frames. 2023-11-18 13:06:26,329 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 13:06:39,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=240366.66666666666, ans=0.125 2023-11-18 13:07:28,603 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 0, loss[loss=0.08532, simple_loss=0.07393, pruned_loss=0.01607, audio_tagging_loss=0.03228, over 15655.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.07393, pruned_loss=0.01607, audio_tagging_loss=0.03228, over 15655.00 frames. ], batch size: 60, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:07:28,604 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 13:08:00,457 INFO [train_asr.py:1147] (2/4) Epoch 4, validation: loss=0.07694, simple_loss=0.06378, pruned_loss=0.01116, audio_tagging_loss=0.03389, over 4681554.00 frames. 2023-11-18 13:08:00,458 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 13:08:14,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=240520.0, ans=0.125 2023-11-18 13:08:23,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=240586.66666666666, ans=0.125 2023-11-18 13:08:27,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=240586.66666666666, ans=10.0 2023-11-18 13:08:55,949 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 50, loss[loss=0.1291, simple_loss=0.1474, pruned_loss=0.03863, audio_tagging_loss=0.01674, over 14914.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1189, pruned_loss=0.03683, audio_tagging_loss=0.02361, over 687239.91 frames. ], batch size: 54, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:09:00,235 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 9.935e+01 1.154e+02 1.332e+02 1.872e+02, threshold=2.308e+02, percent-clipped=0.0 2023-11-18 13:09:14,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.52 vs. limit=22.5 2023-11-18 13:09:17,212 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.62 vs. 
limit=12.0 2023-11-18 13:09:25,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=240920.0, ans=0.0 2023-11-18 13:09:29,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=240986.66666666666, ans=0.2 2023-11-18 13:09:39,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241053.33333333334, ans=0.1 2023-11-18 13:09:43,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=241053.33333333334, ans=10.0 2023-11-18 13:09:50,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.68 vs. limit=22.5 2023-11-18 13:09:52,225 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 100, loss[loss=0.144, simple_loss=0.15, pruned_loss=0.05063, audio_tagging_loss=0.01835, over 15033.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1222, pruned_loss=0.03756, audio_tagging_loss=0.02255, over 1222448.54 frames. ], batch size: 57, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:00,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=241120.0, ans=0.0 2023-11-18 13:10:23,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241253.33333333334, ans=0.125 2023-11-18 13:10:37,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241386.66666666666, ans=0.1 2023-11-18 13:10:47,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=241453.33333333334, ans=0.125 2023-11-18 13:10:47,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=241453.33333333334, ans=0.125 2023-11-18 13:10:48,258 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 150, loss[loss=0.1066, simple_loss=0.1145, pruned_loss=0.03589, audio_tagging_loss=0.01344, over 15588.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1226, pruned_loss=0.03732, audio_tagging_loss=0.02, over 1622833.55 frames. 
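[annotation] The zipformer.py:1873 tensors dumped during the validation pass above (e.g. [4.3725, 4.0299, 2.0911, 3.9346]) look like per-head attention-weight entropies: values near log(seq_len) mean attention is close to uniform, values near 0 mean it has collapsed onto single frames. A hedged sketch of such a diagnostic (the actual hook may differ); the same validation block also logs peak CUDA memory, which stays pinned at 25771MB here:

    import torch

    def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
        """attn: (num_heads, seq_len, seq_len), each row a softmax distribution.
        Returns the mean entropy per head, in nats."""
        eps = 1.0e-20
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (num_heads, seq_len)
        return ent.mean(dim=-1)

    attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
    print(attn_weights_entropy(attn))   # ~4 nats per head, near log(100) ~ 4.6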
], batch size: 58, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:52,418 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.521e+01 1.016e+02 1.130e+02 1.451e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 13:10:54,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=241453.33333333334, ans=0.04949747468305833 2023-11-18 13:10:58,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=241520.0, ans=0.1 2023-11-18 13:11:32,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:36,367 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:36,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:44,097 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 200, loss[loss=0.1123, simple_loss=0.1125, pruned_loss=0.03984, audio_tagging_loss=0.01627, over 14391.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.123, pruned_loss=0.03764, audio_tagging_loss=0.01765, over 1939156.26 frames. ], batch size: 56, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:12:17,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=241986.66666666666, ans=0.0 2023-11-18 13:12:40,379 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 250, loss[loss=0.09884, simple_loss=0.118, pruned_loss=0.02841, audio_tagging_loss=0.01142, over 14464.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1251, pruned_loss=0.03856, audio_tagging_loss=0.01589, over 2186081.26 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:12:44,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-18 13:12:45,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.626e+01 1.050e+02 1.196e+02 1.667e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 13:12:59,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=242186.66666666666, ans=0.2 2023-11-18 13:13:14,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=242320.0, ans=0.0 2023-11-18 13:13:21,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=242320.0, ans=0.07 2023-11-18 13:13:23,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=242386.66666666666, ans=0.125 2023-11-18 13:13:35,863 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 300, loss[loss=0.1108, simple_loss=0.1259, pruned_loss=0.03652, audio_tagging_loss=0.01136, over 15341.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1254, pruned_loss=0.03872, audio_tagging_loss=0.01475, over 2370183.03 frames. 
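[annotation] The per-batch totals in these entries are consistent with the total loss being the weighted sum loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. in the "Epoch 4, batch 200" tot_loss above, 0.5 * 0.123 + 0.03764 + 0.01765 = 0.1168. A small check (the weights are inferred from the logged numbers, not read from the code, so treat them as an assumption):

    def combine_losses(simple_loss: float, pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
        # scales inferred from the log entries in this file
        return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

    # "Epoch 4, batch 200" tot_loss components:
    print(combine_losses(0.123, 0.03764, 0.01765))   # ~0.1168, matching loss=0.1168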
], batch size: 56, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:13:49,695 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.343e+00 2023-11-18 13:14:03,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=242586.66666666666, ans=0.0 2023-11-18 13:14:26,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=242720.0, ans=0.07 2023-11-18 13:14:26,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=242720.0, ans=0.125 2023-11-18 13:14:31,224 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 350, loss[loss=0.08977, simple_loss=0.0953, pruned_loss=0.02982, audio_tagging_loss=0.0123, over 15917.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1247, pruned_loss=0.03856, audio_tagging_loss=0.014, over 2524550.32 frames. ], batch size: 61, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:14:37,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.705e+01 1.099e+02 1.261e+02 1.880e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 13:14:59,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=242920.0, ans=10.0 2023-11-18 13:15:27,800 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 400, loss[loss=0.1215, simple_loss=0.1447, pruned_loss=0.03596, audio_tagging_loss=0.01313, over 15878.00 frames. ], tot_loss[loss=0.1129, simple_loss=0.1234, pruned_loss=0.03769, audio_tagging_loss=0.01348, over 2636974.11 frames. ], batch size: 61, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:15:35,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=243120.0, ans=0.5 2023-11-18 13:15:51,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=243253.33333333334, ans=0.0 2023-11-18 13:15:53,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=243253.33333333334, ans=0.0 2023-11-18 13:16:01,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=243320.0, ans=0.1 2023-11-18 13:16:22,998 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 450, loss[loss=0.125, simple_loss=0.1413, pruned_loss=0.04303, audio_tagging_loss=0.0113, over 16711.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.124, pruned_loss=0.03761, audio_tagging_loss=0.01294, over 2734395.17 frames. 
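[annotation] The balancer entries (min_positive/max_positive, min_abs/max_abs, prob) suggest an activation-balancing module: with some probability per batch it nudges gradients so that, per channel, the fraction of positive activations and the mean absolute value stay inside the configured bounds. A forward-only sketch that merely measures the violations (the real module would act on the backward pass, and its internals are assumed here):

    import torch

    def balancer_violations(x: torch.Tensor,
                            min_positive=0.05, max_positive=0.95,
                            min_abs=0.2, max_abs=10.0) -> dict:
        """x: (num_frames, num_channels). Count per-channel constraint
        violations; a real balancer would scale gradients to correct these."""
        frac_pos = (x > 0).float().mean(dim=0)
        mean_abs = x.abs().mean(dim=0)
        return {
            "too_negative": int((frac_pos < min_positive).sum()),
            "too_positive": int((frac_pos > max_positive).sum()),
            "too_small": int((mean_abs < min_abs).sum()),
            "too_large": int((mean_abs > max_abs).sum()),
        }

    print(balancer_violations(torch.randn(200, 384)))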
], batch size: 64, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:16:28,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 9.241e+01 1.029e+02 1.146e+02 1.664e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 13:16:32,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=243453.33333333334, ans=0.0 2023-11-18 13:16:47,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=243586.66666666666, ans=0.07 2023-11-18 13:16:54,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=243586.66666666666, ans=0.2 2023-11-18 13:17:01,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=243653.33333333334, ans=0.2 2023-11-18 13:17:02,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=243653.33333333334, ans=0.0 2023-11-18 13:17:18,762 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 500, loss[loss=0.09825, simple_loss=0.1051, pruned_loss=0.03028, audio_tagging_loss=0.01541, over 13378.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.1232, pruned_loss=0.03752, audio_tagging_loss=0.01271, over 2797029.89 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:17:23,011 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.54 vs. limit=15.0 2023-11-18 13:17:25,813 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:18:07,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=244053.33333333334, ans=0.125 2023-11-18 13:18:15,518 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 550, loss[loss=0.1293, simple_loss=0.1435, pruned_loss=0.05005, audio_tagging_loss=0.007451, over 14358.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.123, pruned_loss=0.03771, audio_tagging_loss=0.01256, over 2848440.00 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:18:20,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:23,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.469e+01 1.045e+02 1.178e+02 1.805e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 13:18:23,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=244120.0, ans=0.0 2023-11-18 13:18:48,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=244320.0, ans=0.125 2023-11-18 13:18:51,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=244320.0, ans=0.015 2023-11-18 13:19:06,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=244386.66666666666, ans=0.125 2023-11-18 13:19:11,336 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 600, loss[loss=0.1213, simple_loss=0.1421, pruned_loss=0.041, audio_tagging_loss=0.009198, over 16112.00 frames. 
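[annotation] grad_scale in the loss lines moves between 32.0, 16.0 and 8.0 in this stretch, the usual signature of dynamic loss scaling under fp16: the scale is halved whenever a step produces inf/nan gradients and grown back after a run of clean steps. The standard PyTorch pattern below uses only stock torch.cuda.amp API; how train_asr.py wires it up is assumed, not shown in the log:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()   # backprop the scaled loss
        scaler.step(optimizer)          # unscales grads; skips step on inf/nan
        scaler.update()                 # halve on overflow, grow when stable
        return loss.detach(), scaler.get_scale()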
], tot_loss[loss=0.1117, simple_loss=0.1231, pruned_loss=0.03784, audio_tagging_loss=0.01235, over 2889424.47 frames. ], batch size: 57, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:19:12,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=244453.33333333334, ans=0.1 2023-11-18 13:19:21,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=244520.0, ans=0.125 2023-11-18 13:19:36,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=244586.66666666666, ans=0.2 2023-11-18 13:19:49,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=244653.33333333334, ans=0.0 2023-11-18 13:19:55,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=244720.0, ans=0.0 2023-11-18 13:19:59,820 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.72 vs. limit=12.0 2023-11-18 13:20:06,673 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 650, loss[loss=0.09943, simple_loss=0.116, pruned_loss=0.03047, audio_tagging_loss=0.01097, over 14546.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1228, pruned_loss=0.03767, audio_tagging_loss=0.01229, over 2915705.21 frames. ], batch size: 54, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:20:15,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.493e+01 1.068e+02 1.179e+02 1.760e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 13:20:17,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=244853.33333333334, ans=0.125 2023-11-18 13:20:29,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0 2023-11-18 13:20:35,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=244920.0, ans=0.125 2023-11-18 13:20:47,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=244986.66666666666, ans=0.125 2023-11-18 13:21:03,270 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 700, loss[loss=0.1054, simple_loss=0.1141, pruned_loss=0.03539, audio_tagging_loss=0.01301, over 14336.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.124, pruned_loss=0.03777, audio_tagging_loss=0.01206, over 2944403.33 frames. 
], batch size: 54, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:21:13,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=245120.0, ans=0.125 2023-11-18 13:21:15,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=245186.66666666666, ans=0.0 2023-11-18 13:21:33,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=245253.33333333334, ans=0.2 2023-11-18 13:21:33,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245253.33333333334, ans=0.1 2023-11-18 13:21:53,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245386.66666666666, ans=0.0 2023-11-18 13:21:53,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=245386.66666666666, ans=0.0 2023-11-18 13:21:59,701 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 750, loss[loss=0.1566, simple_loss=0.1747, pruned_loss=0.05779, audio_tagging_loss=0.01146, over 15734.00 frames. ], tot_loss[loss=0.1128, simple_loss=0.1254, pruned_loss=0.03818, audio_tagging_loss=0.01198, over 2972312.83 frames. ], batch size: 57, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:22:07,024 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.458e+01 1.066e+02 1.214e+02 1.611e+02, threshold=2.132e+02, percent-clipped=0.0 2023-11-18 13:22:10,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=245520.0, ans=0.2 2023-11-18 13:22:26,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=245586.66666666666, ans=0.2 2023-11-18 13:22:36,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=245653.33333333334, ans=0.125 2023-11-18 13:22:40,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=245653.33333333334, ans=0.05 2023-11-18 13:22:54,553 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 800, loss[loss=0.133, simple_loss=0.1411, pruned_loss=0.04996, audio_tagging_loss=0.01252, over 15246.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1242, pruned_loss=0.0378, audio_tagging_loss=0.01218, over 2993537.59 frames. ], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:13,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=245853.33333333334, ans=0.0 2023-11-18 13:23:16,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.03 vs. limit=22.5 2023-11-18 13:23:28,931 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. 
limit=15.0 2023-11-18 13:23:32,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=245986.66666666666, ans=0.1 2023-11-18 13:23:35,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=245986.66666666666, ans=0.125 2023-11-18 13:23:35,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=17.36 vs. limit=15.0 2023-11-18 13:23:46,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=246053.33333333334, ans=0.125 2023-11-18 13:23:50,699 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 850, loss[loss=0.1085, simple_loss=0.1177, pruned_loss=0.03791, audio_tagging_loss=0.01177, over 15334.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1235, pruned_loss=0.03747, audio_tagging_loss=0.01214, over 3005113.00 frames. ], batch size: 57, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:59,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.134e+01 9.530e+01 1.051e+02 1.203e+02 1.738e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 13:24:06,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=246186.66666666666, ans=0.125 2023-11-18 13:24:11,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=246186.66666666666, ans=0.125 2023-11-18 13:24:13,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=246253.33333333334, ans=0.09899494936611666 2023-11-18 13:24:25,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:47,011 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 900, loss[loss=0.1142, simple_loss=0.132, pruned_loss=0.03538, audio_tagging_loss=0.01279, over 15907.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1247, pruned_loss=0.03801, audio_tagging_loss=0.0122, over 3018704.40 frames. ], batch size: 58, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:24:53,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=246453.33333333334, ans=0.125 2023-11-18 13:25:04,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=246520.0, ans=0.1 2023-11-18 13:25:04,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=246520.0, ans=0.125 2023-11-18 13:25:05,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=246520.0, ans=0.125 2023-11-18 13:25:14,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=246586.66666666666, ans=0.2 2023-11-18 13:25:15,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=246586.66666666666, ans=0.125 2023-11-18 13:25:15,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. 
limit=15.0 2023-11-18 13:25:38,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=246720.0, ans=0.015 2023-11-18 13:25:42,383 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 950, loss[loss=0.119, simple_loss=0.1348, pruned_loss=0.04046, audio_tagging_loss=0.01112, over 15524.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1244, pruned_loss=0.03771, audio_tagging_loss=0.01201, over 3026974.83 frames. ], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:49,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.343e+01 1.032e+02 1.151e+02 2.313e+02, threshold=2.063e+02, percent-clipped=1.0 2023-11-18 13:25:50,309 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-11-18 13:26:18,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=246986.66666666666, ans=6.0 2023-11-18 13:26:18,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=246986.66666666666, ans=0.125 2023-11-18 13:26:26,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=247053.33333333334, ans=0.125 2023-11-18 13:26:32,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=13.11 vs. limit=15.0 2023-11-18 13:26:37,925 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1000, loss[loss=0.07093, simple_loss=0.06286, pruned_loss=0.02654, audio_tagging_loss=0.01296, over 14998.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.1244, pruned_loss=0.03779, audio_tagging_loss=0.0118, over 3029430.71 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:26:40,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=247120.0, ans=0.0 2023-11-18 13:26:47,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.92 vs. limit=12.0 2023-11-18 13:27:01,676 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:27:01,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247253.33333333334, ans=0.1 2023-11-18 13:27:02,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.01 vs. limit=22.5 2023-11-18 13:27:12,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=1.98 vs. limit=15.0 2023-11-18 13:27:33,828 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1050, loss[loss=0.118, simple_loss=0.1318, pruned_loss=0.04342, audio_tagging_loss=0.008709, over 15495.00 frames. 
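[annotation] The WARNING above drops an AudioSet placeholder cut because, after subsampling, it is shorter than its token sequence (23 frames vs. 24 tokens), which would make the transducer loss ill-defined. The logged 100 -> 23 mapping is consistent with the common icefall convention T' = ((T - 7) // 2 + 1) // 2; treat that formula as an assumption. A sketch of such a filter:

    def frames_after_subsampling(t: int) -> int:
        # consistent with the logged 100 -> 23 (assumed, not read from the code)
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose subsampled length cannot cover the token sequence."""
        return frames_after_subsampling(num_frames) > num_tokens

    print(frames_after_subsampling(100))   # 23
    print(keep_cut(100, 24))               # False -> excluded, as in the WARNING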
], tot_loss[loss=0.1108, simple_loss=0.1232, pruned_loss=0.03745, audio_tagging_loss=0.01177, over 3028162.60 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:27:37,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=247453.33333333334, ans=0.125 2023-11-18 13:27:41,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.797e+01 1.106e+02 1.274e+02 2.848e+02, threshold=2.212e+02, percent-clipped=1.0 2023-11-18 13:27:48,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=247520.0, ans=0.2 2023-11-18 13:28:01,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=247586.66666666666, ans=0.125 2023-11-18 13:28:03,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=247586.66666666666, ans=0.125 2023-11-18 13:28:13,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=247653.33333333334, ans=0.2 2023-11-18 13:28:20,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=247720.0, ans=0.125 2023-11-18 13:28:28,417 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1100, loss[loss=0.109, simple_loss=0.1144, pruned_loss=0.03796, audio_tagging_loss=0.01381, over 15465.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1222, pruned_loss=0.0374, audio_tagging_loss=0.01167, over 3036277.86 frames. ], batch size: 59, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:28:29,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=247786.66666666666, ans=0.2 2023-11-18 13:28:30,568 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:28:33,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.22 vs. limit=15.0 2023-11-18 13:28:42,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=247853.33333333334, ans=0.125 2023-11-18 13:28:49,127 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=247920.0, ans=0.0 2023-11-18 13:28:49,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-18 13:29:03,907 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.61 vs. limit=12.0 2023-11-18 13:29:11,948 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.78 vs. 
limit=15.0 2023-11-18 13:29:24,438 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1150, loss[loss=0.157, simple_loss=0.1774, pruned_loss=0.05845, audio_tagging_loss=0.009892, over 15827.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1216, pruned_loss=0.03721, audio_tagging_loss=0.01171, over 3040124.33 frames. ], batch size: 58, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:29:31,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.396e+01 1.043e+02 1.149e+02 1.593e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 13:29:31,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=248120.0, ans=0.125 2023-11-18 13:29:38,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=248186.66666666666, ans=0.125 2023-11-18 13:29:52,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=248253.33333333334, ans=0.0 2023-11-18 13:30:21,172 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1200, loss[loss=0.09895, simple_loss=0.103, pruned_loss=0.03293, audio_tagging_loss=0.0145, over 14774.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1214, pruned_loss=0.03702, audio_tagging_loss=0.01171, over 3041220.57 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:30:41,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=248586.66666666666, ans=0.125 2023-11-18 13:30:49,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248586.66666666666, ans=0.125 2023-11-18 13:30:52,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=248653.33333333334, ans=0.125 2023-11-18 13:30:57,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.57 vs. limit=22.5 2023-11-18 13:31:00,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=248653.33333333334, ans=0.125 2023-11-18 13:31:13,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.66 vs. limit=15.0 2023-11-18 13:31:16,181 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1250, loss[loss=0.06359, simple_loss=0.06457, pruned_loss=0.01947, audio_tagging_loss=0.01183, over 14852.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1214, pruned_loss=0.03696, audio_tagging_loss=0.01176, over 3046242.29 frames. 
], batch size: 60, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:31:23,567 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.534e+01 1.061e+02 1.217e+02 1.836e+02, threshold=2.122e+02, percent-clipped=0.0 2023-11-18 13:31:23,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=248786.66666666666, ans=0.125 2023-11-18 13:31:39,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=248920.0, ans=0.1 2023-11-18 13:31:44,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-18 13:31:50,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.34 vs. limit=5.0 2023-11-18 13:31:57,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248986.66666666666, ans=0.1 2023-11-18 13:32:11,683 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1300, loss[loss=0.1269, simple_loss=0.1372, pruned_loss=0.04684, audio_tagging_loss=0.01149, over 16380.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1208, pruned_loss=0.0367, audio_tagging_loss=0.01173, over 3051500.77 frames. ], batch size: 61, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:32:16,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=249120.0, ans=0.125 2023-11-18 13:32:25,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=249186.66666666666, ans=0.1 2023-11-18 13:33:08,242 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1350, loss[loss=0.1412, simple_loss=0.1628, pruned_loss=0.05103, audio_tagging_loss=0.008783, over 16983.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1204, pruned_loss=0.03666, audio_tagging_loss=0.01172, over 3052074.10 frames. ], batch size: 60, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:33:16,725 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-11-18 13:33:17,323 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 9.738e+01 1.103e+02 1.190e+02 1.796e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 13:33:17,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=249453.33333333334, ans=0.0 2023-11-18 13:33:29,689 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-18 13:33:35,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=249586.66666666666, ans=0.125 2023-11-18 13:33:35,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=249586.66666666666, ans=0.0 2023-11-18 13:33:36,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=18.03 vs. 
limit=22.5 2023-11-18 13:33:41,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=249653.33333333334, ans=0.1 2023-11-18 13:33:41,615 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:33:47,427 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:33:51,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2023-11-18 13:33:52,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=249720.0, ans=0.125 2023-11-18 13:34:04,457 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1400, loss[loss=0.09282, simple_loss=0.09958, pruned_loss=0.03163, audio_tagging_loss=0.01141, over 14762.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1216, pruned_loss=0.03702, audio_tagging_loss=0.01176, over 3054018.86 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:34:09,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.12 vs. limit=15.0 2023-11-18 13:34:17,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=249853.33333333334, ans=0.125 2023-11-18 13:34:21,426 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:34:22,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-11-18 13:34:46,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.57 vs. limit=15.0 2023-11-18 13:34:48,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=250053.33333333334, ans=0.125 2023-11-18 13:35:00,076 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1450, loss[loss=0.1625, simple_loss=0.1886, pruned_loss=0.06185, audio_tagging_loss=0.00636, over 15597.00 frames. ], tot_loss[loss=0.1108, simple_loss=0.1228, pruned_loss=0.03747, audio_tagging_loss=0.01187, over 3054333.26 frames. 
], batch size: 57, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:35:00,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=250120.0, ans=0.0 2023-11-18 13:35:09,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.516e+01 1.029e+02 1.105e+02 1.571e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:35:26,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=250253.33333333334, ans=0.125 2023-11-18 13:35:53,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=250386.66666666666, ans=0.0 2023-11-18 13:35:56,370 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1500, loss[loss=0.0977, simple_loss=0.1133, pruned_loss=0.02927, audio_tagging_loss=0.01179, over 14753.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.1239, pruned_loss=0.03795, audio_tagging_loss=0.01188, over 3051209.68 frames. ], batch size: 54, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:13,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=250520.0, ans=0.125 2023-11-18 13:36:16,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=250520.0, ans=0.0 2023-11-18 13:36:18,836 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:36:29,312 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0 2023-11-18 13:36:33,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=250653.33333333334, ans=0.0 2023-11-18 13:36:52,282 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1550, loss[loss=0.1068, simple_loss=0.1211, pruned_loss=0.03402, audio_tagging_loss=0.01226, over 15594.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1224, pruned_loss=0.03732, audio_tagging_loss=0.01208, over 3043567.19 frames. ], batch size: 59, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:54,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.04 vs. limit=22.5 2023-11-18 13:37:01,193 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.375e+01 1.072e+02 1.254e+02 1.823e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 13:37:09,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=250853.33333333334, ans=0.95 2023-11-18 13:37:09,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=250853.33333333334, ans=0.125 2023-11-18 13:37:10,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2023-11-18 13:37:24,196 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:37:26,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.47 vs. 
limit=15.0 2023-11-18 13:37:47,454 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1600, loss[loss=0.1135, simple_loss=0.1174, pruned_loss=0.03996, audio_tagging_loss=0.01481, over 16593.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1242, pruned_loss=0.03782, audio_tagging_loss=0.0121, over 3041451.07 frames. ], batch size: 61, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:38:12,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=251253.33333333334, ans=0.125 2023-11-18 13:38:16,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.92 vs. limit=15.0 2023-11-18 13:38:34,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=251386.66666666666, ans=0.125 2023-11-18 13:38:34,537 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.28 vs. limit=22.5 2023-11-18 13:38:43,341 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1650, loss[loss=0.1134, simple_loss=0.1316, pruned_loss=0.03663, audio_tagging_loss=0.01098, over 15066.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1244, pruned_loss=0.03786, audio_tagging_loss=0.01209, over 3043228.17 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:38:48,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=251453.33333333334, ans=0.0 2023-11-18 13:38:52,826 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.946e+01 1.090e+02 1.261e+02 1.677e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 13:39:09,512 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-18 13:39:16,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=251653.33333333334, ans=0.125 2023-11-18 13:39:39,287 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1700, loss[loss=0.1184, simple_loss=0.1341, pruned_loss=0.0427, audio_tagging_loss=0.008619, over 14439.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1232, pruned_loss=0.03725, audio_tagging_loss=0.01211, over 3051368.47 frames. ], batch size: 52, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:39:57,422 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:40:05,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=251920.0, ans=0.125 2023-11-18 13:40:12,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=9.05 vs. limit=12.0 2023-11-18 13:40:15,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=251986.66666666666, ans=0.125 2023-11-18 13:40:35,212 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1750, loss[loss=0.09525, simple_loss=0.106, pruned_loss=0.03099, audio_tagging_loss=0.01127, over 15823.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1236, pruned_loss=0.0374, audio_tagging_loss=0.01199, over 3050859.32 frames. 
], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:40:39,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=252120.0, ans=0.1 2023-11-18 13:40:39,660 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-18 13:40:43,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2023-11-18 13:40:44,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=252120.0, ans=0.125 2023-11-18 13:40:45,323 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 9.260e+01 1.013e+02 1.177e+02 1.598e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 13:40:45,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=252186.66666666666, ans=0.2 2023-11-18 13:41:07,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=252320.0, ans=0.04949747468305833 2023-11-18 13:41:10,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.91 vs. limit=12.0 2023-11-18 13:41:31,150 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1800, loss[loss=0.09224, simple_loss=0.1062, pruned_loss=0.02917, audio_tagging_loss=0.009964, over 14514.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.124, pruned_loss=0.0376, audio_tagging_loss=0.01184, over 3046869.96 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:41:44,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=252520.0, ans=0.0 2023-11-18 13:41:53,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=252586.66666666666, ans=0.125 2023-11-18 13:42:27,627 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1850, loss[loss=0.08356, simple_loss=0.08548, pruned_loss=0.02524, audio_tagging_loss=0.01558, over 14013.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1225, pruned_loss=0.03692, audio_tagging_loss=0.01184, over 3038508.08 frames. ], batch size: 54, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:42:33,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252786.66666666666, ans=0.1 2023-11-18 13:42:37,088 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 9.907e+01 1.064e+02 1.171e+02 1.741e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 13:43:18,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.21 vs. limit=22.5 2023-11-18 13:43:22,178 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1900, loss[loss=0.1311, simple_loss=0.1587, pruned_loss=0.04347, audio_tagging_loss=0.008319, over 15560.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.1236, pruned_loss=0.03732, audio_tagging_loss=0.01176, over 3039085.34 frames. 
], batch size: 58, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:43:22,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.38 vs. limit=15.0 2023-11-18 13:43:23,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=253120.0, ans=0.125 2023-11-18 13:43:48,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-18 13:44:03,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=253320.0, ans=0.125 2023-11-18 13:44:10,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=253386.66666666666, ans=0.125 2023-11-18 13:44:11,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=253386.66666666666, ans=0.2 2023-11-18 13:44:18,709 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 1950, loss[loss=0.0916, simple_loss=0.1086, pruned_loss=0.02616, audio_tagging_loss=0.01115, over 15183.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1234, pruned_loss=0.03729, audio_tagging_loss=0.01169, over 3040259.17 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:44:21,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=253453.33333333334, ans=0.5 2023-11-18 13:44:21,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.03 vs. limit=15.0 2023-11-18 13:44:28,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=253453.33333333334, ans=0.125 2023-11-18 13:44:29,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 9.223e+01 1.021e+02 1.142e+02 1.490e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 13:44:59,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=253653.33333333334, ans=0.125 2023-11-18 13:45:03,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=253720.0, ans=0.125 2023-11-18 13:45:15,422 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2000, loss[loss=0.1385, simple_loss=0.14, pruned_loss=0.05523, audio_tagging_loss=0.01331, over 15508.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1224, pruned_loss=0.03713, audio_tagging_loss=0.01185, over 3038263.63 frames. 
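[annotation] The learning rate decays smoothly with both batch index and epoch: 1.90e-02 through most of epoch 3, a step down to 1.77e-02 at the epoch 3 -> 4 boundary above, and 1.72e-02 by this point. That shape matches an Eden-style schedule, lr = base_lr * ((batch^2 + lr_batches^2)/lr_batches^2)^(-1/4) * ((epoch^2 + lr_epochs^2)/lr_epochs^2)^(-1/4). The sketch below uses illustrative constants that are not fitted to this run:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
        # Eden-style decay in both batch and epoch (illustrative parameters)
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    for epoch, batch in [(1, 0), (3, 240000), (4, 254000)]:
        print(epoch, batch, round(eden_lr(0.05, batch, epoch), 5))  # monotone decay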
], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:45:27,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=253853.33333333334, ans=0.125 2023-11-18 13:45:55,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=253986.66666666666, ans=0.0 2023-11-18 13:45:59,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=254053.33333333334, ans=0.2 2023-11-18 13:46:01,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=254053.33333333334, ans=0.0 2023-11-18 13:46:10,806 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2050, loss[loss=0.08874, simple_loss=0.1031, pruned_loss=0.02441, audio_tagging_loss=0.01277, over 15416.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1228, pruned_loss=0.03717, audio_tagging_loss=0.01194, over 3039973.48 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:46:20,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=254186.66666666666, ans=0.2 2023-11-18 13:46:21,820 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.338e+01 1.033e+02 1.135e+02 2.200e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:46:32,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=254253.33333333334, ans=0.0 2023-11-18 13:46:33,658 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:46:35,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=254253.33333333334, ans=0.2 2023-11-18 13:46:49,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=254320.0, ans=0.0 2023-11-18 13:46:57,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-11-18 13:46:57,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-18 13:47:01,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=254386.66666666666, ans=0.125 2023-11-18 13:47:04,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=254386.66666666666, ans=0.0 2023-11-18 13:47:06,220 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2100, loss[loss=0.07067, simple_loss=0.07107, pruned_loss=0.02009, audio_tagging_loss=0.01505, over 16056.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1223, pruned_loss=0.03686, audio_tagging_loss=0.01197, over 3049065.50 frames. 
], batch size: 64, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:47:22,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=254520.0, ans=0.2 2023-11-18 13:47:24,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=254520.0, ans=0.1 2023-11-18 13:47:39,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=254653.33333333334, ans=0.2 2023-11-18 13:47:39,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:40,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=254653.33333333334, ans=0.2 2023-11-18 13:47:42,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=254653.33333333334, ans=0.0 2023-11-18 13:48:03,509 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2150, loss[loss=0.1163, simple_loss=0.1368, pruned_loss=0.03801, audio_tagging_loss=0.009847, over 15091.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1222, pruned_loss=0.03684, audio_tagging_loss=0.01192, over 3043336.72 frames. ], batch size: 55, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:48:04,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=254786.66666666666, ans=0.125 2023-11-18 13:48:14,125 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.557e+01 1.080e+02 1.239e+02 1.582e+02, threshold=2.161e+02, percent-clipped=1.0 2023-11-18 13:48:21,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=254853.33333333334, ans=0.0 2023-11-18 13:48:26,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254920.0, ans=0.125 2023-11-18 13:48:31,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=254920.0, ans=0.0 2023-11-18 13:48:36,056 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:48:58,276 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2200, loss[loss=0.1731, simple_loss=0.1971, pruned_loss=0.06764, audio_tagging_loss=0.006928, over 16251.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1223, pruned_loss=0.03701, audio_tagging_loss=0.01198, over 3052065.28 frames. 
], batch size: 59, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:01,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=255120.0, ans=0.125 2023-11-18 13:49:20,945 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:49:20,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=255253.33333333334, ans=0.1 2023-11-18 13:49:25,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=255253.33333333334, ans=0.1 2023-11-18 13:49:31,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.29 vs. limit=15.0 2023-11-18 13:49:36,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=25.06 vs. limit=22.5 2023-11-18 13:49:37,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=255320.0, ans=0.2 2023-11-18 13:49:39,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.38 vs. limit=10.0 2023-11-18 13:49:51,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.24 vs. limit=15.0 2023-11-18 13:49:53,795 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2250, loss[loss=0.1028, simple_loss=0.119, pruned_loss=0.03391, audio_tagging_loss=0.009417, over 15883.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1212, pruned_loss=0.03676, audio_tagging_loss=0.01203, over 3047460.63 frames. ], batch size: 58, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:50:01,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=255453.33333333334, ans=0.0 2023-11-18 13:50:05,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 9.450e+01 1.063e+02 1.205e+02 1.681e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 13:50:29,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=255653.33333333334, ans=0.2 2023-11-18 13:50:30,864 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2023-11-18 13:50:32,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=255653.33333333334, ans=10.0 2023-11-18 13:50:35,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=255653.33333333334, ans=0.015 2023-11-18 13:50:38,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=255720.0, ans=0.0 2023-11-18 13:50:41,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=255720.0, ans=0.0 2023-11-18 13:50:50,931 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2300, loss[loss=0.1288, simple_loss=0.1277, pruned_loss=0.04815, audio_tagging_loss=0.01677, over 15486.00 frames. 
], tot_loss[loss=0.11, simple_loss=0.1218, pruned_loss=0.03702, audio_tagging_loss=0.01204, over 3053551.72 frames. ], batch size: 61, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:50:56,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=255786.66666666666, ans=0.0 2023-11-18 13:51:03,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.11 vs. limit=12.0 2023-11-18 13:51:09,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=255853.33333333334, ans=0.125 2023-11-18 13:51:09,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=255853.33333333334, ans=0.125 2023-11-18 13:51:10,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=255853.33333333334, ans=0.07 2023-11-18 13:51:17,482 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.08 vs. limit=15.0 2023-11-18 13:51:23,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:24,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=255986.66666666666, ans=0.125 2023-11-18 13:51:28,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=255986.66666666666, ans=0.07 2023-11-18 13:51:39,955 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:51:46,285 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2350, loss[loss=0.127, simple_loss=0.1427, pruned_loss=0.04179, audio_tagging_loss=0.01388, over 15108.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1223, pruned_loss=0.0371, audio_tagging_loss=0.01204, over 3057664.78 frames. 
], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:51:47,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=256120.0, ans=0.1 2023-11-18 13:51:57,478 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.372e+01 1.028e+02 1.162e+02 1.776e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:52:03,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=256186.66666666666, ans=0.2 2023-11-18 13:52:05,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256186.66666666666, ans=0.1 2023-11-18 13:52:07,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=256253.33333333334, ans=0.125 2023-11-18 13:52:13,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=256253.33333333334, ans=0.2 2023-11-18 13:52:26,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256320.0, ans=0.1 2023-11-18 13:52:27,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256320.0, ans=0.1 2023-11-18 13:52:33,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=256386.66666666666, ans=0.125 2023-11-18 13:52:42,247 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2400, loss[loss=0.1172, simple_loss=0.1284, pruned_loss=0.04237, audio_tagging_loss=0.01059, over 14999.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1223, pruned_loss=0.03721, audio_tagging_loss=0.01212, over 3051389.09 frames. ], batch size: 55, lr: 1.72e-02, grad_scale: 32.0 2023-11-18 13:52:45,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=256453.33333333334, ans=0.0 2023-11-18 13:52:46,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=256453.33333333334, ans=0.125 2023-11-18 13:52:53,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256520.0, ans=0.1 2023-11-18 13:53:09,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.48 vs. limit=22.5 2023-11-18 13:53:16,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=256653.33333333334, ans=0.1 2023-11-18 13:53:17,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=256653.33333333334, ans=0.0 2023-11-18 13:53:19,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=256653.33333333334, ans=0.0 2023-11-18 13:53:23,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff2.min_abs, batch_count=256653.33333333334, ans=0.1 2023-11-18 13:53:38,564 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2450, loss[loss=0.1118, simple_loss=0.1286, pruned_loss=0.03585, audio_tagging_loss=0.01163, over 14742.00 frames. 
], tot_loss[loss=0.1109, simple_loss=0.123, pruned_loss=0.0373, audio_tagging_loss=0.01214, over 3049476.00 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:53:49,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.57 vs. limit=12.0 2023-11-18 13:53:49,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 9.544e+01 1.043e+02 1.156e+02 1.781e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-18 13:53:52,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=256853.33333333334, ans=0.125 2023-11-18 13:54:03,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.86 vs. limit=10.0 2023-11-18 13:54:03,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=256920.0, ans=0.125 2023-11-18 13:54:13,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=256986.66666666666, ans=0.0 2023-11-18 13:54:13,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=256986.66666666666, ans=0.0 2023-11-18 13:54:13,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=256986.66666666666, ans=0.0 2023-11-18 13:54:24,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=257053.33333333334, ans=0.0 2023-11-18 13:54:26,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=257053.33333333334, ans=0.125 2023-11-18 13:54:33,905 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2500, loss[loss=0.1228, simple_loss=0.1435, pruned_loss=0.04147, audio_tagging_loss=0.009573, over 15108.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1211, pruned_loss=0.03633, audio_tagging_loss=0.01218, over 3051467.72 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:54:34,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=257120.0, ans=0.09899494936611666 2023-11-18 13:54:35,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.57 vs. limit=6.0 2023-11-18 13:55:07,892 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.66 vs. limit=22.5 2023-11-18 13:55:11,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257320.0, ans=0.1 2023-11-18 13:55:29,859 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2550, loss[loss=0.1063, simple_loss=0.119, pruned_loss=0.03649, audio_tagging_loss=0.01032, over 14375.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.121, pruned_loss=0.03628, audio_tagging_loss=0.01204, over 3054375.35 frames. 
], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:55:35,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=257453.33333333334, ans=0.0 2023-11-18 13:55:40,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.163e+01 9.922e+01 1.114e+02 1.302e+02 1.822e+02, threshold=2.229e+02, percent-clipped=0.0 2023-11-18 13:56:21,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.99 vs. limit=10.0 2023-11-18 13:56:25,576 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2600, loss[loss=0.08191, simple_loss=0.08932, pruned_loss=0.02435, audio_tagging_loss=0.0129, over 15251.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1207, pruned_loss=0.03628, audio_tagging_loss=0.01193, over 3059896.35 frames. ], batch size: 60, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:56:29,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=257786.66666666666, ans=0.125 2023-11-18 13:56:31,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=257786.66666666666, ans=0.125 2023-11-18 13:56:35,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.62 vs. limit=12.0 2023-11-18 13:56:44,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=257853.33333333334, ans=0.125 2023-11-18 13:57:09,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=257986.66666666666, ans=0.0 2023-11-18 13:57:14,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=258053.33333333334, ans=0.0 2023-11-18 13:57:21,395 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2650, loss[loss=0.1105, simple_loss=0.1208, pruned_loss=0.03711, audio_tagging_loss=0.01302, over 15481.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1205, pruned_loss=0.0362, audio_tagging_loss=0.01189, over 3053443.24 frames. ], batch size: 59, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:57:32,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 9.528e+01 1.033e+02 1.143e+02 1.471e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:57:35,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=258186.66666666666, ans=0.125 2023-11-18 13:57:44,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.32 vs. limit=15.0 2023-11-18 13:57:47,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.67 vs. limit=15.0 2023-11-18 13:57:54,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=258320.0, ans=0.2 2023-11-18 13:58:04,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. 
limit=22.5 2023-11-18 13:58:13,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=258386.66666666666, ans=0.125 2023-11-18 13:58:17,145 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2700, loss[loss=0.1337, simple_loss=0.1509, pruned_loss=0.04711, audio_tagging_loss=0.01116, over 14071.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1205, pruned_loss=0.03632, audio_tagging_loss=0.01189, over 3051145.65 frames. ], batch size: 54, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:58:22,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=258453.33333333334, ans=0.125 2023-11-18 13:59:00,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=258653.33333333334, ans=0.2 2023-11-18 13:59:13,214 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2750, loss[loss=0.1255, simple_loss=0.1424, pruned_loss=0.04464, audio_tagging_loss=0.009613, over 14365.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1197, pruned_loss=0.03618, audio_tagging_loss=0.01191, over 3044954.02 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:59:24,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.301e+01 1.031e+02 1.106e+02 1.514e+02, threshold=2.061e+02, percent-clipped=0.0 2023-11-18 13:59:31,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.42 vs. limit=22.5 2023-11-18 13:59:48,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=15.0 2023-11-18 14:00:00,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=259053.33333333334, ans=0.125 2023-11-18 14:00:01,474 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:00:08,824 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2800, loss[loss=0.1029, simple_loss=0.123, pruned_loss=0.03439, audio_tagging_loss=0.006996, over 14313.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1203, pruned_loss=0.03639, audio_tagging_loss=0.01187, over 3040325.38 frames. ], batch size: 53, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:00:34,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=259253.33333333334, ans=0.1 2023-11-18 14:01:04,439 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2850, loss[loss=0.09291, simple_loss=0.09286, pruned_loss=0.03246, audio_tagging_loss=0.01402, over 14194.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1214, pruned_loss=0.0369, audio_tagging_loss=0.01175, over 3042080.81 frames. 
], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:01:15,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.766e+01 1.049e+02 1.164e+02 1.614e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 14:01:18,107 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:01:39,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259653.33333333334, ans=0.125 2023-11-18 14:01:40,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=259653.33333333334, ans=0.125 2023-11-18 14:01:48,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=259720.0, ans=0.025 2023-11-18 14:01:55,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=259720.0, ans=0.1 2023-11-18 14:02:00,193 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2900, loss[loss=0.08218, simple_loss=0.08075, pruned_loss=0.02363, audio_tagging_loss=0.01817, over 15545.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1218, pruned_loss=0.03677, audio_tagging_loss=0.0117, over 3042886.02 frames. ], batch size: 61, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:02:01,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=259786.66666666666, ans=0.125 2023-11-18 14:02:01,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=259786.66666666666, ans=0.125 2023-11-18 14:02:22,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-18 14:02:27,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=259920.0, ans=0.125 2023-11-18 14:02:32,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-18 14:02:38,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.25 vs. limit=10.0 2023-11-18 14:02:40,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=259986.66666666666, ans=0.125 2023-11-18 14:02:56,626 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 2950, loss[loss=0.1263, simple_loss=0.1508, pruned_loss=0.04268, audio_tagging_loss=0.008212, over 15353.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1217, pruned_loss=0.03671, audio_tagging_loss=0.0117, over 3045156.60 frames. 
], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:04,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=260120.0, ans=0.2 2023-11-18 14:03:07,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.370e+01 1.013e+02 1.101e+02 1.808e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 14:03:12,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.39 vs. limit=15.0 2023-11-18 14:03:51,841 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3000, loss[loss=0.1183, simple_loss=0.1389, pruned_loss=0.0357, audio_tagging_loss=0.01316, over 16395.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1224, pruned_loss=0.03691, audio_tagging_loss=0.01174, over 3055231.74 frames. ], batch size: 64, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:51,842 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 14:04:13,955 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([3.7952, 2.5014, 2.8140, 3.4442, 2.5236, 2.7961, 3.0574, 3.1798], device='cuda:2') 2023-11-18 14:04:25,240 INFO [train_asr.py:1147] (2/4) Epoch 4, validation: loss=0.07718, simple_loss=0.06278, pruned_loss=0.01045, audio_tagging_loss=0.03534, over 4681554.00 frames. 2023-11-18 14:04:25,246 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 14:04:52,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=260586.66666666666, ans=0.125 2023-11-18 14:04:55,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=260586.66666666666, ans=0.125 2023-11-18 14:04:55,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.33 vs. limit=6.0 2023-11-18 14:05:07,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=260653.33333333334, ans=0.125 2023-11-18 14:05:13,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=260720.0, ans=0.0 2023-11-18 14:05:20,215 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3050, loss[loss=0.154, simple_loss=0.1729, pruned_loss=0.05643, audio_tagging_loss=0.01111, over 15323.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1225, pruned_loss=0.0371, audio_tagging_loss=0.01179, over 3049392.59 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:05:30,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.501e+01 1.094e+02 1.227e+02 1.890e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:05:32,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=260853.33333333334, ans=0.2 2023-11-18 14:05:43,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=260920.0, ans=0.2 2023-11-18 14:05:53,408 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:06:09,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=261053.33333333334, ans=0.02 2023-11-18 14:06:09,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261053.33333333334, ans=0.1 2023-11-18 14:06:15,724 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3100, loss[loss=0.1082, simple_loss=0.1313, pruned_loss=0.03233, audio_tagging_loss=0.01019, over 14933.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1227, pruned_loss=0.03705, audio_tagging_loss=0.01178, over 3044082.70 frames. ], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:06:36,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=261186.66666666666, ans=0.0 2023-11-18 14:06:41,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=261253.33333333334, ans=0.1 2023-11-18 14:06:43,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.25 vs. limit=12.0 2023-11-18 14:06:46,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=261253.33333333334, ans=0.025 2023-11-18 14:06:58,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261320.0, ans=0.1 2023-11-18 14:07:11,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=261453.33333333334, ans=0.0 2023-11-18 14:07:12,445 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3150, loss[loss=0.0946, simple_loss=0.1002, pruned_loss=0.03005, audio_tagging_loss=0.01445, over 15725.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1244, pruned_loss=0.03723, audio_tagging_loss=0.01178, over 3049789.20 frames. ], batch size: 59, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:07:14,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=261453.33333333334, ans=0.125 2023-11-18 14:07:24,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.617e+01 1.054e+02 1.142e+02 1.769e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:07:40,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=261586.66666666666, ans=0.0 2023-11-18 14:07:47,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.59 vs. limit=6.0 2023-11-18 14:07:50,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=261653.33333333334, ans=0.125 2023-11-18 14:08:09,111 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3200, loss[loss=0.09623, simple_loss=0.1065, pruned_loss=0.03145, audio_tagging_loss=0.01151, over 14896.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1245, pruned_loss=0.03738, audio_tagging_loss=0.01185, over 3057268.09 frames. 
], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:08:37,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=261920.0, ans=0.1 2023-11-18 14:08:43,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.06 vs. limit=15.0 2023-11-18 14:08:51,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=261986.66666666666, ans=0.125 2023-11-18 14:09:04,136 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3250, loss[loss=0.1226, simple_loss=0.1402, pruned_loss=0.04416, audio_tagging_loss=0.008394, over 15893.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1231, pruned_loss=0.03703, audio_tagging_loss=0.012, over 3055828.32 frames. ], batch size: 57, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:09:12,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=262120.0, ans=0.125 2023-11-18 14:09:15,321 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 9.218e+01 1.067e+02 1.190e+02 1.746e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 14:09:21,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=262186.6666666667, ans=0.125 2023-11-18 14:09:27,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=262253.3333333333, ans=0.125 2023-11-18 14:09:37,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=262320.0, ans=0.0 2023-11-18 14:09:56,401 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:09:59,321 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3300, loss[loss=0.1273, simple_loss=0.1544, pruned_loss=0.03981, audio_tagging_loss=0.01032, over 16469.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1227, pruned_loss=0.03677, audio_tagging_loss=0.01197, over 3056227.52 frames. ], batch size: 60, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:10:09,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=262453.3333333333, ans=0.125 2023-11-18 14:10:11,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=262520.0, ans=0.125 2023-11-18 14:10:17,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=262520.0, ans=0.2 2023-11-18 14:10:37,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=262653.3333333333, ans=0.04949747468305833 2023-11-18 14:10:38,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.19 vs. limit=15.0 2023-11-18 14:10:56,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=12.0 2023-11-18 14:10:56,640 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3350, loss[loss=0.08954, simple_loss=0.09419, pruned_loss=0.02998, audio_tagging_loss=0.01246, over 14057.00 frames. 
], tot_loss[loss=0.1101, simple_loss=0.1228, pruned_loss=0.03674, audio_tagging_loss=0.01193, over 3057381.68 frames. ], batch size: 53, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:11:07,053 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 9.463e+01 1.035e+02 1.183e+02 1.659e+02, threshold=2.070e+02, percent-clipped=0.0 2023-11-18 14:11:31,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=262986.6666666667, ans=10.0 2023-11-18 14:11:38,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.61 vs. limit=22.5 2023-11-18 14:11:41,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=263053.3333333333, ans=0.125 2023-11-18 14:11:51,602 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3400, loss[loss=0.08451, simple_loss=0.09444, pruned_loss=0.02779, audio_tagging_loss=0.009505, over 15137.00 frames. ], tot_loss[loss=0.11, simple_loss=0.123, pruned_loss=0.03667, audio_tagging_loss=0.0118, over 3056099.34 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:12:02,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=263186.6666666667, ans=0.125 2023-11-18 14:12:29,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=263320.0, ans=0.125 2023-11-18 14:12:42,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=263386.6666666667, ans=0.0 2023-11-18 14:12:44,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=263386.6666666667, ans=0.95 2023-11-18 14:12:47,603 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3450, loss[loss=0.0865, simple_loss=0.102, pruned_loss=0.02505, audio_tagging_loss=0.01045, over 14961.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.123, pruned_loss=0.03651, audio_tagging_loss=0.01167, over 3054012.68 frames. ], batch size: 54, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:12:55,303 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.24 vs. limit=10.0 2023-11-18 14:12:59,385 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 9.398e+01 1.018e+02 1.161e+02 1.639e+02, threshold=2.037e+02, percent-clipped=0.0 2023-11-18 14:13:03,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=263520.0, ans=0.0 2023-11-18 14:13:20,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=263653.3333333333, ans=0.1 2023-11-18 14:13:26,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=263653.3333333333, ans=0.0 2023-11-18 14:13:43,827 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.19 vs. limit=6.0 2023-11-18 14:13:44,354 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3500, loss[loss=0.09987, simple_loss=0.1144, pruned_loss=0.03262, audio_tagging_loss=0.01003, over 15266.00 frames. 
], tot_loss[loss=0.1092, simple_loss=0.1225, pruned_loss=0.0364, audio_tagging_loss=0.01157, over 3056088.29 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:13:44,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=263786.6666666667, ans=10.0 2023-11-18 14:14:00,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=263853.3333333333, ans=0.2 2023-11-18 14:14:05,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=263920.0, ans=0.125 2023-11-18 14:14:11,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=263920.0, ans=0.125 2023-11-18 14:14:13,275 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:14:29,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=264053.3333333333, ans=0.2 2023-11-18 14:14:33,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264053.3333333333, ans=0.1 2023-11-18 14:14:35,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.33 vs. limit=15.0 2023-11-18 14:14:40,051 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3550, loss[loss=0.1014, simple_loss=0.111, pruned_loss=0.03176, audio_tagging_loss=0.01411, over 15102.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1219, pruned_loss=0.03623, audio_tagging_loss=0.01158, over 3054996.15 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:14:42,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2023-11-18 14:14:47,917 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.36 vs. limit=10.0 2023-11-18 14:14:51,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.400e+01 1.087e+02 1.239e+02 1.521e+02, threshold=2.174e+02, percent-clipped=0.0 2023-11-18 14:14:52,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=264186.6666666667, ans=0.125 2023-11-18 14:14:54,571 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.537e-02 2023-11-18 14:15:09,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=264253.3333333333, ans=0.2 2023-11-18 14:15:25,884 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.73 vs. 
limit=15.0 2023-11-18 14:15:33,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=264386.6666666667, ans=0.125 2023-11-18 14:15:35,545 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3600, loss[loss=0.1147, simple_loss=0.1184, pruned_loss=0.04035, audio_tagging_loss=0.01519, over 14587.00 frames. ], tot_loss[loss=0.109, simple_loss=0.122, pruned_loss=0.03644, audio_tagging_loss=0.01159, over 3052887.07 frames. ], batch size: 55, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:15:56,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.04 vs. limit=15.0 2023-11-18 14:16:00,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=264586.6666666667, ans=0.125 2023-11-18 14:16:09,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=264653.3333333333, ans=0.0 2023-11-18 14:16:12,533 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:16:17,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=264653.3333333333, ans=0.125 2023-11-18 14:16:26,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264720.0, ans=0.1 2023-11-18 14:16:26,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=264720.0, ans=0.0 2023-11-18 14:16:32,062 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3650, loss[loss=0.1132, simple_loss=0.1284, pruned_loss=0.03627, audio_tagging_loss=0.01275, over 16288.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1223, pruned_loss=0.03657, audio_tagging_loss=0.01146, over 3052089.20 frames. ], batch size: 61, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:16:43,160 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.557e+01 1.072e+02 1.214e+02 1.788e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 14:16:46,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=264853.3333333333, ans=0.125 2023-11-18 14:16:53,932 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:16:57,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.04 vs. limit=22.5 2023-11-18 14:17:04,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=264986.6666666667, ans=0.0 2023-11-18 14:17:04,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.69 vs. 
limit=15.0 2023-11-18 14:17:07,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=264986.6666666667, ans=0.125 2023-11-18 14:17:13,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=264986.6666666667, ans=0.0 2023-11-18 14:17:16,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=265053.3333333333, ans=0.125 2023-11-18 14:17:25,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=265053.3333333333, ans=0.07 2023-11-18 14:17:27,643 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3700, loss[loss=0.1189, simple_loss=0.1314, pruned_loss=0.04112, audio_tagging_loss=0.01213, over 14359.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1232, pruned_loss=0.03699, audio_tagging_loss=0.01142, over 3049612.02 frames. ], batch size: 53, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:17:32,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=265120.0, ans=0.0 2023-11-18 14:17:37,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=265186.6666666667, ans=0.2 2023-11-18 14:17:44,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=265186.6666666667, ans=0.0 2023-11-18 14:18:02,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=265320.0, ans=0.2 2023-11-18 14:18:14,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=265386.6666666667, ans=0.125 2023-11-18 14:18:23,559 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3750, loss[loss=0.1553, simple_loss=0.1632, pruned_loss=0.06156, audio_tagging_loss=0.01211, over 16365.00 frames. ], tot_loss[loss=0.1111, simple_loss=0.1243, pruned_loss=0.0375, audio_tagging_loss=0.01147, over 3054968.17 frames. ], batch size: 59, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:18:27,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=265453.3333333333, ans=0.125 2023-11-18 14:18:34,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 1.034e+02 1.153e+02 1.284e+02 1.931e+02, threshold=2.306e+02, percent-clipped=0.0 2023-11-18 14:18:51,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=265586.6666666667, ans=0.0 2023-11-18 14:19:00,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265653.3333333333, ans=0.1 2023-11-18 14:19:02,316 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 14:19:02,913 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-18 14:19:09,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265720.0, ans=0.1 2023-11-18 14:19:19,912 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3800, loss[loss=0.11, simple_loss=0.1321, pruned_loss=0.0352, audio_tagging_loss=0.008715, over 15335.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1243, pruned_loss=0.0373, audio_tagging_loss=0.01152, over 3050233.03 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:19:24,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=265786.6666666667, ans=0.1 2023-11-18 14:19:28,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=265786.6666666667, ans=0.5 2023-11-18 14:19:35,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=265853.3333333333, ans=0.0 2023-11-18 14:19:48,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=265920.0, ans=0.0 2023-11-18 14:19:52,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=265986.6666666667, ans=0.04949747468305833 2023-11-18 14:20:01,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.86 vs. limit=15.0 2023-11-18 14:20:15,030 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3850, loss[loss=0.1202, simple_loss=0.1313, pruned_loss=0.04082, audio_tagging_loss=0.01374, over 15408.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1233, pruned_loss=0.03689, audio_tagging_loss=0.01175, over 3052733.61 frames. ], batch size: 57, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:20:26,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.423e+01 1.054e+02 1.147e+02 1.619e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:20:34,985 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.00 vs. limit=15.0 2023-11-18 14:20:36,115 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0 2023-11-18 14:20:45,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=266253.3333333333, ans=0.0 2023-11-18 14:20:56,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-18 14:21:05,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=266386.6666666667, ans=0.0 2023-11-18 14:21:10,653 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3900, loss[loss=0.09791, simple_loss=0.0989, pruned_loss=0.03053, audio_tagging_loss=0.01793, over 14025.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1221, pruned_loss=0.0364, audio_tagging_loss=0.01185, over 3043518.38 frames. 
], batch size: 56, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:21:14,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=266453.3333333333, ans=0.0 2023-11-18 14:21:22,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=266520.0, ans=0.125 2023-11-18 14:21:30,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-11-18 14:21:36,125 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.73 vs. limit=15.0 2023-11-18 14:21:52,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=266653.3333333333, ans=0.125 2023-11-18 14:22:03,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.79 vs. limit=15.0 2023-11-18 14:22:10,118 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 3950, loss[loss=0.07807, simple_loss=0.08932, pruned_loss=0.02, audio_tagging_loss=0.01341, over 15155.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1218, pruned_loss=0.03635, audio_tagging_loss=0.01193, over 3039551.21 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:22:14,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=266786.6666666667, ans=0.0 2023-11-18 14:22:20,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.97 vs. limit=15.0 2023-11-18 14:22:20,676 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.384e+01 1.022e+02 1.131e+02 1.477e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 14:22:36,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-18 14:22:50,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=266986.6666666667, ans=0.125 2023-11-18 14:22:52,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=266986.6666666667, ans=0.0 2023-11-18 14:22:59,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=267053.3333333333, ans=0.04949747468305833 2023-11-18 14:23:05,087 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4000, loss[loss=0.1115, simple_loss=0.115, pruned_loss=0.03604, audio_tagging_loss=0.01801, over 14880.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1217, pruned_loss=0.03648, audio_tagging_loss=0.01208, over 3044984.33 frames. ], batch size: 55, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:23:39,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=267320.0, ans=0.0 2023-11-18 14:23:52,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.15 vs. 
limit=12.0 2023-11-18 14:24:01,219 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4050, loss[loss=0.1291, simple_loss=0.169, pruned_loss=0.03655, audio_tagging_loss=0.00806, over 15813.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1232, pruned_loss=0.03688, audio_tagging_loss=0.012, over 3041729.75 frames. ], batch size: 56, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:24:03,475 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:24:07,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=267453.3333333333, ans=0.125 2023-11-18 14:24:12,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 9.604e+01 1.092e+02 1.269e+02 1.663e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 14:24:22,634 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.86 vs. limit=15.0 2023-11-18 14:24:26,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-18 14:24:40,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=267653.3333333333, ans=0.0 2023-11-18 14:24:53,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.18 vs. limit=15.0 2023-11-18 14:24:57,451 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4100, loss[loss=0.09079, simple_loss=0.1002, pruned_loss=0.03131, audio_tagging_loss=0.009382, over 16114.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1227, pruned_loss=0.03665, audio_tagging_loss=0.01193, over 3042113.62 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:24:58,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.57 vs. limit=22.5 2023-11-18 14:25:03,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=267786.6666666667, ans=0.0 2023-11-18 14:25:03,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=267786.6666666667, ans=0.125 2023-11-18 14:25:04,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-18 14:25:07,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=267853.3333333333, ans=0.0 2023-11-18 14:25:13,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.12 vs. 
limit=15.0 2023-11-18 14:25:20,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=267920.0, ans=0.05 2023-11-18 14:25:22,464 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:25:31,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=267986.6666666667, ans=15.0 2023-11-18 14:25:34,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-18 14:25:43,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=268053.3333333333, ans=0.125 2023-11-18 14:25:53,586 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4150, loss[loss=0.1047, simple_loss=0.1149, pruned_loss=0.03579, audio_tagging_loss=0.01145, over 15477.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1221, pruned_loss=0.03648, audio_tagging_loss=0.01185, over 3041096.15 frames. ], batch size: 59, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:04,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.487e+01 1.055e+02 1.166e+02 1.501e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:26:04,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=268186.6666666667, ans=0.125 2023-11-18 14:26:19,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=268253.3333333333, ans=0.2 2023-11-18 14:26:25,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=268253.3333333333, ans=0.2 2023-11-18 14:26:33,421 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:26:48,309 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4200, loss[loss=0.09175, simple_loss=0.1071, pruned_loss=0.02763, audio_tagging_loss=0.01057, over 14737.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1231, pruned_loss=0.03662, audio_tagging_loss=0.01159, over 3048285.97 frames. 
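The WARNING records above drop AudioSet cuts whose transcript is only a dummy placeholder: the 1-second cut has 100 feature frames, the encoder's 4x subsampling leaves 23, and that is fewer than the 24 BPE tokens, while a transducer loss needs at least one encoder frame per emitted token. A minimal sketch of that kind of filter; the helper names and the exact subsampling arithmetic are assumptions chosen to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int, factor: int = 4) -> int:
        # Convolutional subsampling trims edges, so this only approximates
        # the encoder's exact output length (the log shows 100 -> 23).
        return (num_frames - 8) // factor  # illustrative; 100 -> 23

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer must emit every token, so it needs at least as many
        # encoder frames as tokens; 23 < 24 triggers the warning above.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert keep_cut(100, 24) is False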
], batch size: 56, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:53,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=268453.3333333333, ans=0.1 2023-11-18 14:27:02,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=268520.0, ans=0.0 2023-11-18 14:27:05,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=268520.0, ans=0.125 2023-11-18 14:27:36,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268720.0, ans=0.1 2023-11-18 14:27:44,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=268786.6666666667, ans=0.125 2023-11-18 14:27:44,856 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4250, loss[loss=0.09175, simple_loss=0.09591, pruned_loss=0.03073, audio_tagging_loss=0.01307, over 15530.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1226, pruned_loss=0.03636, audio_tagging_loss=0.0115, over 3045703.50 frames. ], batch size: 57, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:27:51,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=268786.6666666667, ans=0.125 2023-11-18 14:27:57,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 9.513e+01 1.037e+02 1.128e+02 1.811e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 14:28:25,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=268986.6666666667, ans=0.1 2023-11-18 14:28:27,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=268986.6666666667, ans=0.0 2023-11-18 14:28:32,973 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.32 vs. limit=15.0 2023-11-18 14:28:33,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.73 vs. limit=15.0 2023-11-18 14:28:37,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=269053.3333333333, ans=0.125 2023-11-18 14:28:41,086 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4300, loss[loss=0.101, simple_loss=0.1077, pruned_loss=0.03445, audio_tagging_loss=0.01273, over 15777.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1218, pruned_loss=0.03611, audio_tagging_loss=0.01143, over 3049847.38 frames. 
], batch size: 60, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:28:48,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=269120.0, ans=0.125 2023-11-18 14:29:11,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=269253.3333333333, ans=0.125 2023-11-18 14:29:15,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269320.0, ans=0.1 2023-11-18 14:29:22,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=269320.0, ans=0.05 2023-11-18 14:29:36,919 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4350, loss[loss=0.1028, simple_loss=0.1139, pruned_loss=0.03388, audio_tagging_loss=0.01199, over 14644.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1218, pruned_loss=0.03637, audio_tagging_loss=0.01151, over 3047970.75 frames. ], batch size: 56, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:29:42,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=269453.3333333333, ans=0.0 2023-11-18 14:29:48,991 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 1.042e+02 1.123e+02 1.311e+02 1.927e+02, threshold=2.246e+02, percent-clipped=0.0 2023-11-18 14:29:49,674 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.39 vs. limit=10.0 2023-11-18 14:30:03,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.05 vs. limit=22.5 2023-11-18 14:30:13,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=269653.3333333333, ans=0.07 2023-11-18 14:30:13,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=269653.3333333333, ans=0.2 2023-11-18 14:30:26,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-11-18 14:30:31,879 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4400, loss[loss=0.107, simple_loss=0.1098, pruned_loss=0.03812, audio_tagging_loss=0.01393, over 14973.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1235, pruned_loss=0.03716, audio_tagging_loss=0.01143, over 3044909.95 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:30:35,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269786.6666666667, ans=0.1 2023-11-18 14:31:28,577 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4450, loss[loss=0.1245, simple_loss=0.1385, pruned_loss=0.04365, audio_tagging_loss=0.01166, over 15094.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1228, pruned_loss=0.03696, audio_tagging_loss=0.01149, over 3051957.45 frames. 
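In each Clipping_scale record above, the threshold equals Clipping_scale times the median of the five logged grad-norm statistics (min/25%/50%/75%/max), e.g. 2.0 x 1.123e+02 = 2.246e+02, and percent-clipped reports how often recent batch norms exceeded that threshold. A sketch of that scheme, assuming a simple sliding window of recent norms (icefall's optim.py may differ in the bookkeeping details):

    from collections import deque
    import numpy as np

    class MedianClipper:
        """Clip gradients at clipping_scale * median of recent grad norms."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 50):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=window)

        def threshold(self) -> float:
            q = np.quantile(np.asarray(self.norms), [0.0, 0.25, 0.5, 0.75, 1.0])
            # q corresponds to the five "grad-norm quartiles" in the log.
            return self.clipping_scale * q[2]

        def clip_factor(self, grad_norm: float) -> float:
            self.norms.append(grad_norm)
            t = self.threshold()
            # Scale applied to the gradient; 1.0 means no clipping.
            return min(1.0, t / max(grad_norm, 1e-20))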
], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:31:40,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 9.826e+01 1.064e+02 1.191e+02 1.732e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 14:31:56,910 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:32:23,822 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4500, loss[loss=0.1318, simple_loss=0.1438, pruned_loss=0.05162, audio_tagging_loss=0.008241, over 15344.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1231, pruned_loss=0.03716, audio_tagging_loss=0.01151, over 3037310.09 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:32:38,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=270520.0, ans=0.0 2023-11-18 14:32:51,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=270586.6666666667, ans=0.09899494936611666 2023-11-18 14:32:56,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270586.6666666667, ans=0.1 2023-11-18 14:33:05,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. limit=15.0 2023-11-18 14:33:05,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=15.0 2023-11-18 14:33:12,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=270720.0, ans=0.0 2023-11-18 14:33:14,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=270720.0, ans=0.125 2023-11-18 14:33:20,080 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4550, loss[loss=0.1313, simple_loss=0.1428, pruned_loss=0.04848, audio_tagging_loss=0.01146, over 16064.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1219, pruned_loss=0.03695, audio_tagging_loss=0.0116, over 3046129.75 frames. ], batch size: 59, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:33:33,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.598e+01 1.091e+02 1.194e+02 2.832e+02, threshold=2.183e+02, percent-clipped=1.0 2023-11-18 14:33:35,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270853.3333333333, ans=0.1 2023-11-18 14:33:36,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=270853.3333333333, ans=0.125 2023-11-18 14:33:37,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=270853.3333333333, ans=0.1 2023-11-18 14:33:48,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=270920.0, ans=0.125 2023-11-18 14:34:00,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=270986.6666666667, ans=0.0 2023-11-18 14:34:02,605 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:34:09,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=12.0 2023-11-18 14:34:17,072 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4600, loss[loss=0.09608, simple_loss=0.1016, pruned_loss=0.03309, audio_tagging_loss=0.01219, over 14775.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1218, pruned_loss=0.03698, audio_tagging_loss=0.01172, over 3040083.28 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:34:17,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-18 14:34:19,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=271120.0, ans=0.0 2023-11-18 14:34:42,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=271253.3333333333, ans=0.0 2023-11-18 14:35:02,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=271386.6666666667, ans=0.1 2023-11-18 14:35:12,026 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4650, loss[loss=0.07264, simple_loss=0.07257, pruned_loss=0.02099, audio_tagging_loss=0.01537, over 15586.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1212, pruned_loss=0.03648, audio_tagging_loss=0.01182, over 3043594.52 frames. ], batch size: 60, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:35:12,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=271453.3333333333, ans=0.125 2023-11-18 14:35:13,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=271453.3333333333, ans=0.0 2023-11-18 14:35:14,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=271453.3333333333, ans=0.125 2023-11-18 14:35:18,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.59 vs. limit=22.5 2023-11-18 14:35:24,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 1.020e+02 1.140e+02 1.306e+02 2.124e+02, threshold=2.280e+02, percent-clipped=0.0 2023-11-18 14:35:31,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=271520.0, ans=0.125 2023-11-18 14:35:44,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=271586.6666666667, ans=10.0 2023-11-18 14:35:48,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=271653.3333333333, ans=0.125 2023-11-18 14:35:58,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.86 vs. 
limit=22.5 2023-11-18 14:36:00,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=271720.0, ans=0.0 2023-11-18 14:36:06,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=271786.6666666667, ans=0.125 2023-11-18 14:36:07,763 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4700, loss[loss=0.1132, simple_loss=0.1381, pruned_loss=0.03572, audio_tagging_loss=0.008442, over 15382.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1223, pruned_loss=0.03641, audio_tagging_loss=0.01178, over 3040979.32 frames. ], batch size: 58, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:36:21,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=271853.3333333333, ans=0.125 2023-11-18 14:36:24,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2023-11-18 14:36:30,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=271920.0, ans=0.0 2023-11-18 14:36:34,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=271920.0, ans=0.125 2023-11-18 14:36:40,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=271986.6666666667, ans=0.125 2023-11-18 14:36:53,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=272053.3333333333, ans=0.125 2023-11-18 14:36:54,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 14:37:02,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=272053.3333333333, ans=0.2 2023-11-18 14:37:04,171 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4750, loss[loss=0.1367, simple_loss=0.1514, pruned_loss=0.05085, audio_tagging_loss=0.01018, over 15220.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.122, pruned_loss=0.03622, audio_tagging_loss=0.0119, over 3035407.07 frames. ], batch size: 54, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:37:08,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272120.0, ans=0.125 2023-11-18 14:37:16,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.592e+01 1.080e+02 1.195e+02 1.652e+02, threshold=2.159e+02, percent-clipped=0.0 2023-11-18 14:37:43,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=272320.0, ans=0.025 2023-11-18 14:37:48,243 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-18 14:37:50,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=272386.6666666667, ans=0.125 2023-11-18 14:37:59,705 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4800, loss[loss=0.1038, simple_loss=0.1199, pruned_loss=0.0326, audio_tagging_loss=0.01124, over 14658.00 frames. 
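The ScheduledFloat records dump module hyperparameters (dropout probabilities, skip rates, scale floors) whose values are functions of the global batch count rather than constants, which is why each record carries a batch_count alongside the current value. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the breakpoints below are invented for illustration:

    def scheduled_float(schedule, batch_count):
        """Piecewise-linear value keyed on the global batch count.

        schedule: list of (batch_count, value) pairs, sorted by batch_count.
        """
        b0, v0 = schedule[0]
        if batch_count <= b0:
            return v0
        for b1, v1 in schedule[1:]:
            if batch_count <= b1:
                # Linear interpolation between the two breakpoints.
                frac = (batch_count - b0) / (b1 - b0)
                return v0 + frac * (v1 - v0)
            b0, v0 = b1, v1
        return v0  # past the last breakpoint, hold the final value

    # e.g. a skip rate that decays to zero over the first 20k batches
    # stays 0.0 at the batch counts seen in this stretch of the log:
    assert scheduled_float([(0, 0.2), (20000, 0.0)], 265853) == 0.0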
], tot_loss[loss=0.1083, simple_loss=0.1211, pruned_loss=0.03579, audio_tagging_loss=0.01199, over 3034636.35 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:38:06,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272453.3333333333, ans=0.1 2023-11-18 14:38:11,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=272520.0, ans=0.2 2023-11-18 14:38:15,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=272520.0, ans=0.1 2023-11-18 14:38:15,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272520.0, ans=0.0 2023-11-18 14:38:40,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=272653.3333333333, ans=0.125 2023-11-18 14:38:53,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-11-18 14:38:55,226 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4850, loss[loss=0.1013, simple_loss=0.1185, pruned_loss=0.03155, audio_tagging_loss=0.01046, over 14920.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.12, pruned_loss=0.03541, audio_tagging_loss=0.01219, over 3034311.97 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:39:07,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 9.439e+01 1.075e+02 1.233e+02 2.240e+02, threshold=2.150e+02, percent-clipped=1.0 2023-11-18 14:39:11,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=272853.3333333333, ans=0.0 2023-11-18 14:39:13,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=272853.3333333333, ans=0.0 2023-11-18 14:39:34,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=272986.6666666667, ans=0.2 2023-11-18 14:39:51,331 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4900, loss[loss=0.1165, simple_loss=0.1358, pruned_loss=0.03735, audio_tagging_loss=0.01123, over 14937.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.12, pruned_loss=0.03512, audio_tagging_loss=0.01212, over 3037565.50 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:07,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=273186.6666666667, ans=0.125 2023-11-18 14:40:37,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=273386.6666666667, ans=0.125 2023-11-18 14:40:40,573 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:40:46,586 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 4950, loss[loss=0.1193, simple_loss=0.1302, pruned_loss=0.04342, audio_tagging_loss=0.01075, over 15763.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1204, pruned_loss=0.03529, audio_tagging_loss=0.01192, over 3032608.17 frames. 
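The Whitening records fire when a module's activation covariance drifts too far from isotropic: metric measures eigenvalue spread and limit is the point past which a corrective penalty engages, hence the "metric=X vs. limit=Y" pairs. One way to compute a metric of this style, normalized so perfectly white features score 1.0; this is a sketch of the idea, not necessarily scaling.py's exact formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels) activations for one group.

        Returns d * trace(C^2) / trace(C)^2 for covariance C: equals 1.0
        when all eigenvalues are equal (white features) and approaches
        num_channels when a single direction dominates.
        """
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        return (d * (cov @ cov).trace() / cov.trace() ** 2).item()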
], batch size: 59, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:53,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=273453.3333333333, ans=0.0 2023-11-18 14:40:58,631 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.503e+01 1.074e+02 1.226e+02 1.825e+02, threshold=2.148e+02, percent-clipped=0.0 2023-11-18 14:41:03,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=273520.0, ans=0.125 2023-11-18 14:41:12,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=273586.6666666667, ans=0.0 2023-11-18 14:41:16,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=273586.6666666667, ans=0.09899494936611666 2023-11-18 14:41:21,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.24 vs. limit=15.0 2023-11-18 14:41:27,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.91 vs. limit=6.0 2023-11-18 14:41:42,364 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5000, loss[loss=0.08096, simple_loss=0.08765, pruned_loss=0.02016, audio_tagging_loss=0.01697, over 14951.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1207, pruned_loss=0.03552, audio_tagging_loss=0.01188, over 3040760.01 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:41:42,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=273786.6666666667, ans=0.0 2023-11-18 14:41:49,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=273786.6666666667, ans=0.0 2023-11-18 14:41:50,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.78 vs. limit=15.0 2023-11-18 14:41:58,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.82 vs. limit=22.5 2023-11-18 14:42:06,652 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.25 vs. limit=10.0 2023-11-18 14:42:19,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.94 vs. limit=15.0 2023-11-18 14:42:38,357 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5050, loss[loss=0.1229, simple_loss=0.1355, pruned_loss=0.04764, audio_tagging_loss=0.007488, over 14234.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1228, pruned_loss=0.03618, audio_tagging_loss=0.01164, over 3040855.47 frames. 
], batch size: 54, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:42:40,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274120.0, ans=0.125 2023-11-18 14:42:41,771 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:42:51,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 9.577e+01 1.097e+02 1.238e+02 1.791e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 14:42:56,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=274186.6666666667, ans=0.1 2023-11-18 14:42:57,851 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.34 vs. limit=10.0 2023-11-18 14:43:00,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=274253.3333333333, ans=0.125 2023-11-18 14:43:03,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=274253.3333333333, ans=0.125 2023-11-18 14:43:12,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=274320.0, ans=0.0 2023-11-18 14:43:32,742 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5100, loss[loss=0.1306, simple_loss=0.1452, pruned_loss=0.04785, audio_tagging_loss=0.01017, over 14468.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1225, pruned_loss=0.03615, audio_tagging_loss=0.01168, over 3035977.73 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:43:35,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=274453.3333333333, ans=0.0 2023-11-18 14:44:19,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=274720.0, ans=0.125 2023-11-18 14:44:27,772 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5150, loss[loss=0.1284, simple_loss=0.1425, pruned_loss=0.04518, audio_tagging_loss=0.01193, over 15086.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1218, pruned_loss=0.03596, audio_tagging_loss=0.0118, over 3038864.06 frames. ], batch size: 57, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:44:41,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.655e+01 1.078e+02 1.221e+02 1.622e+02, threshold=2.156e+02, percent-clipped=0.0 2023-11-18 14:44:52,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2023-11-18 14:45:06,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=274986.6666666667, ans=0.1 2023-11-18 14:45:09,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.43 vs. limit=12.0 2023-11-18 14:45:22,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=275120.0, ans=0.05 2023-11-18 14:45:23,352 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5200, loss[loss=0.07668, simple_loss=0.08877, pruned_loss=0.02107, audio_tagging_loss=0.01123, over 16394.00 frames. 
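grad_scale in the batch records is the fp16 loss-scaling factor, and it moves over this stretch of training: 64.0 around batch 4000, 32.0 by batch 4250, down to 16.0 at batch 5050, then back to 32.0 at batch 5200. That pattern is characteristic of dynamic loss scaling, which halves the scale when a scaled gradient overflows and doubles it again after a run of clean steps. A sketch of that logic; the constants are assumptions, and torch.cuda.amp.GradScaler is the production implementation:

    class DynamicGradScale:
        def __init__(self, init_scale: float = 32.0,
                     growth_interval: int = 2000):
            self.scale = init_scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> None:
            if found_inf:
                # Overflow in the scaled gradients: skip the optimizer
                # step and halve the scale (e.g. 64.0 -> 32.0 above).
                self.scale /= 2.0
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps >= self.growth_interval:
                    # A long run of clean steps: try a larger scale again
                    # (e.g. the 16.0 -> 32.0 recovery above).
                    self.scale *= 2.0
                    self._good_steps = 0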
], tot_loss[loss=0.1077, simple_loss=0.121, pruned_loss=0.03545, audio_tagging_loss=0.01175, over 3039426.40 frames. ], batch size: 62, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:45:26,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=275120.0, ans=0.125 2023-11-18 14:45:28,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=275120.0, ans=0.0 2023-11-18 14:45:47,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=275253.3333333333, ans=0.125 2023-11-18 14:46:12,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.60 vs. limit=6.0 2023-11-18 14:46:18,240 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5250, loss[loss=0.1028, simple_loss=0.1075, pruned_loss=0.03235, audio_tagging_loss=0.01665, over 15299.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1214, pruned_loss=0.03562, audio_tagging_loss=0.01162, over 3046708.67 frames. ], batch size: 60, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:46:30,876 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 9.429e+01 1.029e+02 1.136e+02 1.567e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 14:46:38,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=275520.0, ans=0.2 2023-11-18 14:46:40,668 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2023-11-18 14:46:51,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=275653.3333333333, ans=0.125 2023-11-18 14:47:10,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275720.0, ans=0.1 2023-11-18 14:47:12,126 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5300, loss[loss=0.1074, simple_loss=0.126, pruned_loss=0.03488, audio_tagging_loss=0.00952, over 15894.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1213, pruned_loss=0.03595, audio_tagging_loss=0.01156, over 3045904.46 frames. ], batch size: 58, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:47:15,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=275786.6666666667, ans=0.0 2023-11-18 14:47:16,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=275786.6666666667, ans=0.125 2023-11-18 14:47:24,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=275853.3333333333, ans=10.0 2023-11-18 14:47:26,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=275853.3333333333, ans=0.1 2023-11-18 14:47:42,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.53 vs. 
limit=6.0 2023-11-18 14:47:45,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=275986.6666666667, ans=15.0 2023-11-18 14:47:49,258 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:47:51,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2023-11-18 14:48:07,981 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5350, loss[loss=0.104, simple_loss=0.1181, pruned_loss=0.03261, audio_tagging_loss=0.01238, over 13635.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1228, pruned_loss=0.03638, audio_tagging_loss=0.01146, over 3044823.02 frames. ], batch size: 53, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:48:11,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=276120.0, ans=0.0 2023-11-18 14:48:21,237 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.718e+01 1.034e+02 1.191e+02 1.805e+02, threshold=2.068e+02, percent-clipped=0.0 2023-11-18 14:48:24,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.02 vs. limit=22.5 2023-11-18 14:48:49,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=276320.0, ans=0.0 2023-11-18 14:49:00,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=276386.6666666667, ans=0.125 2023-11-18 14:49:03,116 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5400, loss[loss=0.1203, simple_loss=0.1378, pruned_loss=0.03977, audio_tagging_loss=0.01162, over 14734.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1218, pruned_loss=0.03608, audio_tagging_loss=0.01148, over 3042515.21 frames. ], batch size: 55, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:49:08,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=276453.3333333333, ans=15.0 2023-11-18 14:49:14,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=276520.0, ans=0.125 2023-11-18 14:49:14,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=276520.0, ans=0.02 2023-11-18 14:49:33,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=15.0 2023-11-18 14:49:45,692 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.98 vs. limit=15.0 2023-11-18 14:49:45,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.87 vs. limit=15.0 2023-11-18 14:49:54,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=276720.0, ans=0.125 2023-11-18 14:49:57,656 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5450, loss[loss=0.09073, simple_loss=0.09501, pruned_loss=0.02932, audio_tagging_loss=0.0139, over 14453.00 frames. 
], tot_loss[loss=0.1097, simple_loss=0.1227, pruned_loss=0.03676, audio_tagging_loss=0.01158, over 3040982.03 frames. ], batch size: 55, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:50:06,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=276786.6666666667, ans=0.125 2023-11-18 14:50:08,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=276853.3333333333, ans=0.125 2023-11-18 14:50:10,737 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.674e+01 1.094e+02 1.267e+02 1.723e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:50:19,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=276920.0, ans=0.05 2023-11-18 14:50:23,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=276920.0, ans=0.125 2023-11-18 14:50:32,048 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:50:32,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=276986.6666666667, ans=0.07 2023-11-18 14:50:34,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=276986.6666666667, ans=0.125 2023-11-18 14:50:48,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.47 vs. limit=22.5 2023-11-18 14:50:52,383 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5500, loss[loss=0.08332, simple_loss=0.08947, pruned_loss=0.02553, audio_tagging_loss=0.01305, over 14903.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1219, pruned_loss=0.0365, audio_tagging_loss=0.01174, over 3038473.90 frames. ], batch size: 57, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:51:29,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277320.0, ans=0.1 2023-11-18 14:51:34,752 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:51:46,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=277453.3333333333, ans=0.0 2023-11-18 14:51:47,657 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5550, loss[loss=0.08885, simple_loss=0.09823, pruned_loss=0.02724, audio_tagging_loss=0.0125, over 15035.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1227, pruned_loss=0.03683, audio_tagging_loss=0.0118, over 3036939.18 frames. 
], batch size: 59, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:52:00,292 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 9.567e+01 1.041e+02 1.171e+02 1.468e+02, threshold=2.082e+02, percent-clipped=0.0 2023-11-18 14:52:03,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=277520.0, ans=0.0 2023-11-18 14:52:15,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=277586.6666666667, ans=0.0 2023-11-18 14:52:23,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277653.3333333333, ans=0.1 2023-11-18 14:52:33,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=277720.0, ans=0.125 2023-11-18 14:52:41,984 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5600, loss[loss=0.1131, simple_loss=0.1255, pruned_loss=0.03502, audio_tagging_loss=0.01539, over 14084.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1221, pruned_loss=0.03626, audio_tagging_loss=0.01204, over 3037921.01 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:52:50,894 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2023-11-18 14:52:59,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277853.3333333333, ans=0.1 2023-11-18 14:53:07,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=277920.0, ans=0.2 2023-11-18 14:53:17,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=277986.6666666667, ans=0.07 2023-11-18 14:53:21,545 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:53:24,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=278053.3333333333, ans=0.125 2023-11-18 14:53:26,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=278053.3333333333, ans=0.1 2023-11-18 14:53:32,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.31 vs. limit=22.5 2023-11-18 14:53:36,760 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5650, loss[loss=0.121, simple_loss=0.1341, pruned_loss=0.03939, audio_tagging_loss=0.01455, over 15659.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1217, pruned_loss=0.0362, audio_tagging_loss=0.01205, over 3044481.72 frames. 
], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:53:43,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278120.0, ans=0.125 2023-11-18 14:53:50,490 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 9.354e+01 1.022e+02 1.173e+02 1.530e+02, threshold=2.043e+02, percent-clipped=0.0 2023-11-18 14:53:51,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2023-11-18 14:54:02,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=278253.3333333333, ans=0.125 2023-11-18 14:54:07,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=278253.3333333333, ans=0.125 2023-11-18 14:54:10,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=278320.0, ans=0.0 2023-11-18 14:54:16,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=278320.0, ans=0.1 2023-11-18 14:54:32,127 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5700, loss[loss=0.1191, simple_loss=0.128, pruned_loss=0.04164, audio_tagging_loss=0.01346, over 14705.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1213, pruned_loss=0.03605, audio_tagging_loss=0.01196, over 3042488.97 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:54:33,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=278453.3333333333, ans=0.0 2023-11-18 14:54:41,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=278520.0, ans=0.2 2023-11-18 14:54:44,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=278520.0, ans=0.0 2023-11-18 14:54:48,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278520.0, ans=0.0 2023-11-18 14:54:50,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=278520.0, ans=0.0 2023-11-18 14:55:03,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0 2023-11-18 14:55:07,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=278653.3333333333, ans=0.125 2023-11-18 14:55:20,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=278720.0, ans=0.125 2023-11-18 14:55:27,008 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5750, loss[loss=0.1225, simple_loss=0.1421, pruned_loss=0.0417, audio_tagging_loss=0.009699, over 16197.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1214, pruned_loss=0.03607, audio_tagging_loss=0.01186, over 3041882.32 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:55:40,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.59 vs. 
limit=22.5 2023-11-18 14:55:40,247 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.668e+01 1.031e+02 1.141e+02 1.503e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-18 14:55:42,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=278853.3333333333, ans=0.0 2023-11-18 14:55:43,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=278853.3333333333, ans=0.125 2023-11-18 14:56:05,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.44 vs. limit=22.5 2023-11-18 14:56:22,401 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5800, loss[loss=0.1143, simple_loss=0.136, pruned_loss=0.03541, audio_tagging_loss=0.01095, over 14753.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1213, pruned_loss=0.03584, audio_tagging_loss=0.01177, over 3041564.24 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:56:26,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=279120.0, ans=0.125 2023-11-18 14:56:37,764 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=12.0 2023-11-18 14:56:42,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-18 14:56:47,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=279253.3333333333, ans=0.0 2023-11-18 14:56:51,928 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 14:56:54,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=279320.0, ans=0.0 2023-11-18 14:57:18,286 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5850, loss[loss=0.1268, simple_loss=0.1485, pruned_loss=0.04199, audio_tagging_loss=0.01053, over 15047.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.122, pruned_loss=0.03616, audio_tagging_loss=0.01167, over 3038944.57 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:57:27,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.04 vs. limit=15.0 2023-11-18 14:57:31,475 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.648e+01 1.054e+02 1.215e+02 1.872e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:57:36,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=279520.0, ans=0.0 2023-11-18 14:57:44,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=279586.6666666667, ans=0.0 2023-11-18 14:58:13,667 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5900, loss[loss=0.09386, simple_loss=0.1125, pruned_loss=0.02594, audio_tagging_loss=0.01166, over 15679.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1219, pruned_loss=0.03628, audio_tagging_loss=0.01158, over 3042836.83 frames. 
], batch size: 58, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:58:17,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=279786.6666666667, ans=0.0 2023-11-18 14:58:34,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279920.0, ans=0.1 2023-11-18 14:58:41,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.39 vs. limit=15.0 2023-11-18 14:59:07,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.79 vs. limit=15.0 2023-11-18 14:59:08,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=280120.0, ans=0.2 2023-11-18 14:59:08,889 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 5950, loss[loss=0.1216, simple_loss=0.1219, pruned_loss=0.04678, audio_tagging_loss=0.01391, over 14494.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1223, pruned_loss=0.03615, audio_tagging_loss=0.01151, over 3046693.01 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:59:23,196 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 1.041e+02 1.163e+02 1.306e+02 1.742e+02, threshold=2.325e+02, percent-clipped=0.0 2023-11-18 14:59:36,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=280253.3333333333, ans=0.125 2023-11-18 14:59:46,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.70 vs. limit=10.0 2023-11-18 14:59:49,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=3.93 vs. limit=12.0 2023-11-18 14:59:53,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=280386.6666666667, ans=0.2 2023-11-18 14:59:58,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=280386.6666666667, ans=0.5 2023-11-18 15:00:05,305 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6000, loss[loss=0.07819, simple_loss=0.08237, pruned_loss=0.02429, audio_tagging_loss=0.01272, over 14157.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1221, pruned_loss=0.03628, audio_tagging_loss=0.01153, over 3044525.19 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:00:05,306 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 15:00:29,918 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9092, 5.8992, 5.9478, 5.4985], device='cuda:2') 2023-11-18 15:00:38,409 INFO [train_asr.py:1147] (2/4) Epoch 4, validation: loss=0.07584, simple_loss=0.06235, pruned_loss=0.0102, audio_tagging_loss=0.03446, over 4681554.00 frames. 
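The attn_weights_entropy diagnostic printed during validation summarizes how concentrated each attention head's weight distribution is: near-zero entropy means a head attends to essentially one frame, while the values around 5.9 here correspond to attention spread over roughly e^5.9 ≈ 365 frames. A sketch of the computation, assuming the four logged values are one entropy per head and weights of shape (num_heads, num_queries, num_keys) averaged over queries:

    import torch

    def attn_weights_entropy(attn: torch.Tensor,
                             eps: float = 1e-20) -> torch.Tensor:
        """attn: (num_heads, num_queries, num_keys), rows summing to 1.

        Returns one entropy value per head, averaged over queries;
        uniform attention over K keys gives log(K).
        """
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (heads, queries)
        return ent.mean(dim=-1)                         # (heads,)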
2023-11-18 15:00:38,410 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 15:00:49,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=280520.0, ans=0.125 2023-11-18 15:00:52,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=280520.0, ans=0.125 2023-11-18 15:00:54,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=280520.0, ans=0.125 2023-11-18 15:01:01,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=280586.6666666667, ans=0.0 2023-11-18 15:01:17,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=280653.3333333333, ans=0.125 2023-11-18 15:01:18,963 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:01:33,813 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6050, loss[loss=0.08354, simple_loss=0.09578, pruned_loss=0.0218, audio_tagging_loss=0.01386, over 15178.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1215, pruned_loss=0.03625, audio_tagging_loss=0.01165, over 3044851.86 frames. ], batch size: 59, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:01:46,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280853.3333333333, ans=0.1 2023-11-18 15:01:47,523 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 9.320e+01 1.035e+02 1.195e+02 1.658e+02, threshold=2.071e+02, percent-clipped=0.0 2023-11-18 15:01:58,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280920.0, ans=0.1 2023-11-18 15:02:07,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=280986.6666666667, ans=0.1 2023-11-18 15:02:21,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=281053.3333333333, ans=0.0 2023-11-18 15:02:23,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.98 vs. limit=22.5 2023-11-18 15:02:29,895 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6100, loss[loss=0.1146, simple_loss=0.135, pruned_loss=0.03616, audio_tagging_loss=0.01095, over 15228.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1223, pruned_loss=0.03639, audio_tagging_loss=0.01165, over 3053945.49 frames. 
], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:02:36,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=281120.0, ans=0.07 2023-11-18 15:03:12,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=281320.0, ans=0.125 2023-11-18 15:03:16,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.48 vs. limit=15.0 2023-11-18 15:03:17,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=281386.6666666667, ans=0.125 2023-11-18 15:03:24,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=281453.3333333333, ans=0.0 2023-11-18 15:03:24,781 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6150, loss[loss=0.1161, simple_loss=0.1302, pruned_loss=0.03927, audio_tagging_loss=0.01177, over 15403.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.122, pruned_loss=0.03603, audio_tagging_loss=0.01168, over 3057150.39 frames. ], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:03:38,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 9.712e+01 1.096e+02 1.258e+02 1.781e+02, threshold=2.192e+02, percent-clipped=0.0 2023-11-18 15:03:43,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=281520.0, ans=0.2 2023-11-18 15:03:50,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.01 vs. limit=22.5 2023-11-18 15:03:53,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281586.6666666667, ans=0.1 2023-11-18 15:04:11,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=281720.0, ans=0.125 2023-11-18 15:04:15,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=281720.0, ans=0.125 2023-11-18 15:04:15,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=281720.0, ans=0.125 2023-11-18 15:04:17,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.90 vs. limit=22.5 2023-11-18 15:04:20,403 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6200, loss[loss=0.1233, simple_loss=0.1395, pruned_loss=0.04174, audio_tagging_loss=0.01179, over 16037.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1218, pruned_loss=0.03576, audio_tagging_loss=0.01159, over 3055470.00 frames. ], batch size: 60, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:04:22,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. 
limit=6.0 2023-11-18 15:04:34,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=281853.3333333333, ans=0.125 2023-11-18 15:04:48,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=281920.0, ans=0.125 2023-11-18 15:04:55,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=281986.6666666667, ans=0.125 2023-11-18 15:05:10,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=282053.3333333333, ans=0.125 2023-11-18 15:05:17,030 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6250, loss[loss=0.1191, simple_loss=0.1348, pruned_loss=0.03933, audio_tagging_loss=0.01241, over 14981.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1212, pruned_loss=0.03561, audio_tagging_loss=0.01168, over 3049172.92 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:05:29,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.446e+01 1.080e+02 1.226e+02 1.932e+02, threshold=2.161e+02, percent-clipped=0.0 2023-11-18 15:05:38,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=282253.3333333333, ans=0.125 2023-11-18 15:06:10,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=282386.6666666667, ans=0.125 2023-11-18 15:06:11,965 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6300, loss[loss=0.0908, simple_loss=0.09689, pruned_loss=0.03054, audio_tagging_loss=0.01181, over 14911.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1208, pruned_loss=0.03572, audio_tagging_loss=0.01189, over 3054656.69 frames. ], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:06:14,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=282453.3333333333, ans=0.125 2023-11-18 15:06:21,611 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:06:21,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=282520.0, ans=0.125 2023-11-18 15:06:39,646 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=282586.6666666667, ans=10.0 2023-11-18 15:06:50,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2023-11-18 15:06:52,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=282653.3333333333, ans=0.0 2023-11-18 15:06:56,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.65 vs. limit=15.0 2023-11-18 15:07:07,480 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6350, loss[loss=0.1019, simple_loss=0.107, pruned_loss=0.03346, audio_tagging_loss=0.01493, over 15429.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1206, pruned_loss=0.03547, audio_tagging_loss=0.01199, over 3048077.06 frames. 
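The recurring optim.py lines ("Clipping_scale=2.0, grad-norm quartiles ... threshold=...") report five quantiles (min, 25%, median, 75%, max) of recently observed gradient norms. In every such entry in this span the printed threshold equals Clipping_scale times the median (e.g. 2.0 x 1.035e+02 ~= 2.071e+02), so the clipping threshold adapts to the running gradient-norm distribution rather than being fixed. A standalone sketch of that policy (an illustration, not the optimizer's actual code; the window size is an assumption):

    import torch

    # Median-adaptive gradient clipping as the log suggests:
    # threshold = clipping_scale * median of recent gradient norms.
    def adaptive_clip_(parameters, recent_norms, clipping_scale=2.0):
        params = [p for p in parameters if p.grad is not None]
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2)
        recent_norms.append(total_norm.item())
        norms = torch.tensor(recent_norms[-128:])  # window size is an assumption
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()   # scale * median
        if total_norm > threshold:
            for p in params:
                p.grad.mul_(threshold / (total_norm + 1e-6))
        return q, threshold  # q matches the five logged quartile values

With "percent-clipped=0.0" throughout this span, gradient norms are staying below twice their running median, i.e. training is stable here.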
], batch size: 58, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:07:16,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=282786.6666666667, ans=0.125 2023-11-18 15:07:20,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=282853.3333333333, ans=0.2 2023-11-18 15:07:21,674 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.648e+01 1.090e+02 1.229e+02 1.753e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 15:07:22,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=282853.3333333333, ans=0.125 2023-11-18 15:07:46,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=282986.6666666667, ans=0.125 2023-11-18 15:07:46,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=282986.6666666667, ans=0.125 2023-11-18 15:07:49,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282986.6666666667, ans=0.1 2023-11-18 15:07:56,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283053.3333333333, ans=0.1 2023-11-18 15:08:01,749 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.12 vs. limit=22.5 2023-11-18 15:08:03,794 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6400, loss[loss=0.1457, simple_loss=0.1662, pruned_loss=0.05403, audio_tagging_loss=0.008613, over 17153.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1209, pruned_loss=0.03562, audio_tagging_loss=0.012, over 3047919.03 frames. ], batch size: 61, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:08:18,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283186.6666666667, ans=0.1 2023-11-18 15:08:26,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=283253.3333333333, ans=0.0 2023-11-18 15:08:26,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-11-18 15:08:30,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=283253.3333333333, ans=0.0 2023-11-18 15:08:41,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-11-18 15:08:51,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283386.6666666667, ans=0.1 2023-11-18 15:08:58,500 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6450, loss[loss=0.08543, simple_loss=0.09206, pruned_loss=0.02415, audio_tagging_loss=0.01525, over 14488.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1202, pruned_loss=0.03522, audio_tagging_loss=0.01212, over 3050871.17 frames. 
], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:09:05,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.96 vs. limit=15.0 2023-11-18 15:09:08,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283520.0, ans=0.1 2023-11-18 15:09:11,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 9.197e+01 1.014e+02 1.179e+02 1.440e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 15:09:30,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283586.6666666667, ans=0.1 2023-11-18 15:09:30,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 15:09:51,438 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.683e-02 2023-11-18 15:09:53,327 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6500, loss[loss=0.1266, simple_loss=0.1455, pruned_loss=0.04139, audio_tagging_loss=0.01252, over 14627.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1185, pruned_loss=0.03453, audio_tagging_loss=0.01217, over 3044183.94 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:09:57,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=283786.6666666667, ans=0.0 2023-11-18 15:10:17,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=283920.0, ans=0.0 2023-11-18 15:10:41,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=284053.3333333333, ans=0.125 2023-11-18 15:10:49,937 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6550, loss[loss=0.1488, simple_loss=0.1664, pruned_loss=0.05554, audio_tagging_loss=0.01009, over 15253.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1182, pruned_loss=0.03457, audio_tagging_loss=0.012, over 3038062.41 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:10:50,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2023-11-18 15:11:00,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=284186.6666666667, ans=0.2 2023-11-18 15:11:03,060 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.628e+01 1.072e+02 1.195e+02 1.710e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:11:11,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284253.3333333333, ans=0.1 2023-11-18 15:11:14,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=284253.3333333333, ans=0.125 2023-11-18 15:11:19,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.78 vs. 
limit=22.5 2023-11-18 15:11:20,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=284253.3333333333, ans=0.0 2023-11-18 15:11:39,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=284386.6666666667, ans=0.1 2023-11-18 15:11:45,580 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6600, loss[loss=0.08138, simple_loss=0.08817, pruned_loss=0.02355, audio_tagging_loss=0.01374, over 15235.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1191, pruned_loss=0.03481, audio_tagging_loss=0.0118, over 3040079.74 frames. ], batch size: 60, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:11:54,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=284453.3333333333, ans=0.025 2023-11-18 15:12:02,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=284520.0, ans=0.2 2023-11-18 15:12:11,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=284586.6666666667, ans=0.2 2023-11-18 15:12:12,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284586.6666666667, ans=0.1 2023-11-18 15:12:20,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=284653.3333333333, ans=0.2 2023-11-18 15:12:27,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=284653.3333333333, ans=0.025 2023-11-18 15:12:34,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=284720.0, ans=0.125 2023-11-18 15:12:40,470 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6650, loss[loss=0.1141, simple_loss=0.128, pruned_loss=0.04262, audio_tagging_loss=0.007448, over 15324.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1193, pruned_loss=0.03483, audio_tagging_loss=0.01167, over 3040692.46 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:12:42,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=284786.6666666667, ans=0.0 2023-11-18 15:12:46,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=284786.6666666667, ans=0.125 2023-11-18 15:12:54,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 9.511e+01 1.065e+02 1.198e+02 1.619e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 15:12:56,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284853.3333333333, ans=0.1 2023-11-18 15:13:09,146 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.64 vs. limit=12.0 2023-11-18 15:13:17,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.22 vs. 
limit=22.5 2023-11-18 15:13:31,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=285053.3333333333, ans=0.125 2023-11-18 15:13:33,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=285053.3333333333, ans=0.0 2023-11-18 15:13:36,272 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6700, loss[loss=0.07777, simple_loss=0.08258, pruned_loss=0.0241, audio_tagging_loss=0.01239, over 14461.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1191, pruned_loss=0.03487, audio_tagging_loss=0.01169, over 3033501.09 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:13:42,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285120.0, ans=0.125 2023-11-18 15:13:49,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=285186.6666666667, ans=0.125 2023-11-18 15:13:53,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=285186.6666666667, ans=0.125 2023-11-18 15:14:09,477 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:14:27,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285386.6666666667, ans=0.125 2023-11-18 15:14:30,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=285386.6666666667, ans=0.2 2023-11-18 15:14:32,999 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6750, loss[loss=0.1195, simple_loss=0.1344, pruned_loss=0.04045, audio_tagging_loss=0.01188, over 14874.00 frames. ], tot_loss[loss=0.107, simple_loss=0.12, pruned_loss=0.03539, audio_tagging_loss=0.01166, over 3035706.63 frames. ], batch size: 56, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:14:37,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=285453.3333333333, ans=0.125 2023-11-18 15:14:37,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=285453.3333333333, ans=0.125 2023-11-18 15:14:44,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.44 vs. 
limit=15.0 2023-11-18 15:14:45,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 9.541e+01 1.044e+02 1.172e+02 1.686e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:14:47,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=285520.0, ans=0.0 2023-11-18 15:14:52,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=285520.0, ans=0.0 2023-11-18 15:15:14,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285653.3333333333, ans=0.125 2023-11-18 15:15:27,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=285786.6666666667, ans=0.125 2023-11-18 15:15:28,136 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6800, loss[loss=0.1056, simple_loss=0.1158, pruned_loss=0.0385, audio_tagging_loss=0.009199, over 14686.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1195, pruned_loss=0.03545, audio_tagging_loss=0.01161, over 3036079.43 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:15:46,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=285853.3333333333, ans=0.125 2023-11-18 15:15:47,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=285853.3333333333, ans=0.2 2023-11-18 15:15:48,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=285853.3333333333, ans=0.125 2023-11-18 15:16:11,731 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:16:23,775 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6850, loss[loss=0.07295, simple_loss=0.07925, pruned_loss=0.01917, audio_tagging_loss=0.01415, over 14845.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1199, pruned_loss=0.03548, audio_tagging_loss=0.01155, over 3033950.65 frames. ], batch size: 58, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:16:28,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286120.0, ans=0.1 2023-11-18 15:16:37,993 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 9.571e+01 1.055e+02 1.193e+02 1.601e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 15:16:49,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=286253.3333333333, ans=0.125 2023-11-18 15:17:15,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=286386.6666666667, ans=0.125 2023-11-18 15:17:18,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=286386.6666666667, ans=10.0 2023-11-18 15:17:19,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286453.3333333333, ans=0.1 2023-11-18 15:17:20,141 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6900, loss[loss=0.1061, simple_loss=0.119, pruned_loss=0.03496, audio_tagging_loss=0.01164, over 13846.00 frames. 
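The frequent "Whitening: name=..., metric=X vs. limit=Y" lines fire when a module's activation covariance is judged insufficiently white, i.e. the metric exceeds its scheduled limit, which in this recipe triggers a corrective penalty on that module's output. One standard whiteness measure with this behavior is E[lambda^2]/E[lambda]^2 over the eigenvalues lambda of the per-group feature covariance C, computable without an eigendecomposition as d * tr(C^2) / tr(C)^2: it equals 1.0 exactly when C is proportional to the identity and grows as the spectrum becomes lopsided. A sketch of that measure (an illustration of the logged quantity's general form, not the training code itself):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels), channels assumed evenly divisible
        into groups. Returns E[lambda^2]/E[lambda]^2 of the per-group
        covariance; 1.0 means perfectly white (C proportional to I)."""
        n, c = x.shape
        d = c // num_groups
        x = x.reshape(n, num_groups, d).transpose(0, 1)      # (groups, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n          # (groups, d, d)
        tr_c = cov.diagonal(dim1=1, dim2=2).sum(dim=1)        # tr(C)
        tr_c2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))   # tr(C @ C); C symmetric
        metric = d * tr_c2 / (tr_c ** 2 + 1e-20)
        return metric.mean().item()

This reading is consistent with the log: metrics only slightly above their limits (e.g. 22.98 vs. 22.5) are borderline, while the very large values early in training settle down as the whitening penalty takes effect.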
], tot_loss[loss=0.1073, simple_loss=0.1206, pruned_loss=0.03554, audio_tagging_loss=0.01148, over 3042297.69 frames. ], batch size: 54, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:17:46,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=286586.6666666667, ans=0.2 2023-11-18 15:18:04,988 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:18:06,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286720.0, ans=0.1 2023-11-18 15:18:06,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.36 vs. limit=15.0 2023-11-18 15:18:08,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=286720.0, ans=0.1 2023-11-18 15:18:15,641 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 6950, loss[loss=0.09548, simple_loss=0.09641, pruned_loss=0.03188, audio_tagging_loss=0.0154, over 15073.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1198, pruned_loss=0.03515, audio_tagging_loss=0.01159, over 3044179.02 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:18:18,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=286786.6666666667, ans=0.1 2023-11-18 15:18:23,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=286786.6666666667, ans=0.125 2023-11-18 15:18:28,742 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 9.398e+01 1.033e+02 1.158e+02 1.660e+02, threshold=2.066e+02, percent-clipped=0.0 2023-11-18 15:18:35,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286853.3333333333, ans=0.1 2023-11-18 15:19:05,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=287053.3333333333, ans=0.0 2023-11-18 15:19:11,042 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7000, loss[loss=0.1112, simple_loss=0.1279, pruned_loss=0.03411, audio_tagging_loss=0.01312, over 16629.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1196, pruned_loss=0.03516, audio_tagging_loss=0.01167, over 3046657.71 frames. ], batch size: 65, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:19:30,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287186.6666666667, ans=0.0 2023-11-18 15:19:45,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.98 vs. 
limit=12.0 2023-11-18 15:19:49,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=287320.0, ans=0.125 2023-11-18 15:19:53,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=287320.0, ans=0.125 2023-11-18 15:19:57,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=287386.6666666667, ans=0.0 2023-11-18 15:20:02,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.89 vs. limit=15.0 2023-11-18 15:20:07,123 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7050, loss[loss=0.1096, simple_loss=0.1152, pruned_loss=0.03755, audio_tagging_loss=0.0145, over 15054.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1186, pruned_loss=0.03498, audio_tagging_loss=0.01182, over 3041074.94 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:20:08,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.38 vs. limit=15.0 2023-11-18 15:20:10,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=287453.3333333333, ans=0.125 2023-11-18 15:20:10,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=287453.3333333333, ans=0.125 2023-11-18 15:20:20,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.557e+01 1.044e+02 1.189e+02 1.971e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:20:20,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.96 vs. limit=10.0 2023-11-18 15:20:20,824 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2023-11-18 15:20:47,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2023-11-18 15:20:52,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=287720.0, ans=0.0 2023-11-18 15:20:58,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=287720.0, ans=0.0 2023-11-18 15:21:02,536 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7100, loss[loss=0.08416, simple_loss=0.08947, pruned_loss=0.02646, audio_tagging_loss=0.01297, over 16878.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1187, pruned_loss=0.03476, audio_tagging_loss=0.01199, over 3044117.29 frames. 
], batch size: 64, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:21:17,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287853.3333333333, ans=0.1 2023-11-18 15:21:18,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=287853.3333333333, ans=0.125 2023-11-18 15:21:18,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=287853.3333333333, ans=0.0 2023-11-18 15:21:30,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=287920.0, ans=0.125 2023-11-18 15:21:41,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=287986.6666666667, ans=0.125 2023-11-18 15:21:58,403 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7150, loss[loss=0.1332, simple_loss=0.1562, pruned_loss=0.04184, audio_tagging_loss=0.01326, over 14697.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1201, pruned_loss=0.03531, audio_tagging_loss=0.012, over 3046627.41 frames. ], batch size: 54, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:22:06,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. limit=15.0 2023-11-18 15:22:12,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.651e+01 1.094e+02 1.204e+02 1.585e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 15:22:29,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=288253.3333333333, ans=0.125 2023-11-18 15:22:31,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=288320.0, ans=0.0 2023-11-18 15:22:43,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=288386.6666666667, ans=10.0 2023-11-18 15:22:54,531 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7200, loss[loss=0.08585, simple_loss=0.09917, pruned_loss=0.02308, audio_tagging_loss=0.01318, over 14610.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1197, pruned_loss=0.03513, audio_tagging_loss=0.01203, over 3042784.21 frames. ], batch size: 55, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:23:00,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.26 vs. limit=15.0 2023-11-18 15:23:14,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=288520.0, ans=0.2 2023-11-18 15:23:39,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=288720.0, ans=0.0 2023-11-18 15:23:40,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=12.0 2023-11-18 15:23:49,859 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7250, loss[loss=0.09293, simple_loss=0.1022, pruned_loss=0.02911, audio_tagging_loss=0.01272, over 15124.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1209, pruned_loss=0.03553, audio_tagging_loss=0.012, over 3045570.71 frames. 
], batch size: 59, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:24:03,639 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.776e+01 1.072e+02 1.209e+02 1.575e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:24:44,975 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7300, loss[loss=0.1094, simple_loss=0.1248, pruned_loss=0.0407, audio_tagging_loss=0.006316, over 14706.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1226, pruned_loss=0.03594, audio_tagging_loss=0.01182, over 3044876.10 frames. ], batch size: 57, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:24:47,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=289120.0, ans=0.0 2023-11-18 15:25:05,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=289186.6666666667, ans=0.1 2023-11-18 15:25:22,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=289320.0, ans=0.125 2023-11-18 15:25:25,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=289320.0, ans=0.125 2023-11-18 15:25:40,812 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7350, loss[loss=0.1069, simple_loss=0.1169, pruned_loss=0.03448, audio_tagging_loss=0.01397, over 16025.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1227, pruned_loss=0.03603, audio_tagging_loss=0.01162, over 3045902.58 frames. ], batch size: 62, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:25:54,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.633e+01 1.075e+02 1.263e+02 1.928e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 15:26:07,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.64 vs. limit=10.0 2023-11-18 15:26:16,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=289653.3333333333, ans=0.1 2023-11-18 15:26:22,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=289653.3333333333, ans=0.125 2023-11-18 15:26:25,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=289720.0, ans=0.0 2023-11-18 15:26:27,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=289720.0, ans=0.125 2023-11-18 15:26:33,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.50 vs. limit=15.0 2023-11-18 15:26:35,458 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7400, loss[loss=0.114, simple_loss=0.1294, pruned_loss=0.03991, audio_tagging_loss=0.009394, over 15098.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1219, pruned_loss=0.03574, audio_tagging_loss=0.01151, over 3050873.79 frames. 
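Most of the scaling.py:213 lines track ScheduledFloat values: hyperparameters such as dropout probabilities, skip rates, and balancer probabilities that are not constants but functions of the global batch_count, typically interpolated piecewise-linearly between (batch_count, value) breakpoints, which is why the printed ans drifts as batch_count grows. A minimal sketch of such a schedule (the class and constructor style are assumptions modeled on the log, not the exact implementation):

    # Sketch: a float-like hyperparameter that is a piecewise-linear function
    # of the training batch count, as the ScheduledFloat log lines suggest.
    class ScheduledValue:
        def __init__(self, *points):
            # points: (batch_count, value) pairs; sorted by batch_count.
            self.points = sorted(points)

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout annealing from 0.3 to 0.1 over the first 20k batches
    # (breakpoint values here are invented for illustration):
    dropout_p = ScheduledValue((0.0, 0.3), (20000.0, 0.1))
    assert abs(dropout_p(10000.0) - 0.2) < 1e-9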
], batch size: 58, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:26:45,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=289853.3333333333, ans=0.0 2023-11-18 15:26:45,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=289853.3333333333, ans=12.0 2023-11-18 15:27:00,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=289920.0, ans=0.0 2023-11-18 15:27:05,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=289920.0, ans=0.125 2023-11-18 15:27:17,239 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.62 vs. limit=22.5 2023-11-18 15:27:22,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 15:27:27,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=290053.3333333333, ans=10.0 2023-11-18 15:27:30,959 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7450, loss[loss=0.09245, simple_loss=0.09919, pruned_loss=0.02707, audio_tagging_loss=0.01578, over 14374.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.122, pruned_loss=0.03577, audio_tagging_loss=0.01136, over 3045266.70 frames. ], batch size: 54, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:27:46,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.437e+01 1.026e+02 1.201e+02 2.000e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 15:27:50,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=290186.6666666667, ans=0.125 2023-11-18 15:28:04,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=290320.0, ans=0.125 2023-11-18 15:28:22,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.27 vs. limit=15.0 2023-11-18 15:28:27,308 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7500, loss[loss=0.1074, simple_loss=0.1272, pruned_loss=0.03317, audio_tagging_loss=0.0106, over 15723.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1213, pruned_loss=0.0354, audio_tagging_loss=0.01139, over 3046312.85 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:28:38,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=290520.0, ans=0.0 2023-11-18 15:28:42,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=290520.0, ans=0.125 2023-11-18 15:28:51,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=290586.6666666667, ans=0.125 2023-11-18 15:29:07,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.14 vs. 
limit=10.0 2023-11-18 15:29:08,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=290653.3333333333, ans=0.125 2023-11-18 15:29:17,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=290720.0, ans=0.125 2023-11-18 15:29:22,436 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7550, loss[loss=0.1055, simple_loss=0.1146, pruned_loss=0.0369, audio_tagging_loss=0.01133, over 15728.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1215, pruned_loss=0.03539, audio_tagging_loss=0.01137, over 3049384.27 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:29:33,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=290853.3333333333, ans=0.125 2023-11-18 15:29:36,108 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 9.490e+01 1.043e+02 1.208e+02 1.931e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 15:29:39,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=290853.3333333333, ans=0.0 2023-11-18 15:29:47,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:49,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:49,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290920.0, ans=0.1 2023-11-18 15:29:54,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:30:00,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=290986.6666666667, ans=0.0 2023-11-18 15:30:04,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290986.6666666667, ans=0.1 2023-11-18 15:30:10,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:14,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:15,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291053.3333333333, ans=0.1 2023-11-18 15:30:17,206 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7600, loss[loss=0.09869, simple_loss=0.1113, pruned_loss=0.03146, audio_tagging_loss=0.01157, over 15541.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1209, pruned_loss=0.03532, audio_tagging_loss=0.01145, over 3047834.34 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:30:27,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.37 vs. 
limit=15.0 2023-11-18 15:30:30,120 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:30:36,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=291186.6666666667, ans=0.0 2023-11-18 15:30:45,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291253.3333333333, ans=0.1 2023-11-18 15:31:01,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=291386.6666666667, ans=0.04949747468305833 2023-11-18 15:31:05,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=291386.6666666667, ans=0.125 2023-11-18 15:31:10,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=291386.6666666667, ans=0.2 2023-11-18 15:31:13,067 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7650, loss[loss=0.1253, simple_loss=0.1347, pruned_loss=0.04505, audio_tagging_loss=0.01291, over 15130.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1206, pruned_loss=0.03532, audio_tagging_loss=0.01156, over 3044752.41 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:31:13,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=291453.3333333333, ans=0.0 2023-11-18 15:31:15,837 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:31:27,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.408e+01 1.037e+02 1.133e+02 1.442e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 15:31:33,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=291586.6666666667, ans=0.05 2023-11-18 15:31:47,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=291653.3333333333, ans=0.0 2023-11-18 15:31:53,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-18 15:32:02,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.13 vs. limit=22.5 2023-11-18 15:32:06,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291720.0, ans=0.1 2023-11-18 15:32:08,460 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7700, loss[loss=0.1108, simple_loss=0.1264, pruned_loss=0.03631, audio_tagging_loss=0.01125, over 15459.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1204, pruned_loss=0.03526, audio_tagging_loss=0.01146, over 3039732.28 frames. 
], batch size: 56, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:32:30,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=291920.0, ans=0.5 2023-11-18 15:32:47,104 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:32:50,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.14 vs. limit=15.0 2023-11-18 15:32:51,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=291986.6666666667, ans=0.125 2023-11-18 15:32:53,549 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.856e-02 2023-11-18 15:32:56,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=292053.3333333333, ans=0.0 2023-11-18 15:33:03,724 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7750, loss[loss=0.1448, simple_loss=0.1606, pruned_loss=0.0555, audio_tagging_loss=0.009032, over 16651.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1193, pruned_loss=0.03477, audio_tagging_loss=0.01165, over 3036951.41 frames. ], batch size: 62, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:33:05,257 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.41 vs. limit=15.0 2023-11-18 15:33:11,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=292120.0, ans=0.125 2023-11-18 15:33:18,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 9.507e+01 1.083e+02 1.273e+02 2.415e+02, threshold=2.165e+02, percent-clipped=1.0 2023-11-18 15:33:22,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=292186.6666666667, ans=0.125 2023-11-18 15:33:46,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292320.0, ans=0.1 2023-11-18 15:33:59,633 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7800, loss[loss=0.1436, simple_loss=0.176, pruned_loss=0.04706, audio_tagging_loss=0.00849, over 14783.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1213, pruned_loss=0.03521, audio_tagging_loss=0.01161, over 3040731.39 frames. ], batch size: 52, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:34:16,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=292520.0, ans=0.0 2023-11-18 15:34:23,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.00 vs. 
limit=15.0 2023-11-18 15:34:28,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=292586.6666666667, ans=0.2 2023-11-18 15:34:31,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292653.3333333333, ans=0.125 2023-11-18 15:34:46,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=292720.0, ans=0.125 2023-11-18 15:34:55,492 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7850, loss[loss=0.1213, simple_loss=0.1443, pruned_loss=0.03767, audio_tagging_loss=0.01151, over 15887.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1213, pruned_loss=0.03544, audio_tagging_loss=0.01164, over 3042243.39 frames. ], batch size: 57, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:34:59,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=292786.6666666667, ans=0.0 2023-11-18 15:35:00,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.83 vs. limit=15.0 2023-11-18 15:35:02,326 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-18 15:35:05,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=292853.3333333333, ans=0.125 2023-11-18 15:35:08,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292853.3333333333, ans=0.125 2023-11-18 15:35:09,098 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.474e+01 9.851e+01 1.052e+02 1.175e+02 1.725e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-18 15:35:09,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=292853.3333333333, ans=0.025 2023-11-18 15:35:22,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.54 vs. limit=12.0 2023-11-18 15:35:36,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=292986.6666666667, ans=0.0 2023-11-18 15:35:43,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=293053.3333333333, ans=0.125 2023-11-18 15:35:50,172 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7900, loss[loss=0.08563, simple_loss=0.09421, pruned_loss=0.02824, audio_tagging_loss=0.01028, over 14575.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1222, pruned_loss=0.03578, audio_tagging_loss=0.01172, over 3046173.45 frames. 
], batch size: 56, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:35:53,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=293120.0, ans=0.035 2023-11-18 15:35:55,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=293120.0, ans=0.0 2023-11-18 15:35:57,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=293120.0, ans=0.0 2023-11-18 15:36:04,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=293186.6666666667, ans=0.125 2023-11-18 15:36:10,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.41 vs. limit=6.0 2023-11-18 15:36:20,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293253.3333333333, ans=0.1 2023-11-18 15:36:32,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=293320.0, ans=0.2 2023-11-18 15:36:40,161 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.72 vs. limit=15.0 2023-11-18 15:36:44,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=293386.6666666667, ans=0.125 2023-11-18 15:36:45,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=293386.6666666667, ans=0.05 2023-11-18 15:36:47,574 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 7950, loss[loss=0.1041, simple_loss=0.1206, pruned_loss=0.0309, audio_tagging_loss=0.01286, over 14824.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1239, pruned_loss=0.03645, audio_tagging_loss=0.0118, over 3055189.94 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:36:49,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.46 vs. limit=22.5 2023-11-18 15:36:54,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293453.3333333333, ans=0.1 2023-11-18 15:37:02,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.429e+01 9.694e+01 1.093e+02 1.229e+02 1.791e+02, threshold=2.186e+02, percent-clipped=0.0 2023-11-18 15:37:02,874 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 15:37:04,157 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:37:07,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=293520.0, ans=0.1 2023-11-18 15:37:26,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=293653.3333333333, ans=0.2 2023-11-18 15:37:28,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=293653.3333333333, ans=0.125 2023-11-18 15:37:29,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=293653.3333333333, ans=0.125 2023-11-18 15:37:43,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=293786.6666666667, ans=0.125 2023-11-18 15:37:43,990 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8000, loss[loss=0.05779, simple_loss=0.06767, pruned_loss=0.01399, audio_tagging_loss=0.00997, over 14663.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1229, pruned_loss=0.03614, audio_tagging_loss=0.01178, over 3044081.31 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:37:44,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=293786.6666666667, ans=0.09899494936611666 2023-11-18 15:38:20,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=293986.6666666667, ans=0.1 2023-11-18 15:38:38,553 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8050, loss[loss=0.1063, simple_loss=0.1166, pruned_loss=0.03361, audio_tagging_loss=0.01444, over 15577.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1209, pruned_loss=0.03543, audio_tagging_loss=0.01199, over 3044121.35 frames. 
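The "WARNING ... Exclude cut with ID ..." entries here and earlier all follow the same pattern: after the roughly 4x frame subsampling, a one-second AudioSet placeholder clip keeps only 23 encoder frames, fewer than its 24 BPE tokens, and the transducer loss needs at least as many encoder frames as tokens to admit an alignment, so the cut is dropped instead of being fed to the loss. A sketch of that admissibility check (the function name is illustrative, not the recipe's actual helper):

    def is_trainable_cut(num_frames_after_subsampling: int,
                         num_tokens: int) -> bool:
        """Mirror of the check implied by the warnings: keep a cut only if the
        encoder emits at least as many frames as there are BPE tokens."""
        return num_frames_after_subsampling >= num_tokens

    # The excluded AudioSet cuts: 100 raw frames -> 23 after subsampling,
    # but 24 tokens of dummy text, so they fail the check.
    assert not is_trainable_cut(23, 24)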
], batch size: 58, lr: 1.61e-02, grad_scale: 16.0 2023-11-18 15:38:53,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.018e+02 1.096e+02 1.204e+02 1.820e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 15:38:56,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=294186.6666666667, ans=0.0 2023-11-18 15:39:07,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=294253.3333333333, ans=0.125 2023-11-18 15:39:08,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=294253.3333333333, ans=0.2 2023-11-18 15:39:22,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:25,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:25,759 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:39:27,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:33,370 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8100, loss[loss=0.1214, simple_loss=0.1282, pruned_loss=0.04587, audio_tagging_loss=0.01146, over 14549.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1219, pruned_loss=0.03583, audio_tagging_loss=0.01184, over 3049343.17 frames. ], batch size: 54, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:39:34,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=294453.3333333333, ans=0.0 2023-11-18 15:39:36,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=294453.3333333333, ans=0.125 2023-11-18 15:39:40,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294453.3333333333, ans=0.1 2023-11-18 15:40:29,788 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8150, loss[loss=0.1193, simple_loss=0.1355, pruned_loss=0.03868, audio_tagging_loss=0.01291, over 15934.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1218, pruned_loss=0.03597, audio_tagging_loss=0.01151, over 3051824.46 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:40:35,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.82 vs. 
2023-11-18 15:40:41,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=294853.3333333333, ans=0.125
2023-11-18 15:40:42,665 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:40:44,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 9.329e+01 1.045e+02 1.150e+02 1.655e+02, threshold=2.090e+02, percent-clipped=0.0
2023-11-18 15:40:58,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=294920.0, ans=0.125
2023-11-18 15:40:59,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0
2023-11-18 15:41:24,199 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8200, loss[loss=0.1305, simple_loss=0.1549, pruned_loss=0.04312, audio_tagging_loss=0.009965, over 16180.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1218, pruned_loss=0.03577, audio_tagging_loss=0.01141, over 3050340.28 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:41:26,336 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 15:41:29,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=295120.0, ans=0.0
2023-11-18 15:41:52,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=295253.3333333333, ans=0.125
2023-11-18 15:41:53,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=295253.3333333333, ans=0.125
2023-11-18 15:42:01,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=295320.0, ans=0.0
2023-11-18 15:42:03,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295320.0, ans=0.1
2023-11-18 15:42:19,559 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8250, loss[loss=0.1408, simple_loss=0.1528, pruned_loss=0.0505, audio_tagging_loss=0.01394, over 15452.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1217, pruned_loss=0.03598, audio_tagging_loss=0.01145, over 3048864.05 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 16.0
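Note on the "Exclude cut" WARNING records: these placeholder cuts carry 100 input frames, which shrink to 23 encoder frames after subsampling, fewer than their 24 BPE tokens, so the transducer loss apparently cannot align them and the cut is dropped. A sketch of that filter follows; the subsampling formula is an assumption chosen to reproduce 100 -> 23 (roughly a factor of 4 with convolutional edge loss), not necessarily the model's exact arithmetic.

# Hypothetical reconstruction of the cut filter implied by the warnings.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = (num_frames - 7) // 2 // 2  # 100 -> 23
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, as logged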
2023-11-18 15:42:25,099 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:42:34,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 9.274e+01 1.030e+02 1.127e+02 2.119e+02, threshold=2.060e+02, percent-clipped=1.0
2023-11-18 15:42:50,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=295586.6666666667, ans=0.2
2023-11-18 15:43:02,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=295720.0, ans=0.125
2023-11-18 15:43:14,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0
2023-11-18 15:43:15,126 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8300, loss[loss=0.08776, simple_loss=0.09495, pruned_loss=0.0272, audio_tagging_loss=0.01308, over 14841.00 frames. ], tot_loss[loss=0.107, simple_loss=0.12, pruned_loss=0.03549, audio_tagging_loss=0.0115, over 3047008.75 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:43:29,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=295853.3333333333, ans=0.0
2023-11-18 15:43:30,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295853.3333333333, ans=0.1
2023-11-18 15:43:39,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=295920.0, ans=0.2
2023-11-18 15:43:52,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=295986.6666666667, ans=0.125
2023-11-18 15:44:11,153 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8350, loss[loss=0.1223, simple_loss=0.1293, pruned_loss=0.04383, audio_tagging_loss=0.01381, over 14807.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1205, pruned_loss=0.0358, audio_tagging_loss=0.01141, over 3053899.23 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 16.0
2023-11-18 15:44:14,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296120.0, ans=0.1
2023-11-18 15:44:16,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296120.0, ans=0.1
2023-11-18 15:44:22,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=296186.6666666667, ans=0.125
2023-11-18 15:44:26,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.554e+01 1.077e+02 1.196e+02 1.483e+02, threshold=2.155e+02, percent-clipped=0.0
2023-11-18 15:44:48,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=296320.0, ans=0.125
2023-11-18 15:44:54,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=296386.6666666667, ans=0.125
2023-11-18 15:45:06,080 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8400, loss[loss=0.1445, simple_loss=0.1583, pruned_loss=0.05287, audio_tagging_loss=0.01248, over 14729.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1201, pruned_loss=0.03553, audio_tagging_loss=0.01149, over 3060552.97 frames. ], batch size: 53, lr: 1.60e-02, grad_scale: 32.0
2023-11-18 15:45:11,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.15 vs. limit=15.0
2023-11-18 15:45:25,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.52 vs. limit=22.5
2023-11-18 15:45:31,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.48 vs. limit=15.0
2023-11-18 15:45:44,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=296653.3333333333, ans=0.125
2023-11-18 15:45:45,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=296653.3333333333, ans=0.0
2023-11-18 15:45:53,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=296720.0, ans=0.125
2023-11-18 15:45:57,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=296720.0, ans=0.0
2023-11-18 15:45:59,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=296720.0, ans=0.0
2023-11-18 15:46:02,604 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8450, loss[loss=0.119, simple_loss=0.1359, pruned_loss=0.03769, audio_tagging_loss=0.0134, over 15559.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1196, pruned_loss=0.03527, audio_tagging_loss=0.01152, over 3056180.78 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 32.0
2023-11-18 15:46:17,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.338e+01 1.042e+02 1.138e+02 1.608e+02, threshold=2.084e+02, percent-clipped=0.0
2023-11-18 15:46:21,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296853.3333333333, ans=0.1
2023-11-18 15:46:28,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=296920.0, ans=0.04949747468305833
2023-11-18 15:46:35,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=296986.6666666667, ans=0.125
2023-11-18 15:46:41,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=296986.6666666667, ans=0.125
2023-11-18 15:46:46,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=297053.3333333333, ans=0.1
2023-11-18 15:46:57,412 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8500, loss[loss=0.08531, simple_loss=0.0996, pruned_loss=0.02349, audio_tagging_loss=0.01202, over 15089.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.12, pruned_loss=0.03516, audio_tagging_loss=0.01155, over 3062499.57 frames. ], batch size: 59, lr: 1.60e-02, grad_scale: 32.0
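Note on the Whitening records: each compares a measured anisotropy metric for a module's activations against a limit. A plausible proxy for such a metric (an assumption, not necessarily scaling.py's exact formula) is the ratio of the mean squared eigenvalue of the feature covariance to its squared mean eigenvalue: 1.0 for perfectly white features, larger when variance concentrates in a few directions.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations from one module.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # 1.0 when all eigenvalues are equal; grows as a few dominate.
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 384) @ torch.randn(384, 384)  # correlated features
print(whitening_metric(x))  # well above 1; compare against the logged limit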
2023-11-18 15:47:11,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=297186.6666666667, ans=0.125
2023-11-18 15:47:16,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=297186.6666666667, ans=0.1
2023-11-18 15:47:17,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=297186.6666666667, ans=0.2
2023-11-18 15:47:21,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.39 vs. limit=5.0
2023-11-18 15:47:26,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=297253.3333333333, ans=0.125
2023-11-18 15:47:29,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=297253.3333333333, ans=0.0
2023-11-18 15:47:36,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=297320.0, ans=0.125
2023-11-18 15:47:39,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=297320.0, ans=0.0
2023-11-18 15:47:48,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=297386.6666666667, ans=0.125
2023-11-18 15:47:48,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=297386.6666666667, ans=0.0
2023-11-18 15:47:50,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=297386.6666666667, ans=0.125
2023-11-18 15:47:53,003 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8550, loss[loss=0.09702, simple_loss=0.11, pruned_loss=0.0299, audio_tagging_loss=0.01211, over 14842.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1197, pruned_loss=0.03509, audio_tagging_loss=0.01158, over 3060567.30 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 32.0
2023-11-18 15:47:57,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=297453.3333333333, ans=0.125
2023-11-18 15:48:01,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.68 vs. limit=15.0
2023-11-18 15:48:09,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 9.983e+01 1.095e+02 1.210e+02 1.627e+02, threshold=2.189e+02, percent-clipped=0.0
2023-11-18 15:48:17,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.94 vs. limit=15.0
2023-11-18 15:48:21,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=297586.6666666667, ans=0.2
2023-11-18 15:48:49,341 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8600, loss[loss=0.1214, simple_loss=0.1406, pruned_loss=0.03919, audio_tagging_loss=0.01187, over 15081.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1192, pruned_loss=0.035, audio_tagging_loss=0.01169, over 3051846.88 frames. ], batch size: 55, lr: 1.60e-02, grad_scale: 32.0
2023-11-18 15:49:30,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=297986.6666666667, ans=0.2
2023-11-18 15:49:31,172 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.23 vs. limit=15.0
2023-11-18 15:49:39,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298053.3333333333, ans=0.1
2023-11-18 15:49:43,350 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8650, loss[loss=0.1116, simple_loss=0.1247, pruned_loss=0.03677, audio_tagging_loss=0.01249, over 16244.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1204, pruned_loss=0.03529, audio_tagging_loss=0.01174, over 3049692.77 frames. ], batch size: 63, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:49:47,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=298120.0, ans=0.125
2023-11-18 15:49:56,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=298186.6666666667, ans=0.0
2023-11-18 15:49:58,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5
2023-11-18 15:49:58,604 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.623e+01 1.078e+02 1.210e+02 1.696e+02, threshold=2.155e+02, percent-clipped=0.0
2023-11-18 15:50:07,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=298253.3333333333, ans=0.0
2023-11-18 15:50:11,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=298253.3333333333, ans=0.125
2023-11-18 15:50:35,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=298386.6666666667, ans=0.1
2023-11-18 15:50:38,384 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8700, loss[loss=0.1227, simple_loss=0.1434, pruned_loss=0.04002, audio_tagging_loss=0.01096, over 15849.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.121, pruned_loss=0.03558, audio_tagging_loss=0.01179, over 3049286.52 frames. ], batch size: 59, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:50:43,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=298453.3333333333, ans=0.0
2023-11-18 15:50:46,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=298453.3333333333, ans=0.125
2023-11-18 15:50:54,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.58 vs. limit=6.0
2023-11-18 15:51:04,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=298586.6666666667, ans=0.125
2023-11-18 15:51:09,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298586.6666666667, ans=0.1
2023-11-18 15:51:15,869 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:51:18,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=298653.3333333333, ans=0.2
2023-11-18 15:51:25,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0
2023-11-18 15:51:33,506 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8750, loss[loss=0.113, simple_loss=0.1186, pruned_loss=0.03894, audio_tagging_loss=0.01473, over 16287.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1229, pruned_loss=0.03608, audio_tagging_loss=0.01177, over 3053874.42 frames. ], batch size: 63, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:51:40,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=298786.6666666667, ans=0.0
2023-11-18 15:51:43,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0
2023-11-18 15:51:48,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 9.840e+01 1.091e+02 1.232e+02 1.815e+02, threshold=2.181e+02, percent-clipped=0.0
2023-11-18 15:52:12,557 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.58 vs. limit=22.5
2023-11-18 15:52:28,376 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8800, loss[loss=0.08476, simple_loss=0.0911, pruned_loss=0.0249, audio_tagging_loss=0.01431, over 14423.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1225, pruned_loss=0.03597, audio_tagging_loss=0.01186, over 3050054.49 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:52:28,777 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.10 vs. limit=22.5
2023-11-18 15:52:39,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=299186.6666666667, ans=0.125
2023-11-18 15:52:42,553 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.95 vs. limit=15.0
2023-11-18 15:52:50,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=299253.3333333333, ans=0.125
2023-11-18 15:52:53,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0
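Note on grad_scale: the batch records show it moving between 16.0 and 32.0, the usual signature of dynamic fp16 loss scaling, where the scale doubles after a run of overflow-free steps and halves on overflow. A minimal sketch with PyTorch's stock GradScaler follows (the training script's own loop surely differs; the constructor arguments here are illustrative, and a CUDA device is assumed).

import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(80, 500).cuda()
opt = torch.optim.SGD(model.parameters(), lr=1.6e-2)
scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                    backoff_factor=0.5, growth_interval=2000)

x = torch.randn(8, 80, device="cuda")
with autocast():
    loss = model(x).square().mean()
scaler.scale(loss).backward()   # backprop with the scaled loss
scaler.step(opt)                # unscales grads; skips the step on inf/nan
scaler.update()                 # grows or backs off the scale
print(scaler.get_scale())       # the value the log reports as grad_scale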
2023-11-18 15:52:56,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=299253.3333333333, ans=0.0
2023-11-18 15:52:59,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=299253.3333333333, ans=0.125
2023-11-18 15:52:59,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299253.3333333333, ans=0.1
2023-11-18 15:53:03,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=299320.0, ans=0.125
2023-11-18 15:53:10,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299320.0, ans=0.1
2023-11-18 15:53:22,534 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8850, loss[loss=0.08986, simple_loss=0.09655, pruned_loss=0.02874, audio_tagging_loss=0.01284, over 14119.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1232, pruned_loss=0.03597, audio_tagging_loss=0.01176, over 3051185.86 frames. ], batch size: 53, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:53:31,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0
2023-11-18 15:53:35,237 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 15:53:38,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.407e+01 1.047e+02 1.181e+02 1.757e+02, threshold=2.094e+02, percent-clipped=0.0
2023-11-18 15:54:17,928 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8900, loss[loss=0.1198, simple_loss=0.14, pruned_loss=0.04073, audio_tagging_loss=0.009036, over 15017.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.122, pruned_loss=0.03558, audio_tagging_loss=0.01171, over 3048422.82 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:54:27,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=299786.6666666667, ans=0.125
2023-11-18 15:54:30,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299853.3333333333, ans=0.1
2023-11-18 15:54:47,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0
2023-11-18 15:54:48,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=299920.0, ans=10.0
2023-11-18 15:55:12,594 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 8950, loss[loss=0.1224, simple_loss=0.1272, pruned_loss=0.04617, audio_tagging_loss=0.01259, over 14693.00 frames. ], tot_loss[loss=0.108, simple_loss=0.122, pruned_loss=0.03548, audio_tagging_loss=0.0115, over 3050141.84 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0
2023-11-18 15:55:27,255 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.233e+01 1.016e+02 1.150e+02 1.659e+02, threshold=2.033e+02, percent-clipped=0.0
2023-11-18 15:55:27,891 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.22 vs. limit=22.5
2023-11-18 15:55:47,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=300320.0, ans=0.125
2023-11-18 15:55:57,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=300386.6666666667, ans=0.025
2023-11-18 15:56:02,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=300386.6666666667, ans=0.125
2023-11-18 15:56:06,848 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9000, loss[loss=0.1087, simple_loss=0.126, pruned_loss=0.03463, audio_tagging_loss=0.01111, over 14229.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1226, pruned_loss=0.03559, audio_tagging_loss=0.01135, over 3050990.06 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 16.0
2023-11-18 15:56:06,849 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 15:56:23,109 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([1.7645, 1.4817, 2.6258, 2.8718, 2.1644, 2.2927, 2.3998, 2.8532], device='cuda:2')
2023-11-18 15:56:40,142 INFO [train_asr.py:1147] (2/4) Epoch 4, validation: loss=0.07668, simple_loss=0.06181, pruned_loss=0.009869, audio_tagging_loss=0.03591, over 4681554.00 frames.
2023-11-18 15:56:40,143 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 15:56:40,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=300453.3333333333, ans=0.125
2023-11-18 15:56:41,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0
2023-11-18 15:56:46,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300453.3333333333, ans=0.125
2023-11-18 15:56:57,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5
2023-11-18 15:57:02,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=300586.6666666667, ans=0.1
2023-11-18 15:57:13,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=300653.3333333333, ans=0.125
2023-11-18 15:57:22,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=300653.3333333333, ans=0.0
2023-11-18 15:57:29,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=300720.0, ans=0.1
2023-11-18 15:57:34,506 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9050, loss[loss=0.09466, simple_loss=0.1053, pruned_loss=0.02975, audio_tagging_loss=0.01225, over 16758.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1217, pruned_loss=0.03539, audio_tagging_loss=0.01143, over 3051765.18 frames. ], batch size: 62, lr: 1.59e-02, grad_scale: 16.0
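Note on the attn_weights_entropy dump during validation: the tensor holds one value per attention head, which reads naturally as the Shannon entropy of each head's attention distribution averaged over query positions (the exact reduction in zipformer.py may differ). A sketch of that diagnostic:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), each row summing to 1.
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # entropy per query
    return ent.mean(dim=-1)                           # one value per head

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # 8 values, like the logged tensor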
2023-11-18 15:57:49,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=300853.3333333333, ans=0.0
2023-11-18 15:57:50,189 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.255e+01 1.039e+02 1.147e+02 2.056e+02, threshold=2.078e+02, percent-clipped=1.0
2023-11-18 15:57:51,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=300853.3333333333, ans=0.125
2023-11-18 15:57:58,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=300920.0, ans=0.125
2023-11-18 15:58:05,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=22.5
2023-11-18 15:58:06,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=300986.6666666667, ans=0.0
2023-11-18 15:58:28,435 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9100, loss[loss=0.1225, simple_loss=0.1478, pruned_loss=0.04198, audio_tagging_loss=0.006605, over 15571.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1214, pruned_loss=0.0352, audio_tagging_loss=0.01141, over 3050875.89 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 16.0
2023-11-18 15:58:32,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=301120.0, ans=0.04949747468305833
2023-11-18 15:58:33,881 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 15:58:35,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0
2023-11-18 15:58:39,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301186.6666666667, ans=0.1
2023-11-18 15:58:43,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=301186.6666666667, ans=0.125
2023-11-18 15:58:56,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=301253.3333333333, ans=0.125
2023-11-18 15:59:00,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=301253.3333333333, ans=0.125
2023-11-18 15:59:03,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=301320.0, ans=0.125
2023-11-18 15:59:05,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=301320.0, ans=0.125
2023-11-18 15:59:07,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0
2023-11-18 15:59:11,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=301386.6666666667, ans=0.0
2023-11-18 15:59:14,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=301386.6666666667, ans=0.0
2023-11-18 15:59:23,958 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9150, loss[loss=0.1128, simple_loss=0.1265, pruned_loss=0.03846, audio_tagging_loss=0.01113, over 15762.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1215, pruned_loss=0.03514, audio_tagging_loss=0.01129, over 3051168.86 frames. ], batch size: 61, lr: 1.59e-02, grad_scale: 16.0
2023-11-18 15:59:24,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=301453.3333333333, ans=0.125
2023-11-18 15:59:30,206 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.436e+00
2023-11-18 15:59:36,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=301520.0, ans=0.125
2023-11-18 15:59:36,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=301520.0, ans=10.0
2023-11-18 15:59:41,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.042e+01 1.024e+02 1.134e+02 1.471e+02, threshold=2.048e+02, percent-clipped=0.0
2023-11-18 15:59:43,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=301520.0, ans=0.125
2023-11-18 15:59:47,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301586.6666666667, ans=0.1
2023-11-18 15:59:48,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5
2023-11-18 15:59:51,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=301586.6666666667, ans=0.125
2023-11-18 15:59:52,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=301586.6666666667, ans=0.09899494936611666
2023-11-18 15:59:56,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0
2023-11-18 16:00:13,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301720.0, ans=0.1
2023-11-18 16:00:14,686 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 16:00:21,039 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9200, loss[loss=0.09816, simple_loss=0.1178, pruned_loss=0.02979, audio_tagging_loss=0.009475, over 14616.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1218, pruned_loss=0.03542, audio_tagging_loss=0.0113, over 3055338.04 frames. ], batch size: 54, lr: 1.59e-02, grad_scale: 32.0
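Note on the WithLoss records (loss-sum is usually 0.000e+00, occasionally nonzero as in the 15:59:30,206 record above): they suggest an auxiliary penalty attached to the attention weights that is logged and backpropagated without changing the values the rest of the network sees. The sketch below is a plausible reconstruction of that mechanism, not the actual scaling.py implementation: an autograd function that is the identity on its input while routing gradient 1.0 into an attached scalar loss.

import torch

class AttachLoss(torch.autograd.Function):
    # Identity on x in forward; in backward the attached scalar loss
    # receives gradient 1.0, i.e. it is effectively added to the objective.
    @staticmethod
    def forward(ctx, x, loss):
        assert loss.dim() == 0  # scalar auxiliary loss
        return x

    @staticmethod
    def backward(ctx, grad_x):
        return grad_x, torch.ones((), device=grad_x.device, dtype=grad_x.dtype)

attn = torch.softmax(torch.randn(4, 10, 10, requires_grad=True), dim=-1)
aux = attn.var(dim=-1).sum()       # some penalty on the weights (illustrative)
out = AttachLoss.apply(attn, aux)  # out == attn, but aux now trains too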
2023-11-18 16:00:32,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=301853.3333333333, ans=0.05
2023-11-18 16:00:34,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0
2023-11-18 16:00:42,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=301920.0, ans=0.2
2023-11-18 16:00:42,452 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0
2023-11-18 16:00:44,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=301920.0, ans=0.125
2023-11-18 16:01:14,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=302053.3333333333, ans=0.2
2023-11-18 16:01:16,224 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9250, loss[loss=0.1029, simple_loss=0.1141, pruned_loss=0.03432, audio_tagging_loss=0.01146, over 15277.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1206, pruned_loss=0.03509, audio_tagging_loss=0.01139, over 3059620.25 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:01:33,126 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 9.477e+01 1.067e+02 1.208e+02 1.657e+02, threshold=2.134e+02, percent-clipped=0.0
2023-11-18 16:01:52,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=302320.0, ans=0.04949747468305833
2023-11-18 16:01:58,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=302320.0, ans=0.125
2023-11-18 16:02:05,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=302386.6666666667, ans=0.0
2023-11-18 16:02:11,924 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9300, loss[loss=0.1091, simple_loss=0.1247, pruned_loss=0.0357, audio_tagging_loss=0.01108, over 16259.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1207, pruned_loss=0.03518, audio_tagging_loss=0.01136, over 3055202.69 frames. ], batch size: 61, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:02:12,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=302453.3333333333, ans=0.0
2023-11-18 16:02:26,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=302520.0, ans=0.1
2023-11-18 16:02:36,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=302586.6666666667, ans=0.125
2023-11-18 16:02:38,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=302586.6666666667, ans=0.125
2023-11-18 16:02:51,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=302653.3333333333, ans=0.09899494936611666
2023-11-18 16:03:09,204 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9350, loss[loss=0.1116, simple_loss=0.1259, pruned_loss=0.0375, audio_tagging_loss=0.01119, over 15129.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1212, pruned_loss=0.03494, audio_tagging_loss=0.01141, over 3056844.93 frames. ], batch size: 55, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:03:19,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=302853.3333333333, ans=0.0
2023-11-18 16:03:24,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.020e+01 1.030e+02 1.167e+02 1.548e+02, threshold=2.059e+02, percent-clipped=0.0
2023-11-18 16:03:41,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=302986.6666666667, ans=0.125
2023-11-18 16:03:52,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=303053.3333333333, ans=0.2
2023-11-18 16:03:53,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=303053.3333333333, ans=0.1
2023-11-18 16:04:03,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0
2023-11-18 16:04:04,190 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9400, loss[loss=0.09101, simple_loss=0.102, pruned_loss=0.0303, audio_tagging_loss=0.009687, over 15096.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1213, pruned_loss=0.03493, audio_tagging_loss=0.01149, over 3058023.33 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:04:18,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2023-11-18 16:04:42,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=303320.0, ans=0.0
2023-11-18 16:04:44,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=303320.0, ans=0.2
2023-11-18 16:04:51,204 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-18 16:04:52,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=303386.6666666667, ans=0.0
2023-11-18 16:05:00,033 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9450, loss[loss=0.1255, simple_loss=0.1332, pruned_loss=0.04297, audio_tagging_loss=0.01593, over 15481.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.122, pruned_loss=0.03528, audio_tagging_loss=0.01154, over 3058662.67 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:05:00,060 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 16:05:03,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=303453.3333333333, ans=0.125
2023-11-18 16:05:05,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=303453.3333333333, ans=0.0
2023-11-18 16:05:09,923 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.780e-02
2023-11-18 16:05:14,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0
2023-11-18 16:05:17,120 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 9.577e+01 1.061e+02 1.222e+02 1.461e+02, threshold=2.121e+02, percent-clipped=0.0
2023-11-18 16:05:33,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=303653.3333333333, ans=0.125
2023-11-18 16:05:34,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=303653.3333333333, ans=0.05
2023-11-18 16:05:41,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=303653.3333333333, ans=0.05
2023-11-18 16:05:56,426 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9500, loss[loss=0.08775, simple_loss=0.09963, pruned_loss=0.02655, audio_tagging_loss=0.01138, over 14186.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1221, pruned_loss=0.03559, audio_tagging_loss=0.01177, over 3052538.63 frames. ], batch size: 54, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:06:05,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303786.6666666667, ans=0.0
2023-11-18 16:06:07,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=303853.3333333333, ans=0.125
2023-11-18 16:06:24,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=303920.0, ans=0.125
2023-11-18 16:06:33,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.08 vs. limit=15.0
2023-11-18 16:06:35,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=303986.6666666667, ans=0.125
2023-11-18 16:06:52,132 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9550, loss[loss=0.09523, simple_loss=0.1187, pruned_loss=0.02558, audio_tagging_loss=0.01031, over 16252.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1227, pruned_loss=0.03582, audio_tagging_loss=0.01185, over 3052900.28 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:06:56,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=304120.0, ans=15.0
2023-11-18 16:07:08,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.602e+01 1.044e+02 1.160e+02 1.697e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 16:07:14,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=304253.3333333333, ans=0.125
2023-11-18 16:07:48,097 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9600, loss[loss=0.09428, simple_loss=0.107, pruned_loss=0.03047, audio_tagging_loss=0.01032, over 15241.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1219, pruned_loss=0.03557, audio_tagging_loss=0.01197, over 3045831.57 frames. ], batch size: 60, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:08:28,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=304653.3333333333, ans=0.0
2023-11-18 16:08:38,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=304720.0, ans=0.125
2023-11-18 16:08:43,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=304786.6666666667, ans=0.05
2023-11-18 16:08:44,186 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9650, loss[loss=0.1125, simple_loss=0.1247, pruned_loss=0.03994, audio_tagging_loss=0.01027, over 15471.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1216, pruned_loss=0.03551, audio_tagging_loss=0.01199, over 3051161.65 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:09:00,537 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 9.312e+01 1.013e+02 1.091e+02 1.612e+02, threshold=2.027e+02, percent-clipped=0.0
2023-11-18 16:09:16,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=304986.6666666667, ans=0.125
2023-11-18 16:09:21,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0
2023-11-18 16:09:24,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=304986.6666666667, ans=0.125
2023-11-18 16:09:28,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=305053.3333333333, ans=0.125
2023-11-18 16:09:39,306 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9700, loss[loss=0.1173, simple_loss=0.1421, pruned_loss=0.03545, audio_tagging_loss=0.01083, over 15375.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1224, pruned_loss=0.03566, audio_tagging_loss=0.01171, over 3053011.90 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0
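Note on the batch records: each reports a per-batch loss[...] alongside a running tot_loss[...] whose frame total hovers near 3M, which reads like a frame-weighted moving average of the component losses with slow decay. A sketch of one such scheme follows; the decay constant is an assumption chosen to keep the effective window in that range, not a value taken from the training script.

class RunningLoss:
    # Frame-weighted running average of loss components, decayed each batch.
    def __init__(self, decay=0.995):
        self.decay = decay
        self.frames = 0.0
        self.sums = {}  # component name -> frame-weighted loss sum

    def update(self, losses: dict, num_frames: float) -> dict:
        self.frames = self.frames * self.decay + num_frames
        averaged = {}
        for name, value in losses.items():
            s = self.sums.get(name, 0.0) * self.decay + value * num_frames
            self.sums[name] = s
            averaged[name] = s / self.frames
        return averaged

tracker = RunningLoss()
print(tracker.update({"loss": 0.1085, "simple_loss": 0.1224}, 15375.0))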
2023-11-18 16:09:41,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=305120.0, ans=0.125
2023-11-18 16:09:46,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=305120.0, ans=0.0
2023-11-18 16:10:24,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=305386.6666666667, ans=0.125
2023-11-18 16:10:35,411 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9750, loss[loss=0.08797, simple_loss=0.09969, pruned_loss=0.02933, audio_tagging_loss=0.008804, over 14807.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1201, pruned_loss=0.03497, audio_tagging_loss=0.01165, over 3048477.84 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:10:38,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5
2023-11-18 16:10:53,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 9.337e+01 1.028e+02 1.130e+02 1.491e+02, threshold=2.056e+02, percent-clipped=0.0
2023-11-18 16:11:04,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0
2023-11-18 16:11:07,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=305586.6666666667, ans=0.05
2023-11-18 16:11:08,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=305653.3333333333, ans=0.125
2023-11-18 16:11:26,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=305720.0, ans=0.1
2023-11-18 16:11:32,510 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9800, loss[loss=0.08368, simple_loss=0.09094, pruned_loss=0.02541, audio_tagging_loss=0.01279, over 13683.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1193, pruned_loss=0.03484, audio_tagging_loss=0.01167, over 3041961.25 frames. ], batch size: 53, lr: 1.58e-02, grad_scale: 32.0
2023-11-18 16:11:47,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=305853.3333333333, ans=0.0
2023-11-18 16:12:00,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.34 vs. limit=22.5
2023-11-18 16:12:04,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=305986.6666666667, ans=10.0
2023-11-18 16:12:23,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=306053.3333333333, ans=0.04949747468305833
2023-11-18 16:12:23,803 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 16:12:24,082 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 16:12:28,133 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9850, loss[loss=0.1092, simple_loss=0.1309, pruned_loss=0.03509, audio_tagging_loss=0.008675, over 15078.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1211, pruned_loss=0.03538, audio_tagging_loss=0.01151, over 3044830.01 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:12:37,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=306186.6666666667, ans=0.125
2023-11-18 16:12:38,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306186.6666666667, ans=0.125
2023-11-18 16:12:45,061 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.433e+01 1.029e+02 1.148e+02 1.487e+02, threshold=2.058e+02, percent-clipped=0.0
2023-11-18 16:12:54,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=306253.3333333333, ans=0.125
2023-11-18 16:13:04,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0
2023-11-18 16:13:04,809 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2023-11-18 16:13:09,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=306320.0, ans=0.2
2023-11-18 16:13:09,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=306320.0, ans=0.125
2023-11-18 16:13:23,971 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9900, loss[loss=0.101, simple_loss=0.1184, pruned_loss=0.0297, audio_tagging_loss=0.01209, over 15494.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1204, pruned_loss=0.03541, audio_tagging_loss=0.0116, over 3044464.39 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:13:33,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0
2023-11-18 16:13:40,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=306520.0, ans=0.125
2023-11-18 16:13:50,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=306586.6666666667, ans=0.125
2023-11-18 16:14:20,552 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 9950, loss[loss=0.06649, simple_loss=0.06456, pruned_loss=0.02117, audio_tagging_loss=0.01303, over 14459.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1203, pruned_loss=0.03514, audio_tagging_loss=0.01155, over 3050564.15 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:14:23,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=306786.6666666667, ans=0.125
2023-11-18 16:14:23,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.27 vs. limit=15.0
2023-11-18 16:14:29,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=306786.6666666667, ans=0.125
2023-11-18 16:14:32,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=306853.3333333333, ans=0.09899494936611666
2023-11-18 16:14:33,917 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.81 vs. limit=22.5
2023-11-18 16:14:36,439 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 9.576e+01 1.088e+02 1.219e+02 1.506e+02, threshold=2.175e+02, percent-clipped=0.0
2023-11-18 16:14:42,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=306920.0, ans=0.2
2023-11-18 16:14:54,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=306986.6666666667, ans=0.1
2023-11-18 16:15:05,858 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0
2023-11-18 16:15:15,755 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10000, loss[loss=0.1142, simple_loss=0.1276, pruned_loss=0.03766, audio_tagging_loss=0.01272, over 16432.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1199, pruned_loss=0.03506, audio_tagging_loss=0.01157, over 3048058.70 frames. ], batch size: 59, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:16:11,316 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10050, loss[loss=0.0705, simple_loss=0.0656, pruned_loss=0.02087, audio_tagging_loss=0.01683, over 14250.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1194, pruned_loss=0.03494, audio_tagging_loss=0.01155, over 3054497.28 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:16:11,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=307453.3333333333, ans=0.0
2023-11-18 16:16:25,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.23 vs. limit=15.0
2023-11-18 16:16:29,453 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.427e+01 1.040e+02 1.141e+02 1.376e+02, threshold=2.079e+02, percent-clipped=0.0
2023-11-18 16:16:33,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.67 vs. limit=10.0
2023-11-18 16:16:38,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=307586.6666666667, ans=0.125
2023-11-18 16:16:39,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.41 vs. limit=12.0
2023-11-18 16:16:56,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=307720.0, ans=0.0
2023-11-18 16:16:57,738 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=307720.0, ans=0.1
2023-11-18 16:16:59,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0
2023-11-18 16:17:04,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=307720.0, ans=0.1
2023-11-18 16:17:05,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0
2023-11-18 16:17:08,297 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10100, loss[loss=0.1299, simple_loss=0.1532, pruned_loss=0.04399, audio_tagging_loss=0.009276, over 15304.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1207, pruned_loss=0.03539, audio_tagging_loss=0.01153, over 3055704.59 frames. ], batch size: 54, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:17:08,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.87 vs. limit=12.0
2023-11-18 16:17:11,555 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.68 vs. limit=12.0
2023-11-18 16:17:13,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=307786.6666666667, ans=0.09899494936611666
2023-11-18 16:17:29,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307920.0, ans=0.125
2023-11-18 16:17:37,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.62 vs. limit=6.0
2023-11-18 16:17:52,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=308053.3333333333, ans=0.0
2023-11-18 16:17:55,297 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 16:18:03,728 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10150, loss[loss=0.1031, simple_loss=0.1096, pruned_loss=0.03257, audio_tagging_loss=0.01573, over 15735.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1207, pruned_loss=0.03558, audio_tagging_loss=0.01172, over 3053326.10 frames. ], batch size: 59, lr: 1.57e-02, grad_scale: 32.0
2023-11-18 16:18:19,761 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.614e+01 1.045e+02 1.146e+02 1.690e+02, threshold=2.090e+02, percent-clipped=0.0
2023-11-18 16:18:26,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=308253.3333333333, ans=0.2
2023-11-18 16:18:29,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.55 vs. limit=22.5
2023-11-18 16:18:31,570 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:18:32,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.93 vs. limit=15.0 2023-11-18 16:18:39,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=308320.0, ans=0.125 2023-11-18 16:18:43,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=308320.0, ans=0.125 2023-11-18 16:18:47,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=308386.6666666667, ans=0.125 2023-11-18 16:18:49,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=308386.6666666667, ans=0.0 2023-11-18 16:18:51,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308386.6666666667, ans=0.1 2023-11-18 16:18:52,044 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=308386.6666666667, ans=0.2 2023-11-18 16:18:54,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=308386.6666666667, ans=0.125 2023-11-18 16:18:59,160 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10200, loss[loss=0.114, simple_loss=0.1209, pruned_loss=0.0421, audio_tagging_loss=0.01149, over 15193.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1196, pruned_loss=0.03511, audio_tagging_loss=0.01173, over 3055697.38 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:19:11,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=308520.0, ans=0.0 2023-11-18 16:19:18,001 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.25 vs. limit=15.0 2023-11-18 16:19:22,796 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:19:36,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=308653.3333333333, ans=0.0 2023-11-18 16:19:37,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.26 vs. limit=10.0 2023-11-18 16:19:43,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=308720.0, ans=0.125 2023-11-18 16:19:55,121 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10250, loss[loss=0.07531, simple_loss=0.0779, pruned_loss=0.02082, audio_tagging_loss=0.01554, over 15168.00 frames. 
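[Annotation] The WARNING lines above show AudioSet cuts being dropped before training: each 1-second clip carries a 24-token placeholder transcript, but only 23 encoder frames survive the 4x subsampling, and a transducer cannot emit more tokens than it has frames. Below is a minimal sketch of such a filter; keep_cut and the exact frontend arithmetic are assumptions, chosen because they reproduce the logged 100 -> 23 frame counts.

import logging

def keep_cut(num_frames: int, tokens: list, cut_id: str) -> bool:
    # Assumed conv-frontend arithmetic: T frames shrink to roughly
    # ((T - 7) // 2 + 1) // 2, which maps the logged 100 to 23.
    t_after = ((num_frames - 7) // 2 + 1) // 2
    if t_after < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut_id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {t_after}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

# 100 input frames shrink to 23, fewer than the 24 placeholder tokens,
# so the cut is excluded, matching the warnings above.
keep_cut(100, ["tok"] * 24, "unbalanced/example_0.000_1.000.wav")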
], tot_loss[loss=0.1066, simple_loss=0.1197, pruned_loss=0.03502, audio_tagging_loss=0.01176, over 3059918.84 frames. ], batch size: 60, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:20:11,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=308853.3333333333, ans=0.0 2023-11-18 16:20:12,687 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 9.476e+01 1.039e+02 1.199e+02 1.617e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 16:20:17,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=308920.0, ans=0.125 2023-11-18 16:20:35,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=308986.6666666667, ans=0.1 2023-11-18 16:20:41,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=309053.3333333333, ans=0.125 2023-11-18 16:20:46,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=309053.3333333333, ans=0.125 2023-11-18 16:20:51,934 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10300, loss[loss=0.1139, simple_loss=0.1309, pruned_loss=0.03819, audio_tagging_loss=0.0103, over 16979.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1204, pruned_loss=0.03553, audio_tagging_loss=0.0119, over 3058886.81 frames. ], batch size: 62, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:21:10,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.80 vs. limit=5.0 2023-11-18 16:21:13,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=309253.3333333333, ans=0.0 2023-11-18 16:21:21,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=309253.3333333333, ans=0.0 2023-11-18 16:21:25,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=309320.0, ans=0.09899494936611666 2023-11-18 16:21:27,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.55 vs. limit=15.0 2023-11-18 16:21:35,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=309320.0, ans=0.0 2023-11-18 16:21:36,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=309386.6666666667, ans=0.5 2023-11-18 16:21:47,525 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10350, loss[loss=0.1333, simple_loss=0.1515, pruned_loss=0.04573, audio_tagging_loss=0.01184, over 15224.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.121, pruned_loss=0.03548, audio_tagging_loss=0.01189, over 3055028.41 frames. 
], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:03,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=309520.0, ans=0.125 2023-11-18 16:22:04,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 9.661e+01 1.063e+02 1.175e+02 1.992e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 16:22:29,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=309653.3333333333, ans=0.125 2023-11-18 16:22:32,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=309720.0, ans=0.125 2023-11-18 16:22:42,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.50 vs. limit=22.5 2023-11-18 16:22:42,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.15 vs. limit=15.0 2023-11-18 16:22:43,318 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10400, loss[loss=0.1412, simple_loss=0.1626, pruned_loss=0.04868, audio_tagging_loss=0.01128, over 16118.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1214, pruned_loss=0.0354, audio_tagging_loss=0.01197, over 3053912.36 frames. ], batch size: 57, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:44,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=309786.6666666667, ans=10.0 2023-11-18 16:22:45,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=309786.6666666667, ans=0.0 2023-11-18 16:23:02,203 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:23:10,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=309920.0, ans=0.07 2023-11-18 16:23:26,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=309986.6666666667, ans=0.0 2023-11-18 16:23:35,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=310053.3333333333, ans=0.025 2023-11-18 16:23:39,769 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10450, loss[loss=0.09084, simple_loss=0.1075, pruned_loss=0.02704, audio_tagging_loss=0.01007, over 14676.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1216, pruned_loss=0.03541, audio_tagging_loss=0.01197, over 3054553.45 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:23:56,257 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 9.086e+01 9.811e+01 1.148e+02 1.710e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 16:24:12,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=310320.0, ans=0.2 2023-11-18 16:24:22,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=310320.0, ans=0.0 2023-11-18 16:24:35,561 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10500, loss[loss=0.08611, simple_loss=0.09149, pruned_loss=0.029, audio_tagging_loss=0.01137, over 14292.00 frames. 
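[Annotation] In every "Clipping_scale=2.0, grad-norm quartiles ..." line above, the reported threshold equals 2.0 times the median quartile (e.g. 2.175e+02 = 2.0 x 1.088e+02), which suggests gradients are clipped against a multiple of a running median of recent gradient norms. The sketch below illustrates that reading with stock PyTorch utilities; clip_by_running_median and the use of clip_grad_norm_ are stand-ins for the trainer's actual optimizer logic, not its code.

import torch

def clip_by_running_median(params, recent_norms, clipping_scale=2.0):
    norms = torch.tensor(recent_norms)
    # Min, lower quartile, median, upper quartile, max of recent grad norms.
    q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # 2x the running median
    total = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    # Whether this step clipped, standing in for the running percentage.
    clipped = 100.0 * float(total > threshold)
    print("grad-norm quartiles "
          + " ".join(f"{v:.3e}" for v in q.tolist())
          + f", threshold={threshold:.3e}, percent-clipped={clipped}")
    return threshold

# Illustrative call; the norm history reuses quartile values from the log.
model = torch.nn.Linear(4, 4)
model(torch.randn(2, 4)).sum().backward()
clip_by_running_median(model.parameters(), [69.1, 95.8, 108.8, 121.9, 150.6])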
], tot_loss[loss=0.1072, simple_loss=0.1205, pruned_loss=0.0352, audio_tagging_loss=0.01179, over 3058113.16 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:24:37,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=310453.3333333333, ans=0.0 2023-11-18 16:24:38,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=310453.3333333333, ans=0.0 2023-11-18 16:24:58,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=310586.6666666667, ans=0.0 2023-11-18 16:24:59,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=310586.6666666667, ans=0.125 2023-11-18 16:25:23,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=310720.0, ans=0.125 2023-11-18 16:25:26,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=310720.0, ans=0.125 2023-11-18 16:25:26,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=310720.0, ans=0.125 2023-11-18 16:25:31,998 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10550, loss[loss=0.1031, simple_loss=0.1256, pruned_loss=0.02941, audio_tagging_loss=0.01086, over 15901.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1206, pruned_loss=0.03492, audio_tagging_loss=0.01152, over 3056305.31 frames. ], batch size: 61, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:38,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=310786.6666666667, ans=0.125 2023-11-18 16:25:40,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.30 vs. limit=22.5 2023-11-18 16:25:49,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.120e+01 1.006e+02 1.112e+02 1.547e+02, threshold=2.011e+02, percent-clipped=0.0 2023-11-18 16:25:58,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=310920.0, ans=0.0 2023-11-18 16:26:05,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=310986.6666666667, ans=0.1 2023-11-18 16:26:07,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=310986.6666666667, ans=0.125 2023-11-18 16:26:28,617 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10600, loss[loss=0.0792, simple_loss=0.08681, pruned_loss=0.02307, audio_tagging_loss=0.01273, over 14197.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1198, pruned_loss=0.03446, audio_tagging_loss=0.01152, over 3058305.05 frames. 
], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:26:28,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=311120.0, ans=0.125 2023-11-18 16:26:32,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=311120.0, ans=0.0 2023-11-18 16:26:33,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=311120.0, ans=0.0 2023-11-18 16:26:50,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.51 vs. limit=15.0 2023-11-18 16:27:01,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=311320.0, ans=0.2 2023-11-18 16:27:07,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=311320.0, ans=0.125 2023-11-18 16:27:24,506 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10650, loss[loss=0.1138, simple_loss=0.1313, pruned_loss=0.03921, audio_tagging_loss=0.008967, over 16347.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1198, pruned_loss=0.03442, audio_tagging_loss=0.01146, over 3059468.56 frames. ], batch size: 60, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:27:33,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311453.3333333333, ans=0.1 2023-11-18 16:27:33,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=311453.3333333333, ans=0.125 2023-11-18 16:27:40,906 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.767e+01 1.078e+02 1.173e+02 1.612e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 16:27:42,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=311520.0, ans=0.0 2023-11-18 16:28:00,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=311653.3333333333, ans=0.125 2023-11-18 16:28:20,379 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10700, loss[loss=0.1034, simple_loss=0.118, pruned_loss=0.03241, audio_tagging_loss=0.01204, over 14944.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1203, pruned_loss=0.03464, audio_tagging_loss=0.0115, over 3061769.71 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:28:39,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311853.3333333333, ans=0.125 2023-11-18 16:29:06,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=312053.3333333333, ans=0.125 2023-11-18 16:29:17,076 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10750, loss[loss=0.09636, simple_loss=0.1037, pruned_loss=0.02993, audio_tagging_loss=0.01458, over 14749.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1208, pruned_loss=0.03477, audio_tagging_loss=0.01152, over 3062962.64 frames. 
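[Annotation] The ScheduledFloat lines above report scalar hyperparameters (dropout_p, skip rates, balancer probabilities) whose current value ("ans=...") depends on batch_count. A plausible minimal implementation is a piecewise-linear schedule over (batch_count, value) breakpoints, sketched below; the breakpoints are invented for illustration, since the log only shows the current value, and only the clamp-and-interpolate behaviour is being demonstrated.

class ScheduledFloat:
    """A scalar defined by (batch_count, value) breakpoints, linearly
    interpolated between them and clamped outside the range."""

    def __init__(self, *points):
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# E.g. a skip rate annealing from 0.5 to 0.0 over the first 20k batches;
# by batch_count ~3e5, as in this excerpt, it reads 0.0.
skip_rate = ScheduledFloat((0.0, 0.5), (20000.0, 0.0))
print(skip_rate.value_at(307453.33))  # -> 0.0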
], batch size: 57, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:29:33,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.141e+01 9.911e+01 1.128e+02 1.714e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-18 16:30:02,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=312386.6666666667, ans=0.0 2023-11-18 16:30:08,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-18 16:30:10,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=312386.6666666667, ans=0.125 2023-11-18 16:30:12,489 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10800, loss[loss=0.09646, simple_loss=0.1176, pruned_loss=0.02785, audio_tagging_loss=0.009778, over 15068.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1215, pruned_loss=0.03487, audio_tagging_loss=0.01138, over 3062097.17 frames. ], batch size: 54, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:30:13,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. limit=10.0 2023-11-18 16:30:29,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=312520.0, ans=0.0 2023-11-18 16:30:36,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=312586.6666666667, ans=0.125 2023-11-18 16:30:39,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=312586.6666666667, ans=10.0 2023-11-18 16:30:40,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=312586.6666666667, ans=0.125 2023-11-18 16:30:45,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=312653.3333333333, ans=0.0 2023-11-18 16:30:45,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=312653.3333333333, ans=0.0 2023-11-18 16:30:47,871 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:30:55,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312653.3333333333, ans=0.1 2023-11-18 16:31:08,846 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10850, loss[loss=0.09153, simple_loss=0.09342, pruned_loss=0.03101, audio_tagging_loss=0.01381, over 15275.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1208, pruned_loss=0.03468, audio_tagging_loss=0.01145, over 3057651.87 frames. 
], batch size: 60, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:31:25,304 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 9.301e+01 1.024e+02 1.166e+02 1.801e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 16:31:38,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=312920.0, ans=0.125 2023-11-18 16:31:43,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=312986.6666666667, ans=0.1 2023-11-18 16:31:44,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.12 vs. limit=10.0 2023-11-18 16:31:58,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=313053.3333333333, ans=0.125 2023-11-18 16:32:02,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=313053.3333333333, ans=0.125 2023-11-18 16:32:03,382 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:32:04,481 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10900, loss[loss=0.1433, simple_loss=0.1621, pruned_loss=0.04975, audio_tagging_loss=0.01254, over 15510.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1213, pruned_loss=0.03497, audio_tagging_loss=0.01143, over 3051098.84 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:32:04,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=313120.0, ans=0.2 2023-11-18 16:32:25,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=313253.3333333333, ans=0.0 2023-11-18 16:32:39,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=313320.0, ans=15.0 2023-11-18 16:32:40,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=313320.0, ans=0.0 2023-11-18 16:32:54,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=313386.6666666667, ans=0.0 2023-11-18 16:32:56,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=313386.6666666667, ans=0.125 2023-11-18 16:32:59,373 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 10950, loss[loss=0.123, simple_loss=0.1501, pruned_loss=0.03718, audio_tagging_loss=0.01076, over 15428.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1201, pruned_loss=0.03459, audio_tagging_loss=0.0116, over 3053255.18 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:33:07,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.24 vs. 
limit=15.0 2023-11-18 16:33:13,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313520.0, ans=0.125 2023-11-18 16:33:16,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 9.324e+01 1.025e+02 1.137e+02 1.491e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 16:33:31,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=313586.6666666667, ans=0.05 2023-11-18 16:33:36,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=313653.3333333333, ans=0.0 2023-11-18 16:33:40,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=313653.3333333333, ans=0.0 2023-11-18 16:33:41,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313653.3333333333, ans=0.1 2023-11-18 16:33:43,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=313720.0, ans=0.125 2023-11-18 16:33:47,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=313720.0, ans=0.125 2023-11-18 16:33:54,815 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11000, loss[loss=0.104, simple_loss=0.1143, pruned_loss=0.03554, audio_tagging_loss=0.01138, over 14798.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1201, pruned_loss=0.03455, audio_tagging_loss=0.01158, over 3054171.61 frames. ], batch size: 53, lr: 1.56e-02, grad_scale: 64.0 2023-11-18 16:34:05,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=12.0 2023-11-18 16:34:05,956 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:34:15,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313853.3333333333, ans=0.125 2023-11-18 16:34:28,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=313986.6666666667, ans=0.2 2023-11-18 16:34:40,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314053.3333333333, ans=0.1 2023-11-18 16:34:42,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=314053.3333333333, ans=0.125 2023-11-18 16:34:48,618 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-18 16:34:50,193 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11050, loss[loss=0.114, simple_loss=0.1255, pruned_loss=0.04049, audio_tagging_loss=0.01076, over 15166.00 frames. 
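[Annotation] The "Whitening: ... metric=X vs. limit=Y" lines are diagnostics that fire when some measure of how non-white a module's channel covariance is exceeds its limit. The exact metric is not visible in the log; the sketch below uses an assumed eigenvalue-spread ratio that equals 1.0 for a perfectly white signal, purely to make the "metric vs. limit" comparison concrete.

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). The ratio below is 1.0 when all
    # covariance eigenvalues are equal (the channels are "white") and
    # grows as the spectrum becomes more uneven. This exact formula is
    # an assumption, not the training code's definition.
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return float((eigs ** 2).mean() / eigs.mean() ** 2)

x = torch.randn(1000, 128) * torch.linspace(0.1, 3.0, 128)  # deliberately non-white
metric, limit = whitening_metric(x), 5.0
# The real module logs only when the limit is exceeded, as in the lines above.
print(f"Whitening: num_channels=128, metric={metric:.2f} vs. limit={limit}")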
], tot_loss[loss=0.1056, simple_loss=0.1192, pruned_loss=0.03433, audio_tagging_loss=0.01164, over 3046353.58 frames. ], batch size: 59, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:06,684 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 9.418e+01 1.036e+02 1.168e+02 1.751e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-18 16:35:18,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314253.3333333333, ans=0.1 2023-11-18 16:35:30,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=314320.0, ans=0.0 2023-11-18 16:35:44,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=314453.3333333333, ans=0.0 2023-11-18 16:35:45,673 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11100, loss[loss=0.0892, simple_loss=0.09962, pruned_loss=0.02834, audio_tagging_loss=0.01105, over 14093.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1184, pruned_loss=0.03415, audio_tagging_loss=0.0118, over 3047335.48 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:45,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=314453.3333333333, ans=0.0 2023-11-18 16:36:01,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.79 vs. limit=15.0 2023-11-18 16:36:05,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=314520.0, ans=0.125 2023-11-18 16:36:05,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=314520.0, ans=0.125 2023-11-18 16:36:13,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=314586.6666666667, ans=0.125 2023-11-18 16:36:27,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=314653.3333333333, ans=0.07 2023-11-18 16:36:35,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=314720.0, ans=0.0 2023-11-18 16:36:38,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314720.0, ans=0.1 2023-11-18 16:36:40,811 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11150, loss[loss=0.09927, simple_loss=0.1037, pruned_loss=0.03176, audio_tagging_loss=0.01566, over 14048.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.118, pruned_loss=0.03432, audio_tagging_loss=0.01198, over 3048139.04 frames. 
], batch size: 55, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:36:42,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314786.6666666667, ans=0.1 2023-11-18 16:36:44,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=314786.6666666667, ans=0.0 2023-11-18 16:36:58,978 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.570e+01 1.059e+02 1.181e+02 1.990e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 16:37:01,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=314853.3333333333, ans=0.0 2023-11-18 16:37:07,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=314920.0, ans=0.2 2023-11-18 16:37:21,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314986.6666666667, ans=0.1 2023-11-18 16:37:22,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=314986.6666666667, ans=10.0 2023-11-18 16:37:37,290 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11200, loss[loss=0.09042, simple_loss=0.09927, pruned_loss=0.02698, audio_tagging_loss=0.0138, over 14150.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.118, pruned_loss=0.03418, audio_tagging_loss=0.01205, over 3039422.86 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:37:37,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=315120.0, ans=10.0 2023-11-18 16:38:06,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=315253.3333333333, ans=0.5 2023-11-18 16:38:24,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-18 16:38:27,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=315386.6666666667, ans=0.2 2023-11-18 16:38:32,606 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11250, loss[loss=0.09646, simple_loss=0.1093, pruned_loss=0.03026, audio_tagging_loss=0.01156, over 15281.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1175, pruned_loss=0.0341, audio_tagging_loss=0.01194, over 3041489.77 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:38:34,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.06 vs. 
limit=15.0 2023-11-18 16:38:38,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=315453.3333333333, ans=0.0 2023-11-18 16:38:38,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=315453.3333333333, ans=0.125 2023-11-18 16:38:48,486 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.211e+01 1.045e+02 1.164e+02 1.761e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:39:01,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=315586.6666666667, ans=0.125 2023-11-18 16:39:13,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=315653.3333333333, ans=0.125 2023-11-18 16:39:16,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=315720.0, ans=0.125 2023-11-18 16:39:27,250 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11300, loss[loss=0.09752, simple_loss=0.1054, pruned_loss=0.03251, audio_tagging_loss=0.01233, over 16125.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1181, pruned_loss=0.03422, audio_tagging_loss=0.0116, over 3043067.86 frames. ], batch size: 63, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:39:44,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=315853.3333333333, ans=0.05 2023-11-18 16:39:52,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=315920.0, ans=0.95 2023-11-18 16:40:06,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=315986.6666666667, ans=0.125 2023-11-18 16:40:16,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=316053.3333333333, ans=0.05 2023-11-18 16:40:22,796 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11350, loss[loss=0.1019, simple_loss=0.1158, pruned_loss=0.03247, audio_tagging_loss=0.01151, over 15106.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1188, pruned_loss=0.03438, audio_tagging_loss=0.01146, over 3042840.80 frames. ], batch size: 58, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:40:34,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=316186.6666666667, ans=0.1 2023-11-18 16:40:39,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.469e+01 1.052e+02 1.138e+02 1.718e+02, threshold=2.104e+02, percent-clipped=0.0 2023-11-18 16:41:13,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=15.0 2023-11-18 16:41:18,437 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11400, loss[loss=0.1014, simple_loss=0.1153, pruned_loss=0.03155, audio_tagging_loss=0.01216, over 14517.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1193, pruned_loss=0.03462, audio_tagging_loss=0.01142, over 3035995.36 frames. 
], batch size: 55, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:41:30,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=316520.0, ans=0.0 2023-11-18 16:41:40,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=316586.6666666667, ans=0.125 2023-11-18 16:41:45,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=316586.6666666667, ans=0.1 2023-11-18 16:42:04,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=316720.0, ans=0.125 2023-11-18 16:42:13,244 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11450, loss[loss=0.1014, simple_loss=0.1175, pruned_loss=0.02968, audio_tagging_loss=0.01294, over 14298.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1199, pruned_loss=0.03503, audio_tagging_loss=0.0114, over 3040227.03 frames. ], batch size: 55, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:42:23,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=316853.3333333333, ans=0.125 2023-11-18 16:42:27,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=316853.3333333333, ans=0.0 2023-11-18 16:42:30,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.649e+01 1.077e+02 1.207e+02 1.681e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 16:42:51,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=316986.6666666667, ans=0.125 2023-11-18 16:43:02,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317053.3333333333, ans=0.125 2023-11-18 16:43:04,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.36 vs. limit=15.0 2023-11-18 16:43:06,900 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:43:08,881 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11500, loss[loss=0.1016, simple_loss=0.1228, pruned_loss=0.02925, audio_tagging_loss=0.01094, over 14287.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.12, pruned_loss=0.03516, audio_tagging_loss=0.0114, over 3047262.25 frames. ], batch size: 55, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:43:17,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=317120.0, ans=0.0 2023-11-18 16:43:44,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317320.0, ans=0.125 2023-11-18 16:43:52,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=317386.6666666667, ans=0.1 2023-11-18 16:43:57,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=317386.6666666667, ans=0.0 2023-11-18 16:44:05,435 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11550, loss[loss=0.113, simple_loss=0.1302, pruned_loss=0.03578, audio_tagging_loss=0.01206, over 15681.00 frames. 
], tot_loss[loss=0.1064, simple_loss=0.1202, pruned_loss=0.03492, audio_tagging_loss=0.01137, over 3045084.15 frames. ], batch size: 60, lr: 1.55e-02, grad_scale: 16.0 2023-11-18 16:44:22,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=317520.0, ans=0.0 2023-11-18 16:44:23,382 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.289e+01 1.045e+02 1.175e+02 1.806e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 16:44:41,085 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:44:52,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-18 16:44:57,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=317720.0, ans=0.09899494936611666 2023-11-18 16:44:57,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=317720.0, ans=0.0 2023-11-18 16:45:00,901 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11600, loss[loss=0.1321, simple_loss=0.1563, pruned_loss=0.04403, audio_tagging_loss=0.009959, over 15877.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1221, pruned_loss=0.03552, audio_tagging_loss=0.01138, over 3047221.04 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:06,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-18 16:45:27,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317920.0, ans=0.1 2023-11-18 16:45:39,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=317986.6666666667, ans=0.125 2023-11-18 16:45:52,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=318053.3333333333, ans=0.0 2023-11-18 16:45:53,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=318053.3333333333, ans=0.2 2023-11-18 16:45:56,545 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11650, loss[loss=0.06175, simple_loss=0.07175, pruned_loss=0.01107, audio_tagging_loss=0.01481, over 14303.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1211, pruned_loss=0.03505, audio_tagging_loss=0.01151, over 3045200.78 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:56,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=318120.0, ans=0.125 2023-11-18 16:46:01,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. 
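[Annotation] grad_scale in the loss lines climbs from 32.0 to 64.0 around batch 11000 and falls back through 32.0 to 16.0 near batch 11550 before recovering, which is the signature of dynamic fp16 loss scaling: the scale is doubled after a run of overflow-free steps and halved whenever an inf/nan gradient appears. A minimal sketch with PyTorch's stock GradScaler follows; growth_interval=2000 is the library default, not a value read from this run, and a CUDA build is assumed (on CPU the scaler disables itself and get_scale() returns 1.0).

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # matches the grad_scale first seen in this excerpt
    growth_factor=2.0,    # doubling after sustained finite gradients
    backoff_factor=0.5,   # halving on any inf/nan gradient
    growth_interval=2000, # default clean-step count before doubling
)
print(scaler.get_scale())  # 32.0 until enough overflow-free steps pass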
limit=6.0 2023-11-18 16:46:12,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=318186.6666666667, ans=0.0 2023-11-18 16:46:14,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=318186.6666666667, ans=0.2 2023-11-18 16:46:15,785 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.455e+01 1.056e+02 1.163e+02 1.452e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 16:46:20,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=12.0 2023-11-18 16:46:27,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.72 vs. limit=15.0 2023-11-18 16:46:37,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. limit=15.0 2023-11-18 16:46:43,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=318386.6666666667, ans=0.1 2023-11-18 16:46:51,866 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11700, loss[loss=0.135, simple_loss=0.1632, pruned_loss=0.04478, audio_tagging_loss=0.008659, over 15301.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1199, pruned_loss=0.0347, audio_tagging_loss=0.01158, over 3041460.98 frames. ], batch size: 57, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:47:02,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.92 vs. limit=15.0 2023-11-18 16:47:31,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=318653.3333333333, ans=0.0 2023-11-18 16:47:34,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=318653.3333333333, ans=0.125 2023-11-18 16:47:36,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=318720.0, ans=6.0 2023-11-18 16:47:43,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=318720.0, ans=0.125 2023-11-18 16:47:47,738 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11750, loss[loss=0.1052, simple_loss=0.1258, pruned_loss=0.031, audio_tagging_loss=0.01129, over 15468.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1201, pruned_loss=0.0346, audio_tagging_loss=0.01156, over 3038248.19 frames. 
], batch size: 58, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:48:06,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.155e+01 9.788e+01 1.106e+02 1.226e+02 1.834e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 16:48:09,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=318920.0, ans=0.125 2023-11-18 16:48:15,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=318920.0, ans=0.0 2023-11-18 16:48:20,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=318986.6666666667, ans=0.0 2023-11-18 16:48:31,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=319053.3333333333, ans=0.2 2023-11-18 16:48:39,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=319053.3333333333, ans=0.125 2023-11-18 16:48:43,591 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11800, loss[loss=0.1142, simple_loss=0.1318, pruned_loss=0.03927, audio_tagging_loss=0.009033, over 16344.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1201, pruned_loss=0.03474, audio_tagging_loss=0.01159, over 3043395.03 frames. ], batch size: 61, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:49:32,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=319386.6666666667, ans=0.5 2023-11-18 16:49:39,793 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11850, loss[loss=0.1078, simple_loss=0.114, pruned_loss=0.03933, audio_tagging_loss=0.0115, over 15944.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1197, pruned_loss=0.03444, audio_tagging_loss=0.01175, over 3046932.36 frames. 
], batch size: 60, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:49:41,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=319453.3333333333, ans=0.0 2023-11-18 16:49:44,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=319453.3333333333, ans=0.125 2023-11-18 16:49:47,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319453.3333333333, ans=0.1 2023-11-18 16:49:47,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=319453.3333333333, ans=0.125 2023-11-18 16:49:50,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:58,410 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.740e+01 1.079e+02 1.230e+02 2.254e+02, threshold=2.157e+02, percent-clipped=1.0 2023-11-18 16:49:58,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=319520.0, ans=0.05 2023-11-18 16:49:58,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:50:13,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=319653.3333333333, ans=0.125 2023-11-18 16:50:30,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=319720.0, ans=0.125 2023-11-18 16:50:33,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=319720.0, ans=0.125 2023-11-18 16:50:34,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=319786.6666666667, ans=0.125 2023-11-18 16:50:34,990 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11900, loss[loss=0.09319, simple_loss=0.1032, pruned_loss=0.02734, audio_tagging_loss=0.01426, over 16551.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1192, pruned_loss=0.0341, audio_tagging_loss=0.01191, over 3048742.25 frames. ], batch size: 64, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:50:57,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=319920.0, ans=0.0 2023-11-18 16:51:06,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=319920.0, ans=0.125 2023-11-18 16:51:09,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=319986.6666666667, ans=0.0 2023-11-18 16:51:23,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320053.3333333333, ans=0.125 2023-11-18 16:51:32,920 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 11950, loss[loss=0.0698, simple_loss=0.07539, pruned_loss=0.01726, audio_tagging_loss=0.01484, over 14071.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1188, pruned_loss=0.03396, audio_tagging_loss=0.01205, over 3048139.43 frames. 
], batch size: 53, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:51:42,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=320120.0, ans=0.1 2023-11-18 16:51:52,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.226e+01 1.013e+02 1.097e+02 1.681e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 16:52:11,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.63 vs. limit=15.0 2023-11-18 16:52:18,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=320386.6666666667, ans=0.125 2023-11-18 16:52:19,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=320386.6666666667, ans=0.125 2023-11-18 16:52:22,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=320386.6666666667, ans=0.125 2023-11-18 16:52:22,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=320386.6666666667, ans=0.125 2023-11-18 16:52:27,181 INFO [train_asr.py:1115] (2/4) Epoch 4, batch 12000, loss[loss=0.07661, simple_loss=0.07954, pruned_loss=0.02174, audio_tagging_loss=0.0151, over 14826.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1189, pruned_loss=0.03407, audio_tagging_loss=0.01211, over 3045989.99 frames. ], batch size: 59, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:52:27,181 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 16:53:00,116 INFO [train_asr.py:1147] (2/4) Epoch 4, validation: loss=0.07553, simple_loss=0.06151, pruned_loss=0.009833, audio_tagging_loss=0.03495, over 4681554.00 frames. 2023-11-18 16:53:00,116 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 16:53:15,751 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.74 vs. limit=22.5 2023-11-18 16:53:18,610 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=320520.0, ans=0.07 2023-11-18 16:54:03,927 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 0, loss[loss=0.1199, simple_loss=0.1223, pruned_loss=0.03302, audio_tagging_loss=0.0257, over 15806.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1223, pruned_loss=0.03302, audio_tagging_loss=0.0257, over 15806.00 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:54:03,928 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 16:54:35,515 INFO [train_asr.py:1147] (2/4) Epoch 5, validation: loss=0.07399, simple_loss=0.06162, pruned_loss=0.009934, audio_tagging_loss=0.03325, over 4681554.00 frames. 
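[Annotation] "Computing validation loss" followed by an "Epoch N, validation: ..." line marks a full sweep over the fixed dev set, which is why the validation frame count (4681554.00) is identical on every pass while training losses are averaged over a moving window. A hedged sketch of such a loop; validate, dev_loader, and compute_loss are illustrative names, not icefall's API.

import torch

def validate(model, dev_loader, compute_loss):
    # One pass over the fixed dev set with gradients disabled; totals are
    # frame-weighted so the returned average is a per-frame loss, and the
    # frame total is constant across validation passes.
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in dev_loader:
            loss, frames = compute_loss(model, batch)
            tot_loss += float(loss)  # assumed already summed over frames
            tot_frames += frames
    model.train()
    return tot_loss / tot_frames, tot_frames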
2023-11-18 16:54:35,516 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 16:54:53,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=320693.3333333333, ans=0.05 2023-11-18 16:54:54,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=320693.3333333333, ans=0.125 2023-11-18 16:55:03,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-18 16:55:19,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=320893.3333333333, ans=0.0 2023-11-18 16:55:21,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.535e+01 1.056e+02 1.198e+02 1.542e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 16:55:23,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2023-11-18 16:55:31,309 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 50, loss[loss=0.1044, simple_loss=0.1129, pruned_loss=0.02887, audio_tagging_loss=0.01908, over 15141.00 frames. ], tot_loss[loss=0.1131, simple_loss=0.1168, pruned_loss=0.03231, audio_tagging_loss=0.02246, over 695189.75 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:55:52,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=321093.3333333333, ans=0.05 2023-11-18 16:55:54,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.83 vs. limit=12.0 2023-11-18 16:55:58,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321093.3333333333, ans=0.125 2023-11-18 16:56:13,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321160.0, ans=0.125 2023-11-18 16:56:25,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=321293.3333333333, ans=0.04949747468305833 2023-11-18 16:56:26,667 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 100, loss[loss=0.1146, simple_loss=0.1055, pruned_loss=0.038, audio_tagging_loss=0.02388, over 15871.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1154, pruned_loss=0.03214, audio_tagging_loss=0.02183, over 1217517.64 frames. ], batch size: 61, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:56:33,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=321293.3333333333, ans=0.0 2023-11-18 16:56:38,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=321360.0, ans=6.0 2023-11-18 16:56:44,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. 
limit=10.0 2023-11-18 16:57:12,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 9.566e+01 1.064e+02 1.154e+02 1.620e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-18 16:57:22,395 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 150, loss[loss=0.1164, simple_loss=0.1281, pruned_loss=0.03695, audio_tagging_loss=0.01546, over 15057.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1167, pruned_loss=0.03242, audio_tagging_loss=0.01938, over 1623273.43 frames. ], batch size: 57, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:57:27,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=321626.6666666667, ans=0.125 2023-11-18 16:57:30,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-18 16:57:44,918 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:57:46,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=321760.0, ans=0.125 2023-11-18 16:57:46,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. limit=15.0 2023-11-18 16:58:02,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=321826.6666666667, ans=15.0 2023-11-18 16:58:13,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=321893.3333333333, ans=0.125 2023-11-18 16:58:17,753 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 200, loss[loss=0.1292, simple_loss=0.1512, pruned_loss=0.04119, audio_tagging_loss=0.01242, over 15798.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1179, pruned_loss=0.03302, audio_tagging_loss=0.01724, over 1934694.39 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:58:18,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=321960.0, ans=0.0 2023-11-18 16:58:19,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=321960.0, ans=0.125 2023-11-18 16:58:23,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=321960.0, ans=0.0 2023-11-18 16:58:38,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=322026.6666666667, ans=0.125 2023-11-18 16:59:03,728 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.231e+01 1.044e+02 1.147e+02 1.591e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:59:14,445 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 250, loss[loss=0.1236, simple_loss=0.1484, pruned_loss=0.04125, audio_tagging_loss=0.008189, over 16312.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1207, pruned_loss=0.0342, audio_tagging_loss=0.01532, over 2176948.16 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:59:24,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.58 vs. 
2023-11-18 16:59:28,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0
2023-11-18 16:59:30,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=322360.0, ans=0.05
2023-11-18 16:59:42,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=322426.6666666667, ans=0.125
2023-11-18 16:59:44,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=322426.6666666667, ans=0.125
2023-11-18 17:00:00,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=322560.0, ans=0.0
2023-11-18 17:00:09,719 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 300, loss[loss=0.09766, simple_loss=0.1172, pruned_loss=0.02717, audio_tagging_loss=0.01191, over 15098.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1197, pruned_loss=0.03374, audio_tagging_loss=0.01411, over 2372449.60 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0
2023-11-18 17:00:49,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322826.6666666667, ans=0.1
2023-11-18 17:00:56,002 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.144e+01 1.032e+02 1.177e+02 1.892e+02, threshold=2.064e+02, percent-clipped=0.0
2023-11-18 17:00:56,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322893.3333333333, ans=0.1
2023-11-18 17:01:02,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=322893.3333333333, ans=0.125
2023-11-18 17:01:06,801 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 350, loss[loss=0.1016, simple_loss=0.1142, pruned_loss=0.03217, audio_tagging_loss=0.01233, over 15529.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1199, pruned_loss=0.03393, audio_tagging_loss=0.01349, over 2516245.18 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0
2023-11-18 17:01:10,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=322960.0, ans=0.125
2023-11-18 17:01:17,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=323026.6666666667, ans=0.0
2023-11-18 17:01:18,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=323026.6666666667, ans=0.04949747468305833
2023-11-18 17:01:27,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=323026.6666666667, ans=0.2
2023-11-18 17:01:27,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.67 vs. limit=12.0
2023-11-18 17:02:03,607 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 400, loss[loss=0.0584, simple_loss=0.0608, pruned_loss=0.0157, audio_tagging_loss=0.01231, over 15179.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1193, pruned_loss=0.03363, audio_tagging_loss=0.01299, over 2635666.26 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0
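A note on the loss fields: the logged numbers are consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; for the batch 400 tot_loss above, 0.5 * 0.1193 + 0.03363 + 0.01299 = 0.1063, matching loss=0.1062 to rounding. A small check of that decomposition (the scales are inferred from the log itself):

# Check, from the logged numbers alone, that the reported "loss" decomposes
# as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss.
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, tagging_scale=1.0):
    return (simple_scale * simple_loss + pruned_loss
            + tagging_scale * audio_tagging_loss)

# tot_loss at epoch 5, batch 400, from the record above:
assert abs(combined_loss(0.1193, 0.03363, 0.01299) - 0.1062) < 5e-4
# per-batch loss at the same step:
assert abs(combined_loss(0.0608, 0.0157, 0.01231) - 0.0584) < 5e-4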
2023-11-18 17:02:04,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=323293.3333333333, ans=0.125
2023-11-18 17:02:10,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=323293.3333333333, ans=0.125
2023-11-18 17:02:10,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.84 vs. limit=15.0
2023-11-18 17:02:27,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=323426.6666666667, ans=0.1
2023-11-18 17:02:35,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=323493.3333333333, ans=0.0
2023-11-18 17:02:37,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=323493.3333333333, ans=0.125
2023-11-18 17:02:42,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=323493.3333333333, ans=0.125
2023-11-18 17:02:49,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.956e+01 1.111e+02 1.271e+02 1.658e+02, threshold=2.223e+02, percent-clipped=0.0
2023-11-18 17:02:51,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=323560.0, ans=0.125
2023-11-18 17:02:55,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=323560.0, ans=0.125
2023-11-18 17:02:58,839 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 450, loss[loss=0.1177, simple_loss=0.1343, pruned_loss=0.04101, audio_tagging_loss=0.009553, over 15549.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1181, pruned_loss=0.03336, audio_tagging_loss=0.01267, over 2724582.01 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0
2023-11-18 17:03:07,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=323626.6666666667, ans=0.1
2023-11-18 17:03:14,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=323693.3333333333, ans=0.125
2023-11-18 17:03:16,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.79 vs. limit=15.0
2023-11-18 17:03:18,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=323693.3333333333, ans=0.0
2023-11-18 17:03:48,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=323893.3333333333, ans=0.0
2023-11-18 17:03:54,615 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 500, loss[loss=0.09592, simple_loss=0.1048, pruned_loss=0.03328, audio_tagging_loss=0.01024, over 14888.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1188, pruned_loss=0.03361, audio_tagging_loss=0.01233, over 2793853.18 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0
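A note on the Whitening lines: each compares a whiteness statistic of a module's output covariance (per channel group) against a limit, which may itself be scheduled (see the whitening_limit entries above). A hedged sketch of one such statistic follows: it equals 1.0 for perfectly white features and grows as the covariance eigenvalues spread out. This illustrates the idea rather than reproducing scaling.py's exact metric:

# Hedged sketch of a covariance "whiteness" metric: for white features the
# covariance eigenvalues are equal and the ratio below is 1; the more uneven
# the eigenvalues, the larger the metric.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels split into groups as in the log.
    n, c = x.shape
    group = c // num_groups
    metrics = []
    for g in range(num_groups):
        xg = x[:, g * group:(g + 1) * group]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.T @ xg) / n
        eigs = torch.linalg.eigvalsh(cov)
        # mean of squared eigenvalues over the squared mean eigenvalue:
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return max(metrics)

x = torch.randn(1000, 512)                 # near-white input
print(whitening_metric(x))                 # ~1.0, well under a limit like 15.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 512)))  # noticeably larger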
2023-11-18 17:03:55,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=323960.0, ans=10.0
2023-11-18 17:03:56,964 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:04:16,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=324093.3333333333, ans=0.125
2023-11-18 17:04:23,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324093.3333333333, ans=0.1
2023-11-18 17:04:40,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 9.068e+01 9.771e+01 1.090e+02 1.763e+02, threshold=1.954e+02, percent-clipped=0.0
2023-11-18 17:04:41,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.31 vs. limit=15.0
2023-11-18 17:04:43,713 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.46 vs. limit=15.0
2023-11-18 17:04:51,723 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 550, loss[loss=0.1024, simple_loss=0.1146, pruned_loss=0.03399, audio_tagging_loss=0.01115, over 14778.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.119, pruned_loss=0.03343, audio_tagging_loss=0.01221, over 2854026.24 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 32.0
2023-11-18 17:05:17,009 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:05:46,836 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 600, loss[loss=0.08246, simple_loss=0.08721, pruned_loss=0.02857, audio_tagging_loss=0.01028, over 15646.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1173, pruned_loss=0.03303, audio_tagging_loss=0.01207, over 2886686.00 frames. ], batch size: 61, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:06:32,197 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 9.095e+01 1.023e+02 1.155e+02 1.808e+02, threshold=2.046e+02, percent-clipped=0.0
2023-11-18 17:06:34,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=324893.3333333333, ans=0.2
2023-11-18 17:06:39,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=324893.3333333333, ans=0.5
2023-11-18 17:06:42,442 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 650, loss[loss=0.1245, simple_loss=0.1427, pruned_loss=0.0415, audio_tagging_loss=0.01163, over 15729.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1158, pruned_loss=0.03275, audio_tagging_loss=0.01213, over 2927332.36 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:06:49,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=324960.0, ans=0.125
2023-11-18 17:06:51,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=324960.0, ans=0.125
2023-11-18 17:07:00,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2023-11-18 17:07:12,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=325093.3333333333, ans=10.0
2023-11-18 17:07:23,857 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.88 vs. limit=15.0
2023-11-18 17:07:26,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=325226.6666666667, ans=0.1
2023-11-18 17:07:37,858 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 700, loss[loss=0.08243, simple_loss=0.08555, pruned_loss=0.02311, audio_tagging_loss=0.01655, over 14958.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.117, pruned_loss=0.03303, audio_tagging_loss=0.01199, over 2956571.18 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:07:55,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.19 vs. limit=15.0
2023-11-18 17:07:57,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=325360.0, ans=0.05
2023-11-18 17:08:04,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=325426.6666666667, ans=0.125
2023-11-18 17:08:05,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=325426.6666666667, ans=0.2
2023-11-18 17:08:13,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325493.3333333333, ans=0.125
2023-11-18 17:08:24,352 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.264e+01 9.972e+01 1.138e+02 1.580e+02, threshold=1.994e+02, percent-clipped=0.0
2023-11-18 17:08:33,995 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 750, loss[loss=0.1112, simple_loss=0.1361, pruned_loss=0.03534, audio_tagging_loss=0.007876, over 14434.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1179, pruned_loss=0.03313, audio_tagging_loss=0.01184, over 2976840.23 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0
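A note on the two loss groups in each train_asr line: loss[...] is the current batch, while tot_loss[...] is a frame-weighted aggregate whose "over N frames" counter grows across the epoch (2976840.23 frames by batch 750 above). A sketch of that bookkeeping follows; the real tracker also down-weights old batches, which this toy version omits:

# Sketch of frame-weighted running loss tracking, matching how tot_loss's
# "over N frames" counter keeps growing across batches within an epoch.
class RunningLoss:
    def __init__(self):
        self.sums = {}       # name -> frame-weighted loss sum
        self.frames = 0.0

    def update(self, batch_losses: dict, num_frames: float):
        for name, value in batch_losses.items():
            self.sums[name] = self.sums.get(name, 0.0) + value * num_frames
        self.frames += num_frames

    def averages(self) -> dict:
        return {name: s / self.frames for name, s in self.sums.items()}

tracker = RunningLoss()
# per-batch values taken from the records above:
tracker.update({"loss": 0.08243, "audio_tagging_loss": 0.01655}, 14958.0)
tracker.update({"loss": 0.1112, "audio_tagging_loss": 0.007876}, 14434.0)
print(tracker.averages(), f"over {tracker.frames} frames")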
2023-11-18 17:08:45,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=325693.3333333333, ans=0.125
2023-11-18 17:08:48,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=325693.3333333333, ans=0.125
2023-11-18 17:08:49,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=325693.3333333333, ans=0.2
2023-11-18 17:08:53,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=325693.3333333333, ans=0.125
2023-11-18 17:09:06,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=325760.0, ans=0.125
2023-11-18 17:09:24,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=325893.3333333333, ans=0.125
2023-11-18 17:09:29,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=325960.0, ans=0.125
2023-11-18 17:09:29,760 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 800, loss[loss=0.08818, simple_loss=0.1021, pruned_loss=0.02496, audio_tagging_loss=0.01216, over 15142.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1188, pruned_loss=0.03336, audio_tagging_loss=0.01194, over 2989083.70 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:10:15,468 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 9.696e+01 1.116e+02 1.261e+02 1.745e+02, threshold=2.231e+02, percent-clipped=0.0
2023-11-18 17:10:24,992 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 850, loss[loss=0.1107, simple_loss=0.1343, pruned_loss=0.03478, audio_tagging_loss=0.008802, over 14794.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.118, pruned_loss=0.03314, audio_tagging_loss=0.01195, over 3008215.97 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:10:25,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=326293.3333333333, ans=0.125
2023-11-18 17:11:14,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=326560.0, ans=0.125
2023-11-18 17:11:21,921 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 900, loss[loss=0.1072, simple_loss=0.1218, pruned_loss=0.03149, audio_tagging_loss=0.01481, over 15981.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1176, pruned_loss=0.03308, audio_tagging_loss=0.01197, over 3018717.20 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:11:25,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=326626.6666666667, ans=0.5
2023-11-18 17:11:27,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=326626.6666666667, ans=0.125
2023-11-18 17:11:33,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.44 vs. limit=15.0
2023-11-18 17:12:08,183 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.412e+01 1.033e+02 1.138e+02 1.840e+02, threshold=2.065e+02, percent-clipped=0.0
2023-11-18 17:12:17,656 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 950, loss[loss=0.07244, simple_loss=0.08204, pruned_loss=0.01872, audio_tagging_loss=0.0127, over 14637.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1172, pruned_loss=0.03311, audio_tagging_loss=0.01184, over 3017517.01 frames. ], batch size: 55, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:12:21,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=326960.0, ans=0.125
2023-11-18 17:12:27,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=327026.6666666667, ans=0.0
2023-11-18 17:12:32,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327026.6666666667, ans=0.1
2023-11-18 17:12:39,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0
2023-11-18 17:13:05,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=327226.6666666667, ans=0.125
2023-11-18 17:13:05,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=327226.6666666667, ans=0.125
2023-11-18 17:13:13,605 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1000, loss[loss=0.07138, simple_loss=0.07341, pruned_loss=0.0186, audio_tagging_loss=0.01607, over 16191.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1164, pruned_loss=0.03283, audio_tagging_loss=0.01175, over 3022956.66 frames. ], batch size: 63, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:13:24,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=327360.0, ans=0.125
2023-11-18 17:13:38,757 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:14:00,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.245e+01 1.040e+02 1.129e+02 1.708e+02, threshold=2.081e+02, percent-clipped=0.0
2023-11-18 17:14:10,346 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1050, loss[loss=0.1226, simple_loss=0.1422, pruned_loss=0.04022, audio_tagging_loss=0.01133, over 15921.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1166, pruned_loss=0.03295, audio_tagging_loss=0.01159, over 3031756.95 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0
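A note on the WARNING above: AudioSet cuts carrying the dummy placeholder transcript are excluded when, after roughly 4x subsampling, they have fewer encoder frames than BPE tokens (23 < 24 here), which a transducer loss cannot align. A sketch of such a filter follows; the subsampling formula and the exact inequality are assumptions chosen to reproduce the logged 100 -> 23 case:

# Sketch of the "exclude cut" check: a pruned-transducer loss needs at least
# as many encoder frames as text tokens (T >= U), so short cuts with the
# dummy placeholder text get filtered out.
def frames_after_subsampling(num_frames: int) -> int:
    return (num_frames - 7) // 2 // 2   # ~4x subsampling; 100 -> 23 (assumed)

def keep_cut(num_frames: int, num_tokens: int, cut_id: str = "") -> bool:
    t = frames_after_subsampling(num_frames)
    if t < num_tokens:
        print(f"Exclude cut with ID {cut_id} from training. "
              f"Number of frames (before subsampling): {num_frames}. "
              f"Number of frames (after subsampling): {t}. "
              f"Number of tokens: {num_tokens}")
        return False
    return True

keep_cut(100, 24, "unbalanced/5Y6u9AlD9S0_0.000_1.000.wav")  # -> False, excluded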
2023-11-18 17:14:25,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=327693.3333333333, ans=0.0
2023-11-18 17:14:27,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=327693.3333333333, ans=0.2
2023-11-18 17:14:29,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=327693.3333333333, ans=0.125
2023-11-18 17:14:43,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=327826.6666666667, ans=0.1
2023-11-18 17:14:49,148 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0
2023-11-18 17:15:01,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=327893.3333333333, ans=0.125
2023-11-18 17:15:06,216 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1100, loss[loss=0.09443, simple_loss=0.1049, pruned_loss=0.02909, audio_tagging_loss=0.01292, over 16690.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.116, pruned_loss=0.0328, audio_tagging_loss=0.01151, over 3033208.56 frames. ], batch size: 65, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:15:09,918 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:15:28,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=328093.3333333333, ans=0.1
2023-11-18 17:15:45,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=328160.0, ans=0.125
2023-11-18 17:15:52,626 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.982e+01 9.877e+01 1.122e+02 1.591e+02, threshold=1.975e+02, percent-clipped=0.0
2023-11-18 17:15:58,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=328226.6666666667, ans=0.0
2023-11-18 17:16:02,177 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1150, loss[loss=0.1094, simple_loss=0.122, pruned_loss=0.03501, audio_tagging_loss=0.01336, over 14592.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1172, pruned_loss=0.03322, audio_tagging_loss=0.01142, over 3039186.73 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:16:04,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=328293.3333333333, ans=0.125
2023-11-18 17:16:09,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=328293.3333333333, ans=0.125
2023-11-18 17:16:34,794 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2023-11-18 17:16:35,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0
2023-11-18 17:16:39,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.11 vs. limit=15.0
2023-11-18 17:16:49,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=328560.0, ans=0.0
2023-11-18 17:16:58,661 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1200, loss[loss=0.1131, simple_loss=0.1294, pruned_loss=0.03701, audio_tagging_loss=0.01138, over 14884.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1172, pruned_loss=0.0331, audio_tagging_loss=0.01137, over 3038994.25 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:17:11,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.92 vs. limit=10.0
2023-11-18 17:17:12,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=328693.3333333333, ans=0.0
2023-11-18 17:17:27,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=328760.0, ans=0.125
2023-11-18 17:17:35,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=328826.6666666667, ans=0.07
2023-11-18 17:17:39,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=328826.6666666667, ans=0.125
2023-11-18 17:17:44,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.512e+01 1.070e+02 1.237e+02 2.001e+02, threshold=2.140e+02, percent-clipped=1.0
2023-11-18 17:17:45,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=328893.3333333333, ans=0.0
2023-11-18 17:17:48,237 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5
2023-11-18 17:17:54,574 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1250, loss[loss=0.08626, simple_loss=0.09796, pruned_loss=0.02241, audio_tagging_loss=0.01487, over 14724.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1177, pruned_loss=0.03343, audio_tagging_loss=0.01148, over 3040827.67 frames. ], batch size: 58, lr: 1.42e-02, grad_scale: 32.0
2023-11-18 17:18:13,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=329026.6666666667, ans=0.125
2023-11-18 17:18:14,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=329026.6666666667, ans=0.125
2023-11-18 17:18:33,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0
2023-11-18 17:18:34,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=329160.0, ans=0.0
2023-11-18 17:18:50,351 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1300, loss[loss=0.1018, simple_loss=0.1027, pruned_loss=0.03576, audio_tagging_loss=0.01471, over 15386.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1174, pruned_loss=0.03327, audio_tagging_loss=0.01136, over 3039289.10 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:18:54,114 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0
2023-11-18 17:18:56,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten.whitening_limit, batch_count=329293.3333333333, ans=22.5
2023-11-18 17:19:02,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.05 vs. limit=22.5
2023-11-18 17:19:07,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=329360.0, ans=0.125
2023-11-18 17:19:15,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=329426.6666666667, ans=0.0
2023-11-18 17:19:16,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=329426.6666666667, ans=0.125
2023-11-18 17:19:20,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=329426.6666666667, ans=0.125
2023-11-18 17:19:34,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=329560.0, ans=0.07
2023-11-18 17:19:36,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.287e+01 1.028e+02 1.154e+02 1.858e+02, threshold=2.056e+02, percent-clipped=0.0
2023-11-18 17:19:46,959 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1350, loss[loss=0.1053, simple_loss=0.1247, pruned_loss=0.03365, audio_tagging_loss=0.009288, over 16080.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1169, pruned_loss=0.03319, audio_tagging_loss=0.01136, over 3040126.74 frames. ], batch size: 60, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:20:03,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0
2023-11-18 17:20:06,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=329693.3333333333, ans=0.125
2023-11-18 17:20:06,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0
2023-11-18 17:20:12,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=329760.0, ans=0.0
2023-11-18 17:20:23,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=329826.6666666667, ans=0.0
2023-11-18 17:20:23,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=329826.6666666667, ans=0.2
2023-11-18 17:20:28,586 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:20:33,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=329893.3333333333, ans=0.125
2023-11-18 17:20:35,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=329893.3333333333, ans=0.1
2023-11-18 17:20:42,468 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1400, loss[loss=0.1115, simple_loss=0.1307, pruned_loss=0.0349, audio_tagging_loss=0.01126, over 15448.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1172, pruned_loss=0.0331, audio_tagging_loss=0.01134, over 3044837.42 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:21:19,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=330160.0, ans=0.0
2023-11-18 17:21:19,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330160.0, ans=0.1
2023-11-18 17:21:27,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 9.034e+01 1.046e+02 1.162e+02 2.116e+02, threshold=2.092e+02, percent-clipped=1.0
2023-11-18 17:21:38,686 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1450, loss[loss=0.07432, simple_loss=0.08186, pruned_loss=0.02159, audio_tagging_loss=0.01181, over 15299.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1169, pruned_loss=0.03317, audio_tagging_loss=0.0115, over 3051828.67 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:21:44,239 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.544e-01
2023-11-18 17:22:15,984 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.07 vs. limit=6.0
2023-11-18 17:22:18,661 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.90 vs. limit=15.0
2023-11-18 17:22:34,985 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1500, loss[loss=0.1008, simple_loss=0.1098, pruned_loss=0.03262, audio_tagging_loss=0.01324, over 15166.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1175, pruned_loss=0.03318, audio_tagging_loss=0.01152, over 3043190.86 frames. ], batch size: 59, lr: 1.41e-02, grad_scale: 64.0
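A note on grad_scale: it moves from 32.0 to 64.0 at batch 1500 above and is back at 32.0 by batch 1750, the signature of dynamic loss scaling under fp16: the scale doubles after a long run of overflow-free steps and is halved when gradients overflow. A minimal sketch with PyTorch's GradScaler (the model and optimizer are toy stand-ins, not the recipe's objects):

# Minimal fp16 training step with dynamic loss scaling; the growth/backoff
# behaviour mirrors the grad_scale 32 -> 64 -> 32 movement in the log.
import torch

model = torch.nn.Linear(80, 500).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1.4e-2)
scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,    # 32 -> 64 after a clean run
                                   backoff_factor=0.5,   # halved on inf/nan grads
                                   growth_interval=2000)

x = torch.randn(16, 80, device="cuda")
with torch.cuda.amp.autocast(dtype=torch.float16):
    loss = model(x).pow(2).mean()
scaler.scale(loss).backward()
scaler.step(optimizer)   # skips the update if gradients overflowed
scaler.update()          # adjusts the scale, which the log reports as grad_scale
print(scaler.get_scale())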
2023-11-18 17:22:39,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=330626.6666666667, ans=0.125
2023-11-18 17:22:44,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=330626.6666666667, ans=0.125
2023-11-18 17:23:04,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=330760.0, ans=0.1
2023-11-18 17:23:11,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=330826.6666666667, ans=10.0
2023-11-18 17:23:15,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=330826.6666666667, ans=0.125
2023-11-18 17:23:16,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=330826.6666666667, ans=0.125
2023-11-18 17:23:20,981 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.325e+01 1.048e+02 1.222e+02 1.829e+02, threshold=2.095e+02, percent-clipped=0.0
2023-11-18 17:23:30,605 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1550, loss[loss=0.1051, simple_loss=0.1166, pruned_loss=0.03507, audio_tagging_loss=0.01173, over 15555.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1183, pruned_loss=0.03346, audio_tagging_loss=0.01175, over 3045304.92 frames. ], batch size: 59, lr: 1.41e-02, grad_scale: 64.0
2023-11-18 17:23:34,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0
2023-11-18 17:23:42,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.60 vs. limit=10.0
2023-11-18 17:23:47,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=331026.6666666667, ans=0.125
2023-11-18 17:24:12,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=331160.0, ans=0.125
2023-11-18 17:24:26,282 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1600, loss[loss=0.06345, simple_loss=0.06348, pruned_loss=0.01672, audio_tagging_loss=0.01499, over 17172.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.117, pruned_loss=0.03309, audio_tagging_loss=0.01185, over 3044092.31 frames. ], batch size: 68, lr: 1.41e-02, grad_scale: 64.0
2023-11-18 17:24:30,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=331293.3333333333, ans=0.125
2023-11-18 17:24:41,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=331360.0, ans=0.0
2023-11-18 17:25:04,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=331493.3333333333, ans=0.0
2023-11-18 17:25:12,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.229e+01 1.020e+02 1.137e+02 1.712e+02, threshold=2.040e+02, percent-clipped=0.0
2023-11-18 17:25:17,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.43 vs. limit=15.0
2023-11-18 17:25:23,410 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1650, loss[loss=0.08894, simple_loss=0.1024, pruned_loss=0.02664, audio_tagging_loss=0.0111, over 14951.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1165, pruned_loss=0.03273, audio_tagging_loss=0.01187, over 3042016.28 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 64.0
2023-11-18 17:25:28,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=331626.6666666667, ans=0.05
2023-11-18 17:25:35,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=331693.3333333333, ans=10.0
2023-11-18 17:25:49,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=331760.0, ans=0.125
2023-11-18 17:26:00,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=331826.6666666667, ans=0.125
2023-11-18 17:26:12,771 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.563e-03
2023-11-18 17:26:18,875 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1700, loss[loss=0.1044, simple_loss=0.1179, pruned_loss=0.03357, audio_tagging_loss=0.01185, over 15373.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1171, pruned_loss=0.03288, audio_tagging_loss=0.01183, over 3037410.85 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 64.0
2023-11-18 17:26:24,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=331960.0, ans=0.0
2023-11-18 17:27:00,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=332160.0, ans=0.125
2023-11-18 17:27:06,290 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.403e+01 1.012e+02 1.120e+02 1.645e+02, threshold=2.024e+02, percent-clipped=0.0
2023-11-18 17:27:09,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=332226.6666666667, ans=0.05
2023-11-18 17:27:12,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=332226.6666666667, ans=0.125
2023-11-18 17:27:14,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=332293.3333333333, ans=0.09899494936611666
2023-11-18 17:27:15,291 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1750, loss[loss=0.1008, simple_loss=0.1181, pruned_loss=0.02877, audio_tagging_loss=0.01295, over 14308.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1173, pruned_loss=0.03288, audio_tagging_loss=0.01168, over 3039085.84 frames. ], batch size: 53, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:27:19,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.91 vs. limit=15.0
2023-11-18 17:27:32,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=332360.0, ans=0.04949747468305833
2023-11-18 17:27:43,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=332426.6666666667, ans=0.025
2023-11-18 17:27:49,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=332493.3333333333, ans=0.05
2023-11-18 17:28:03,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.92 vs. limit=12.0
2023-11-18 17:28:05,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=332560.0, ans=0.0
2023-11-18 17:28:08,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=332560.0, ans=0.125
2023-11-18 17:28:11,838 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1800, loss[loss=0.08139, simple_loss=0.09031, pruned_loss=0.02497, audio_tagging_loss=0.01127, over 16289.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1178, pruned_loss=0.03268, audio_tagging_loss=0.01159, over 3047258.78 frames. ], batch size: 62, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:28:15,256 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5
2023-11-18 17:28:25,853 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=22.5
2023-11-18 17:28:31,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=332693.3333333333, ans=0.0
2023-11-18 17:28:59,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 9.003e+01 9.997e+01 1.070e+02 1.437e+02, threshold=1.999e+02, percent-clipped=0.0
2023-11-18 17:29:00,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=332893.3333333333, ans=0.1
2023-11-18 17:29:06,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0
2023-11-18 17:29:07,651 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1850, loss[loss=0.1129, simple_loss=0.1279, pruned_loss=0.0421, audio_tagging_loss=0.006785, over 15397.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1171, pruned_loss=0.03255, audio_tagging_loss=0.01145, over 3051866.94 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0
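A note on the balancer fields above (min_positive, max_abs, min_abs, prob): they bound per-channel activation statistics, enforced stochastically with probability prob. A hedged sketch follows that only measures the statistics involved; the real module also modifies gradients to pull out-of-range channels back toward their bounds:

# Hedged sketch of what a balancer constrains: per-channel statistics such as
# the fraction of positive activations and the mean absolute value.
import torch

def balancer_stats(x: torch.Tensor) -> dict:
    # x: (num_frames, num_channels)
    return {
        "frac_positive": (x > 0).float().mean(dim=0),  # vs. min_positive bounds
        "mean_abs": x.abs().mean(dim=0),               # vs. min_abs / max_abs bounds
    }

stats = balancer_stats(torch.randn(1000, 256))
print(stats["frac_positive"].min().item())  # ~0.5 for symmetric activations
print(stats["mean_abs"].mean().item())      # ~0.8 for unit Gaussians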
2023-11-18 17:29:14,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=332960.0, ans=0.07
2023-11-18 17:29:22,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=333026.6666666667, ans=0.0
2023-11-18 17:29:24,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=333026.6666666667, ans=0.0
2023-11-18 17:29:27,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=333026.6666666667, ans=0.2
2023-11-18 17:29:28,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=333093.3333333333, ans=0.0
2023-11-18 17:29:30,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=333093.3333333333, ans=0.125
2023-11-18 17:29:37,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=333093.3333333333, ans=0.0
2023-11-18 17:29:53,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=333226.6666666667, ans=0.125
2023-11-18 17:30:01,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333226.6666666667, ans=0.1
2023-11-18 17:30:04,070 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1900, loss[loss=0.1093, simple_loss=0.1158, pruned_loss=0.03832, audio_tagging_loss=0.01304, over 14704.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1168, pruned_loss=0.03255, audio_tagging_loss=0.01154, over 3056963.47 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:30:04,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=333293.3333333333, ans=0.125
2023-11-18 17:30:20,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=333360.0, ans=0.125
2023-11-18 17:30:34,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=333426.6666666667, ans=0.2
2023-11-18 17:30:38,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=333493.3333333333, ans=0.1
2023-11-18 17:30:40,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=333493.3333333333, ans=0.2
2023-11-18 17:30:50,674 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.811e+01 9.778e+01 1.068e+02 1.411e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-18 17:30:58,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.15 vs. limit=22.5
2023-11-18 17:30:59,232 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 1950, loss[loss=0.1064, simple_loss=0.1262, pruned_loss=0.03282, audio_tagging_loss=0.01051, over 15933.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1178, pruned_loss=0.03286, audio_tagging_loss=0.01155, over 3057360.32 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:31:13,823 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:31:16,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=333693.3333333333, ans=0.2
2023-11-18 17:31:29,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.26 vs. limit=15.0
2023-11-18 17:31:41,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=333826.6666666667, ans=0.0
2023-11-18 17:31:42,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0
2023-11-18 17:31:56,037 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2000, loss[loss=0.1081, simple_loss=0.1198, pruned_loss=0.03446, audio_tagging_loss=0.01376, over 15695.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.118, pruned_loss=0.03295, audio_tagging_loss=0.01143, over 3053580.11 frames. ], batch size: 59, lr: 1.41e-02, grad_scale: 32.0
2023-11-18 17:32:01,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=333960.0, ans=0.125
2023-11-18 17:32:06,085 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0
2023-11-18 17:32:11,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=334026.6666666667, ans=0.1
2023-11-18 17:32:27,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0
2023-11-18 17:32:31,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2023-11-18 17:32:32,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0
2023-11-18 17:32:42,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 9.028e+01 9.681e+01 1.107e+02 1.913e+02, threshold=1.936e+02, percent-clipped=0.0
2023-11-18 17:32:47,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=334226.6666666667, ans=0.1
2023-11-18 17:32:51,958 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2050, loss[loss=0.1279, simple_loss=0.1422, pruned_loss=0.04414, audio_tagging_loss=0.01269, over 15323.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1175, pruned_loss=0.03292, audio_tagging_loss=0.01147, over 3055382.98 frames. ], batch size: 55, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:32:55,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=334293.3333333333, ans=0.125
2023-11-18 17:33:02,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.96 vs. limit=22.5
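A note on the WithLoss lines: they report the accumulated value of an auxiliary penalty attached to attention weights, and loss-sum=0.000e+00 means the penalty never activated during the interval (compare loss-sum=6.544e-01 earlier in the log). A hedged sketch of such a pass-through accumulator follows; the specific penalty used is illustrative, not the one scaling.py applies:

# Hedged sketch of a wrapper that accumulates an auxiliary loss on attention
# weights and reports it like the "WithLoss: ..., loss-sum=..." lines.
import torch

class AttentionWithLoss(torch.nn.Module):
    def __init__(self, name: str):
        super().__init__()
        self.name = name
        self.loss_sum = 0.0

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        # Example penalty: discourage near-one-hot attention rows.
        penalty = (attn_weights - 0.95).clamp(min=0.0).sum()
        self.loss_sum += penalty.item()
        return attn_weights  # pass-through; the penalty would join the train loss

    def log(self):
        print(f"WithLoss: name={self.name}, loss-sum={self.loss_sum:.3e}")
        self.loss_sum = 0.0

m = AttentionWithLoss("encoder.encoders.5.encoder.layers.1.self_attn_weights")
m(torch.softmax(torch.randn(4, 8, 8), dim=-1))
m.log()   # typically prints loss-sum=0.000e+00 when no entry exceeds the margin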
2023-11-18 17:33:05,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=334360.0, ans=0.05
2023-11-18 17:33:29,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334493.3333333333, ans=0.0
2023-11-18 17:33:32,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=334493.3333333333, ans=0.015
2023-11-18 17:33:32,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.77 vs. limit=10.0
2023-11-18 17:33:43,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0
2023-11-18 17:33:47,585 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2100, loss[loss=0.09162, simple_loss=0.1043, pruned_loss=0.02694, audio_tagging_loss=0.01254, over 15235.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1184, pruned_loss=0.03322, audio_tagging_loss=0.01147, over 3055242.82 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:33:47,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=334626.6666666667, ans=0.95
2023-11-18 17:33:48,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=334626.6666666667, ans=0.125
2023-11-18 17:33:52,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.62 vs. limit=22.5
2023-11-18 17:33:56,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.71 vs. limit=22.5
2023-11-18 17:34:26,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=334826.6666666667, ans=0.125
2023-11-18 17:34:29,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.81 vs. limit=15.0
2023-11-18 17:34:34,905 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 9.682e+01 1.081e+02 1.226e+02 1.656e+02, threshold=2.162e+02, percent-clipped=0.0
2023-11-18 17:34:35,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=12.0
2023-11-18 17:34:44,640 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2150, loss[loss=0.1666, simple_loss=0.1831, pruned_loss=0.06289, audio_tagging_loss=0.01219, over 15348.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1194, pruned_loss=0.0337, audio_tagging_loss=0.01148, over 3053144.03 frames. ], batch size: 53, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:34:45,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=334960.0, ans=0.0
2023-11-18 17:35:07,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=335093.3333333333, ans=0.05
2023-11-18 17:35:07,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=335093.3333333333, ans=0.1
2023-11-18 17:35:08,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=335093.3333333333, ans=0.015
2023-11-18 17:35:09,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.21 vs. limit=15.0
2023-11-18 17:35:17,652 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:35:19,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=335160.0, ans=0.05
2023-11-18 17:35:29,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=335226.6666666667, ans=0.125
2023-11-18 17:35:35,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=335226.6666666667, ans=0.0
2023-11-18 17:35:39,925 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2200, loss[loss=0.07815, simple_loss=0.08755, pruned_loss=0.0235, audio_tagging_loss=0.01088, over 14323.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1206, pruned_loss=0.03406, audio_tagging_loss=0.01134, over 3050423.72 frames. ], batch size: 55, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:35:52,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=335360.0, ans=0.0
2023-11-18 17:36:26,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=335560.0, ans=0.125
2023-11-18 17:36:27,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 9.433e+01 1.069e+02 1.154e+02 1.802e+02, threshold=2.138e+02, percent-clipped=0.0
2023-11-18 17:36:36,106 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2250, loss[loss=0.12, simple_loss=0.1441, pruned_loss=0.039, audio_tagging_loss=0.008976, over 15946.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1205, pruned_loss=0.0341, audio_tagging_loss=0.01144, over 3044946.87 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:36:40,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=335626.6666666667, ans=0.125
2023-11-18 17:37:02,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=335760.0, ans=0.0
2023-11-18 17:37:13,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=335826.6666666667, ans=0.0
2023-11-18 17:37:14,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=335826.6666666667, ans=0.125
2023-11-18 17:37:15,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=335826.6666666667, ans=0.0
2023-11-18 17:37:22,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=335893.3333333333, ans=0.0
2023-11-18 17:37:30,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=335893.3333333333, ans=0.125
2023-11-18 17:37:33,140 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2300, loss[loss=0.1049, simple_loss=0.1169, pruned_loss=0.03414, audio_tagging_loss=0.01235, over 15078.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1198, pruned_loss=0.03374, audio_tagging_loss=0.01153, over 3045346.19 frames. ], batch size: 60, lr: 1.40e-02, grad_scale: 32.0
2023-11-18 17:37:34,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=335960.0, ans=0.125
2023-11-18 17:37:37,631 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:37:49,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=336026.6666666667, ans=0.0
2023-11-18 17:37:55,401 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.20 vs. limit=15.0
2023-11-18 17:37:55,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=336093.3333333333, ans=0.125
2023-11-18 17:38:01,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=336093.3333333333, ans=0.0
2023-11-18 17:38:01,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.40 vs. limit=12.0
2023-11-18 17:38:20,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.497e+01 1.027e+02 1.187e+02 1.652e+02, threshold=2.054e+02, percent-clipped=0.0
2023-11-18 17:38:22,764 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:38:28,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=336293.3333333333, ans=0.125 2023-11-18 17:38:29,672 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2350, loss[loss=0.1371, simple_loss=0.1621, pruned_loss=0.04622, audio_tagging_loss=0.009843, over 15535.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1198, pruned_loss=0.03384, audio_tagging_loss=0.01165, over 3054111.53 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:38:36,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=336293.3333333333, ans=0.125 2023-11-18 17:38:37,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=336293.3333333333, ans=0.0 2023-11-18 17:38:40,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=336360.0, ans=0.0 2023-11-18 17:38:47,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=336360.0, ans=0.0 2023-11-18 17:38:55,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=336426.6666666667, ans=0.07 2023-11-18 17:39:22,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=336560.0, ans=0.125 2023-11-18 17:39:23,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=336560.0, ans=0.125 2023-11-18 17:39:25,495 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2400, loss[loss=0.08288, simple_loss=0.09219, pruned_loss=0.02238, audio_tagging_loss=0.0144, over 16379.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1189, pruned_loss=0.03342, audio_tagging_loss=0.01169, over 3053622.29 frames. ], batch size: 63, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:39:29,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=336626.6666666667, ans=0.125 2023-11-18 17:39:41,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=336693.3333333333, ans=0.0 2023-11-18 17:40:06,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=15.0 2023-11-18 17:40:13,220 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.087e+01 9.616e+01 1.102e+02 1.303e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-18 17:40:13,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=336893.3333333333, ans=0.125 2023-11-18 17:40:21,848 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2450, loss[loss=0.07487, simple_loss=0.08189, pruned_loss=0.02048, audio_tagging_loss=0.01344, over 15056.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1182, pruned_loss=0.03322, audio_tagging_loss=0.01167, over 3055809.54 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0
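
A consistency check on the loss fields: every tot_loss[...] line in this stretch satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss. The weights are inferred from the logged numbers themselves, not read out of the training code:

    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_scale=0.5, tagging_scale=1.0):
        # weights inferred from this log; treat them as assumptions
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # batch 2400 above: loss=0.1046, simple_loss=0.1189,
    # pruned_loss=0.03342, audio_tagging_loss=0.01169
    assert abs(combined_loss(0.1189, 0.03342, 0.01169) - 0.1046) < 5e-4
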
2023-11-18 17:40:24,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=336960.0, ans=0.125 2023-11-18 17:40:54,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=337160.0, ans=0.2 2023-11-18 17:41:17,247 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2500, loss[loss=0.1019, simple_loss=0.1177, pruned_loss=0.0329, audio_tagging_loss=0.01011, over 15810.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.118, pruned_loss=0.03318, audio_tagging_loss=0.01163, over 3055155.82 frames. ], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:41:30,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=337360.0, ans=0.0 2023-11-18 17:41:38,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=337426.6666666667, ans=0.0 2023-11-18 17:42:05,673 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 9.278e+01 1.050e+02 1.171e+02 1.497e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 17:42:12,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=337626.6666666667, ans=0.07 2023-11-18 17:42:13,639 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2550, loss[loss=0.09646, simple_loss=0.1044, pruned_loss=0.03164, audio_tagging_loss=0.01262, over 14714.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1166, pruned_loss=0.03273, audio_tagging_loss=0.01162, over 3045452.31 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:42:21,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=337626.6666666667, ans=0.2 2023-11-18 17:42:34,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=12.0 2023-11-18 17:42:42,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. limit=15.0 2023-11-18 17:42:44,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=337760.0, ans=0.0 2023-11-18 17:42:58,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337893.3333333333, ans=0.1 2023-11-18 17:43:07,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.34 vs. limit=6.0 2023-11-18 17:43:10,218 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2600, loss[loss=0.1173, simple_loss=0.1312, pruned_loss=0.04012, audio_tagging_loss=0.01158, over 15792.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1173, pruned_loss=0.03281, audio_tagging_loss=0.01141, over 3044222.39 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:43:25,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. limit=15.0
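
The [optim.py:476] lines summarize recent per-step gradient norms. Reading the five numbers as (min, 25%, median, 75%, max), the reported threshold is Clipping_scale times the median: 2.0 * 1.050e+02 = 2.100e+02, matching threshold=2.099e+02 in the entry above up to the running-statistics window. A sketch of producing such a report; the aggregation details of the real optimizer wrapper are assumptions:

    import torch

    def clipping_report(grad_norms: torch.Tensor, clipping_scale: float = 2.0) -> None:
        # grad_norms: 1-D tensor of recent per-step gradient norms
        q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = (clipping_scale * q[2]).item()
        clipped = ((grad_norms > threshold).float().mean() * 100.0).item()
        print(f"Clipping_scale={clipping_scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, percent-clipped={clipped:.1f}")
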
2023-11-18 17:43:55,317 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. limit=6.0 2023-11-18 17:43:57,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.896e+01 9.646e+01 1.065e+02 1.578e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-18 17:43:58,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=338226.6666666667, ans=0.125 2023-11-18 17:44:02,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=338226.6666666667, ans=0.0 2023-11-18 17:44:05,249 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2650, loss[loss=0.1055, simple_loss=0.1295, pruned_loss=0.0309, audio_tagging_loss=0.009864, over 14435.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1184, pruned_loss=0.033, audio_tagging_loss=0.01127, over 3051802.08 frames. ], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:44:08,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=338293.3333333333, ans=0.2 2023-11-18 17:44:12,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=338293.3333333333, ans=0.035 2023-11-18 17:44:13,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=338293.3333333333, ans=0.125 2023-11-18 17:44:15,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. limit=15.0 2023-11-18 17:44:20,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=338360.0, ans=0.125 2023-11-18 17:44:55,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=338560.0, ans=0.125 2023-11-18 17:45:01,126 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2700, loss[loss=0.1347, simple_loss=0.1625, pruned_loss=0.04408, audio_tagging_loss=0.009385, over 15007.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1171, pruned_loss=0.0327, audio_tagging_loss=0.01121, over 3053229.66 frames. ], batch size: 55, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:45:06,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338626.6666666667, ans=0.1 2023-11-18 17:45:09,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=338626.6666666667, ans=0.125 2023-11-18 17:45:16,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.02 vs. limit=15.0
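
The [scaling.py:1022] Whitening lines compare a per-module statistic against a limit; values approaching or exceeding the limit (metric=14.53 vs. limit=15.0 just above, or metric=22.77 vs. limit=22.5 later in this log) indicate correlated or unevenly scaled channels that the module will push back toward a whiter covariance. One plausible form of such a metric, equal to 1.0 for perfectly white (decorrelated, equal-variance) features; this is a reconstruction for intuition, not the scaling.py source:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (..., num_channels), flattened over all leading dims
        x = x.reshape(-1, x.shape[-1])
        n, dim = x.shape
        x = x.reshape(n, num_groups, dim // num_groups).transpose(0, 1)
        cov = x.transpose(1, 2) @ x / n      # per-group covariance
        eigs = torch.linalg.eigvalsh(cov)    # real, since cov is symmetric
        return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

    white = torch.randn(10000, 384)
    print(whitening_metric(white))           # ~1.0, well under limit=15.0
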
2023-11-18 17:45:28,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=338760.0, ans=0.05 2023-11-18 17:45:35,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338826.6666666667, ans=0.1 2023-11-18 17:45:49,102 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.914e+01 9.942e+01 1.124e+02 1.692e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 17:45:52,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=338893.3333333333, ans=0.0 2023-11-18 17:45:56,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=338960.0, ans=0.1 2023-11-18 17:45:57,620 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2750, loss[loss=0.09551, simple_loss=0.1092, pruned_loss=0.03203, audio_tagging_loss=0.008871, over 14591.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1164, pruned_loss=0.03275, audio_tagging_loss=0.01122, over 3055186.83 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:45:58,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=338960.0, ans=0.0 2023-11-18 17:46:45,064 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:46:52,444 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2800, loss[loss=0.09852, simple_loss=0.1083, pruned_loss=0.03271, audio_tagging_loss=0.01166, over 15030.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1161, pruned_loss=0.03272, audio_tagging_loss=0.01137, over 3051346.83 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:47:02,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=339360.0, ans=0.02 2023-11-18 17:47:39,925 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.210e+01 1.044e+02 1.186e+02 2.162e+02, threshold=2.088e+02, percent-clipped=1.0 2023-11-18 17:47:47,891 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2850, loss[loss=0.1123, simple_loss=0.1208, pruned_loss=0.03899, audio_tagging_loss=0.01284, over 16399.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1154, pruned_loss=0.03254, audio_tagging_loss=0.0114, over 3051776.18 frames. ], batch size: 63, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:47:50,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=339626.6666666667, ans=0.0 2023-11-18 17:47:51,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=15.0 2023-11-18 17:47:54,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=15.0
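
The lr field ticks down from 1.40e-02 to 1.39e-02 around batch 2750 above and keeps shrinking slowly; this gradual power-law decay in both batch index and epoch is characteristic of the Eden-style schedulers used with these recipes. A sketch of the shape only; the constants below are placeholders, not this run's settings:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 5000.0, lr_epochs: float = 3.5) -> float:
        # nearly constant early on, ~batch^-0.5 * epoch^-0.5 asymptotically
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)
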
2023-11-18 17:47:58,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=339693.3333333333, ans=0.2 2023-11-18 17:48:09,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=339760.0, ans=0.125 2023-11-18 17:48:13,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=339760.0, ans=0.125 2023-11-18 17:48:26,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=16.14 vs. limit=15.0 2023-11-18 17:48:34,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=339893.3333333333, ans=0.05 2023-11-18 17:48:40,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=339893.3333333333, ans=0.0 2023-11-18 17:48:44,329 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2900, loss[loss=0.118, simple_loss=0.1327, pruned_loss=0.03901, audio_tagging_loss=0.01263, over 16038.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1153, pruned_loss=0.0325, audio_tagging_loss=0.01136, over 3052528.70 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:49:12,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=340093.3333333333, ans=0.0 2023-11-18 17:49:14,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=340093.3333333333, ans=0.0 2023-11-18 17:49:22,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=340160.0, ans=0.1 2023-11-18 17:49:25,083 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-18 17:49:33,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 9.214e+01 1.048e+02 1.170e+02 1.772e+02, threshold=2.096e+02, percent-clipped=0.0 2023-11-18 17:49:40,490 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 2950, loss[loss=0.1335, simple_loss=0.1608, pruned_loss=0.04368, audio_tagging_loss=0.009417, over 15541.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1182, pruned_loss=0.03337, audio_tagging_loss=0.01125, over 3053121.06 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0
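
Many ScheduledFloat names above end in balancer parameters: min_positive=0.05 bounds the fraction of positive activations per channel from below, max_positive bounds it from above, and prob=0.125 is the chance that the check runs on a given step, since enforcing it on every batch would be wasteful. A diagnostic-only sketch of the positivity check; the real Balancer also constrains absolute-value statistics (min_abs, max_abs) and applies its correction through the backward pass:

    import torch

    def positive_fraction_violation(x: torch.Tensor,
                                    min_positive: float = 0.05,
                                    max_positive: float = 0.95) -> torch.Tensor:
        # x: (..., num_channels); fraction of positive entries per channel
        frac = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
        # zero wherever the channel sits inside [min_positive, max_positive]
        return ((min_positive - frac).clamp(min=0.0)
                + (frac - max_positive).clamp(min=0.0))
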
2023-11-18 17:49:42,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=340293.3333333333, ans=0.1 2023-11-18 17:49:48,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=340293.3333333333, ans=0.125 2023-11-18 17:50:05,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=340426.6666666667, ans=0.95 2023-11-18 17:50:15,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=340493.3333333333, ans=0.125 2023-11-18 17:50:16,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=340493.3333333333, ans=0.0 2023-11-18 17:50:27,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=340560.0, ans=0.125 2023-11-18 17:50:36,778 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3000, loss[loss=0.1136, simple_loss=0.1336, pruned_loss=0.03793, audio_tagging_loss=0.008891, over 15534.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1186, pruned_loss=0.03357, audio_tagging_loss=0.01136, over 3056758.70 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:50:36,779 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 17:50:57,772 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7392, 5.7538, 5.8014, 5.8996], device='cuda:2') 2023-11-18 17:51:09,285 INFO [train_asr.py:1147] (2/4) Epoch 5, validation: loss=0.07345, simple_loss=0.06093, pruned_loss=0.009446, audio_tagging_loss=0.03354, over 4681554.00 frames. 2023-11-18 17:51:09,286 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 17:51:15,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-18 17:51:32,588 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.13 vs. limit=15.0 2023-11-18 17:51:54,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=340893.3333333333, ans=0.125 2023-11-18 17:51:56,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 9.087e+01 9.878e+01 1.115e+02 1.743e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-18 17:52:04,483 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3050, loss[loss=0.09678, simple_loss=0.1162, pruned_loss=0.02903, audio_tagging_loss=0.009646, over 14405.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.118, pruned_loss=0.03323, audio_tagging_loss=0.01146, over 3057015.77 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
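
During the validation pass above, zipformer.py:1873 prints one attention-entropy value per head (four here, for the first encoder stack). A sketch of that diagnostic: the entropy of each head's attention distribution, averaged over queries; a value close to log(sequence_length) means the head attends almost uniformly. The (heads, batch, tgt, src) layout is an assumption:

    import torch

    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        # attn: (num_heads, batch, tgt_len, src_len), rows summing to 1
        ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (heads, batch, tgt)
        return ent.mean(dim=(1, 2))                      # one scalar per head

    attn = torch.softmax(torch.randn(4, 2, 300, 300), dim=-1)
    # somewhat below log(300) ~ 5.7 for random logits; a uniform distribution
    # gives exactly log(src_len), the ceiling for values like 5.74-5.90 above
    print(attn_weights_entropy(attn))
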
2023-11-18 17:52:05,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=340960.0, ans=0.0 2023-11-18 17:52:09,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=340960.0, ans=0.125 2023-11-18 17:52:13,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=340960.0, ans=0.025 2023-11-18 17:52:14,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=341026.6666666667, ans=0.125 2023-11-18 17:52:20,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=341026.6666666667, ans=0.1 2023-11-18 17:52:34,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-11-18 17:52:35,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=341093.3333333333, ans=0.125 2023-11-18 17:52:36,787 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:52:46,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=341160.0, ans=0.125 2023-11-18 17:52:54,445 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2023-11-18 17:52:59,729 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3100, loss[loss=0.1231, simple_loss=0.1408, pruned_loss=0.04151, audio_tagging_loss=0.01118, over 14690.00 frames. ], tot_loss[loss=0.1049, simple_loss=0.1196, pruned_loss=0.03355, audio_tagging_loss=0.01153, over 3056844.10 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:53:01,356 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-11-18 17:53:02,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=341293.3333333333, ans=0.0 2023-11-18 17:53:17,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs.
limit=22.5 2023-11-18 17:53:21,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=341426.6666666667, ans=0.0 2023-11-18 17:53:27,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=341426.6666666667, ans=0.0 2023-11-18 17:53:29,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=341426.6666666667, ans=0.125 2023-11-18 17:53:36,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=341493.3333333333, ans=0.0 2023-11-18 17:53:39,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=341493.3333333333, ans=0.125 2023-11-18 17:53:45,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=341560.0, ans=0.0 2023-11-18 17:53:46,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=341560.0, ans=0.125 2023-11-18 17:53:47,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.303e+01 9.886e+01 1.114e+02 1.331e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 17:53:55,384 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3150, loss[loss=0.09752, simple_loss=0.1075, pruned_loss=0.02898, audio_tagging_loss=0.01479, over 15106.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1193, pruned_loss=0.03338, audio_tagging_loss=0.01156, over 3048582.42 frames. ], batch size: 55, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:54:09,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-11-18 17:54:18,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.98 vs. limit=15.0 2023-11-18 17:54:33,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=341826.6666666667, ans=0.0 2023-11-18 17:54:33,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=341826.6666666667, ans=0.125 2023-11-18 17:54:43,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=341893.3333333333, ans=0.2 2023-11-18 17:54:51,847 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3200, loss[loss=0.1075, simple_loss=0.1222, pruned_loss=0.03523, audio_tagging_loss=0.01118, over 12810.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1192, pruned_loss=0.03329, audio_tagging_loss=0.01158, over 3047816.06 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:55:00,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=341960.0, ans=0.125 2023-11-18 17:55:08,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342026.6666666667, ans=0.1 2023-11-18 17:55:13,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.09 vs. 
limit=15.0 2023-11-18 17:55:27,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=342160.0, ans=0.2 2023-11-18 17:55:39,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.174e+01 9.896e+01 1.084e+02 1.894e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-18 17:55:42,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=342226.6666666667, ans=0.0 2023-11-18 17:55:47,525 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3250, loss[loss=0.09387, simple_loss=0.1003, pruned_loss=0.02989, audio_tagging_loss=0.01382, over 15151.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1181, pruned_loss=0.0329, audio_tagging_loss=0.01173, over 3045681.95 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:55:50,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=342293.3333333333, ans=0.0 2023-11-18 17:55:51,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=342293.3333333333, ans=0.2 2023-11-18 17:55:52,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=342293.3333333333, ans=0.125 2023-11-18 17:56:01,416 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=342360.0, ans=0.1 2023-11-18 17:56:12,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=342426.6666666667, ans=0.125 2023-11-18 17:56:18,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.84 vs. limit=15.0 2023-11-18 17:56:38,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=342560.0, ans=0.125 2023-11-18 17:56:42,533 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3300, loss[loss=0.1162, simple_loss=0.1303, pruned_loss=0.03987, audio_tagging_loss=0.01116, over 15598.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1176, pruned_loss=0.03268, audio_tagging_loss=0.01172, over 3054550.29 frames. 
], batch size: 61, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:56:43,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=342626.6666666667, ans=0.05 2023-11-18 17:56:49,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=342626.6666666667, ans=0.125 2023-11-18 17:56:49,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=342626.6666666667, ans=0.125 2023-11-18 17:56:55,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342693.3333333333, ans=0.1 2023-11-18 17:56:58,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=342693.3333333333, ans=0.125 2023-11-18 17:57:31,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 9.162e+01 1.022e+02 1.144e+02 1.543e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 17:57:38,104 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.264e-02 2023-11-18 17:57:39,985 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3350, loss[loss=0.1269, simple_loss=0.1511, pruned_loss=0.04159, audio_tagging_loss=0.009778, over 15272.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1185, pruned_loss=0.03282, audio_tagging_loss=0.01158, over 3047906.68 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:57:45,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=342960.0, ans=0.0 2023-11-18 17:58:20,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=343160.0, ans=0.125 2023-11-18 17:58:24,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=343226.6666666667, ans=0.125 2023-11-18 17:58:27,420 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2023-11-18 17:58:30,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=343226.6666666667, ans=0.125 2023-11-18 17:58:35,866 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3400, loss[loss=0.1067, simple_loss=0.1268, pruned_loss=0.03295, audio_tagging_loss=0.01035, over 16168.00 frames. ], tot_loss[loss=0.104, simple_loss=0.119, pruned_loss=0.03307, audio_tagging_loss=0.01146, over 3045200.80 frames. 
], batch size: 59, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:58:36,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=343293.3333333333, ans=0.0 2023-11-18 17:58:38,247 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:58:51,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=343360.0, ans=0.0 2023-11-18 17:58:52,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=343360.0, ans=0.0 2023-11-18 17:59:11,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=343493.3333333333, ans=0.125 2023-11-18 17:59:23,778 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.786e+01 1.073e+02 1.222e+02 1.705e+02, threshold=2.147e+02, percent-clipped=0.0 2023-11-18 17:59:30,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=343626.6666666667, ans=0.125 2023-11-18 17:59:31,128 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3450, loss[loss=0.108, simple_loss=0.1194, pruned_loss=0.03739, audio_tagging_loss=0.0109, over 16002.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1196, pruned_loss=0.03344, audio_tagging_loss=0.01134, over 3038771.80 frames. ], batch size: 61, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:59:33,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=343626.6666666667, ans=0.0 2023-11-18 17:59:42,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-18 17:59:55,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=343760.0, ans=0.125 2023-11-18 17:59:56,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=15.0 2023-11-18 17:59:57,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=343760.0, ans=0.0 2023-11-18 17:59:58,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.whiten.whitening_limit, batch_count=343760.0, ans=12.0 2023-11-18 18:00:02,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=343760.0, ans=0.2 2023-11-18 18:00:27,486 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3500, loss[loss=0.1142, simple_loss=0.133, pruned_loss=0.03839, audio_tagging_loss=0.009284, over 14967.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1196, pruned_loss=0.03351, audio_tagging_loss=0.01123, over 3032112.50 frames. 
], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:00:27,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=343960.0, ans=12.0 2023-11-18 18:00:32,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=343960.0, ans=0.0 2023-11-18 18:00:39,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344026.6666666667, ans=0.1 2023-11-18 18:00:49,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=344093.3333333333, ans=0.125 2023-11-18 18:00:55,474 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:01:05,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=344160.0, ans=0.125 2023-11-18 18:01:14,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344226.6666666667, ans=0.0 2023-11-18 18:01:16,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.216e+01 1.044e+02 1.195e+02 1.654e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 18:01:22,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=344293.3333333333, ans=0.125 2023-11-18 18:01:23,670 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3550, loss[loss=0.08156, simple_loss=0.08936, pruned_loss=0.02306, audio_tagging_loss=0.01382, over 14794.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1179, pruned_loss=0.03299, audio_tagging_loss=0.0112, over 3038761.89 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:01:46,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=344426.6666666667, ans=0.0 2023-11-18 18:02:02,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=344493.3333333333, ans=0.2 2023-11-18 18:02:11,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=344560.0, ans=0.05 2023-11-18 18:02:18,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344626.6666666667, ans=0.1 2023-11-18 18:02:19,432 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3600, loss[loss=0.1007, simple_loss=0.1224, pruned_loss=0.03175, audio_tagging_loss=0.007727, over 15132.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1169, pruned_loss=0.03247, audio_tagging_loss=0.01114, over 3042550.27 frames. 
], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:02:19,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=344626.6666666667, ans=15.0 2023-11-18 18:02:20,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=344626.6666666667, ans=0.125 2023-11-18 18:02:43,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=344760.0, ans=0.125 2023-11-18 18:02:48,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=344760.0, ans=0.0 2023-11-18 18:02:59,416 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-18 18:03:08,090 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 9.105e+01 1.020e+02 1.125e+02 1.503e+02, threshold=2.039e+02, percent-clipped=0.0 2023-11-18 18:03:16,171 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3650, loss[loss=0.09731, simple_loss=0.1058, pruned_loss=0.03186, audio_tagging_loss=0.01254, over 14585.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1169, pruned_loss=0.03248, audio_tagging_loss=0.01114, over 3050261.84 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:03:20,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=344960.0, ans=0.125 2023-11-18 18:03:25,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=345026.6666666667, ans=0.09899494936611666 2023-11-18 18:03:27,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345026.6666666667, ans=0.1 2023-11-18 18:03:29,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=345026.6666666667, ans=0.1 2023-11-18 18:03:53,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=345160.0, ans=0.125 2023-11-18 18:04:00,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=345226.6666666667, ans=0.0 2023-11-18 18:04:11,793 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3700, loss[loss=0.133, simple_loss=0.1414, pruned_loss=0.05263, audio_tagging_loss=0.00969, over 14146.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1173, pruned_loss=0.03266, audio_tagging_loss=0.01128, over 3058090.60 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:04:15,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=345293.3333333333, ans=0.0 2023-11-18 18:04:31,833 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
limit=15.0 2023-11-18 18:04:32,508 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:04:38,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=345426.6666666667, ans=0.0 2023-11-18 18:05:00,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.436e+01 1.012e+02 1.107e+02 1.712e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 18:05:05,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=345560.0, ans=0.125 2023-11-18 18:05:07,825 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3750, loss[loss=0.1192, simple_loss=0.1422, pruned_loss=0.03796, audio_tagging_loss=0.01019, over 15675.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1186, pruned_loss=0.03319, audio_tagging_loss=0.01115, over 3057702.69 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:05:17,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=345626.6666666667, ans=0.1 2023-11-18 18:05:29,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=345760.0, ans=0.0 2023-11-18 18:05:46,206 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:05:51,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.12 vs. limit=15.0 2023-11-18 18:06:03,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=345960.0, ans=0.0 2023-11-18 18:06:04,351 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3800, loss[loss=0.09699, simple_loss=0.1076, pruned_loss=0.03052, audio_tagging_loss=0.01267, over 14856.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1187, pruned_loss=0.0332, audio_tagging_loss=0.0113, over 3051770.84 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:06:19,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=346026.6666666667, ans=0.0 2023-11-18 18:06:33,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.23 vs. 
limit=15.0 2023-11-18 18:06:35,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=346093.3333333333, ans=0.2 2023-11-18 18:06:38,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=346160.0, ans=0.125 2023-11-18 18:06:45,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=346160.0, ans=0.125 2023-11-18 18:06:52,215 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 9.330e+01 1.017e+02 1.159e+02 1.442e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 18:06:54,531 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:06:59,706 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3850, loss[loss=0.145, simple_loss=0.1738, pruned_loss=0.04612, audio_tagging_loss=0.01204, over 14067.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1173, pruned_loss=0.03272, audio_tagging_loss=0.01149, over 3047871.24 frames. ], batch size: 53, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:07:18,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=346360.0, ans=0.2 2023-11-18 18:07:33,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=346493.3333333333, ans=0.125 2023-11-18 18:07:40,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=346493.3333333333, ans=0.0 2023-11-18 18:07:42,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=12.0 2023-11-18 18:07:48,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.44 vs. limit=15.0 2023-11-18 18:07:52,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5 2023-11-18 18:07:55,525 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3900, loss[loss=0.09181, simple_loss=0.1007, pruned_loss=0.03009, audio_tagging_loss=0.01136, over 15251.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1167, pruned_loss=0.03261, audio_tagging_loss=0.01162, over 3050904.42 frames. 
], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:10,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=346693.3333333333, ans=0.0 2023-11-18 18:08:33,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=346826.6666666667, ans=0.125 2023-11-18 18:08:39,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=346826.6666666667, ans=0.125 2023-11-18 18:08:43,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346893.3333333333, ans=0.125 2023-11-18 18:08:45,462 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.163e+01 1.014e+02 1.129e+02 1.556e+02, threshold=2.028e+02, percent-clipped=0.0 2023-11-18 18:08:45,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=346893.3333333333, ans=0.025 2023-11-18 18:08:53,953 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 3950, loss[loss=0.1027, simple_loss=0.1186, pruned_loss=0.03008, audio_tagging_loss=0.01336, over 14811.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1174, pruned_loss=0.03272, audio_tagging_loss=0.01172, over 3048588.46 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:08:57,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=346960.0, ans=0.1 2023-11-18 18:09:09,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=347026.6666666667, ans=0.125 2023-11-18 18:09:20,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2023-11-18 18:09:36,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.50 vs. limit=22.5 2023-11-18 18:09:47,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=347226.6666666667, ans=0.125 2023-11-18 18:09:49,356 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4000, loss[loss=0.09864, simple_loss=0.1115, pruned_loss=0.03025, audio_tagging_loss=0.01264, over 15161.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1176, pruned_loss=0.03286, audio_tagging_loss=0.01177, over 3050710.40 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:09:59,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347360.0, ans=0.1 2023-11-18 18:10:23,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=347493.3333333333, ans=0.0 2023-11-18 18:10:24,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=347493.3333333333, ans=0.125 2023-11-18 18:10:27,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.51 vs. 
limit=15.0 2023-11-18 18:10:28,008 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:10:34,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=347560.0, ans=0.0 2023-11-18 18:10:37,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.209e+01 1.022e+02 1.112e+02 1.476e+02, threshold=2.045e+02, percent-clipped=0.0 2023-11-18 18:10:40,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=347560.0, ans=0.125 2023-11-18 18:10:46,047 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4050, loss[loss=0.07808, simple_loss=0.08853, pruned_loss=0.02404, audio_tagging_loss=0.009777, over 15038.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1193, pruned_loss=0.03331, audio_tagging_loss=0.01174, over 3052641.87 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:10:47,153 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:10:52,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.34 vs. limit=10.0 2023-11-18 18:10:52,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=347626.6666666667, ans=0.125 2023-11-18 18:11:19,664 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-18 18:11:42,704 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4100, loss[loss=0.1117, simple_loss=0.1458, pruned_loss=0.03195, audio_tagging_loss=0.006806, over 15392.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1186, pruned_loss=0.03306, audio_tagging_loss=0.0116, over 3053530.60 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0 2023-11-18 18:11:45,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=347960.0, ans=0.125 2023-11-18 18:11:50,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=347960.0, ans=0.2 2023-11-18 18:11:51,901 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.26 vs. limit=22.5 2023-11-18 18:12:13,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-18 18:12:32,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.111e+01 1.020e+02 1.147e+02 2.406e+02, threshold=2.040e+02, percent-clipped=1.0 2023-11-18 18:12:38,572 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4150, loss[loss=0.09657, simple_loss=0.1112, pruned_loss=0.0329, audio_tagging_loss=0.008088, over 15583.00 frames. 
], tot_loss[loss=0.1033, simple_loss=0.118, pruned_loss=0.03291, audio_tagging_loss=0.01142, over 3043545.24 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:12:40,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=348293.3333333333, ans=0.0 2023-11-18 18:12:46,407 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.50 vs. limit=22.5 2023-11-18 18:12:48,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348360.0, ans=0.1 2023-11-18 18:13:01,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=348426.6666666667, ans=0.2 2023-11-18 18:13:07,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=348426.6666666667, ans=0.125 2023-11-18 18:13:08,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=348426.6666666667, ans=0.0 2023-11-18 18:13:12,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=348493.3333333333, ans=0.125 2023-11-18 18:13:17,902 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:13:19,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=348493.3333333333, ans=0.0 2023-11-18 18:13:34,492 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4200, loss[loss=0.1007, simple_loss=0.125, pruned_loss=0.0299, audio_tagging_loss=0.008354, over 15120.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1181, pruned_loss=0.03278, audio_tagging_loss=0.01128, over 3047169.37 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:13:40,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=348626.6666666667, ans=0.125 2023-11-18 18:13:54,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348693.3333333333, ans=0.125 2023-11-18 18:13:57,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=348760.0, ans=0.125 2023-11-18 18:13:58,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=348760.0, ans=0.1 2023-11-18 18:13:59,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.96 vs. limit=15.0
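
grad_scale in the batch lines is the fp16 loss scale used for mixed-precision training: it halves from 32.0 to 16.0 between batches 4100 and 4150 above, the standard backoff after an overflowing scaled gradient, and creeps back up after a long overflow-free stretch. A generic torch.cuda.amp sketch of that behavior; the actual optimizer wrapper in this recipe may differ:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, backoff_factor=0.5,
                                       growth_factor=2.0, growth_interval=2000)

    def training_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model, batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skipped internally if grads overflowed
        scaler.update()          # halves the scale on overflow, else may grow
        return loss.detach(), scaler.get_scale()
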
2023-11-18 18:14:03,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=348760.0, ans=0.0 2023-11-18 18:14:03,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=348760.0, ans=0.1 2023-11-18 18:14:07,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348826.6666666667, ans=0.0 2023-11-18 18:14:08,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=348826.6666666667, ans=0.0 2023-11-18 18:14:12,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=348826.6666666667, ans=0.0 2023-11-18 18:14:23,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 9.025e+01 9.781e+01 1.065e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 18:14:23,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=348893.3333333333, ans=0.125 2023-11-18 18:14:30,709 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4250, loss[loss=0.1158, simple_loss=0.1223, pruned_loss=0.04284, audio_tagging_loss=0.01184, over 14633.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1178, pruned_loss=0.03274, audio_tagging_loss=0.01124, over 3049562.75 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 16.0 2023-11-18 18:14:43,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=349026.6666666667, ans=0.0 2023-11-18 18:14:46,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=349026.6666666667, ans=0.0 2023-11-18 18:15:04,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=349160.0, ans=0.125 2023-11-18 18:15:11,752 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.73 vs. limit=10.0 2023-11-18 18:15:26,399 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4300, loss[loss=0.09506, simple_loss=0.1162, pruned_loss=0.02632, audio_tagging_loss=0.01066, over 16202.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1179, pruned_loss=0.03278, audio_tagging_loss=0.01123, over 3047845.98 frames. ], batch size: 60, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:15:35,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=349293.3333333333, ans=0.125 2023-11-18 18:15:39,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=349360.0, ans=0.125 2023-11-18 18:15:52,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.57 vs.
limit=22.5 2023-11-18 18:15:54,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=349426.6666666667, ans=0.125 2023-11-18 18:16:11,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=349560.0, ans=0.125 2023-11-18 18:16:15,087 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.907e+01 1.023e+02 1.143e+02 1.661e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 18:16:21,909 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4350, loss[loss=0.1326, simple_loss=0.1564, pruned_loss=0.04658, audio_tagging_loss=0.007847, over 15553.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1181, pruned_loss=0.03277, audio_tagging_loss=0.01128, over 3044456.72 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:16:27,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349626.6666666667, ans=0.1 2023-11-18 18:16:45,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=349760.0, ans=0.125 2023-11-18 18:16:48,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=349760.0, ans=0.125 2023-11-18 18:16:53,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=349760.0, ans=0.125 2023-11-18 18:17:01,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=349826.6666666667, ans=0.125 2023-11-18 18:17:04,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.98 vs. limit=15.0 2023-11-18 18:17:17,987 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4400, loss[loss=0.1067, simple_loss=0.1252, pruned_loss=0.03433, audio_tagging_loss=0.009766, over 15391.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1179, pruned_loss=0.03274, audio_tagging_loss=0.01124, over 3043426.63 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:17:25,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=349960.0, ans=0.125 2023-11-18 18:17:28,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-18 18:17:29,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=350026.6666666667, ans=0.1 2023-11-18 18:17:30,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=350026.6666666667, ans=0.05 2023-11-18 18:17:36,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=350026.6666666667, ans=0.5 2023-11-18 18:17:44,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=350093.3333333333, ans=0.125 2023-11-18 18:17:53,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.68 vs. 
limit=12.0 2023-11-18 18:18:07,459 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.072e+01 1.020e+02 1.136e+02 1.526e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 18:18:13,846 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4450, loss[loss=0.1225, simple_loss=0.1462, pruned_loss=0.0386, audio_tagging_loss=0.0108, over 14340.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1181, pruned_loss=0.03292, audio_tagging_loss=0.01122, over 3048527.26 frames. ], batch size: 52, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:18:18,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350293.3333333333, ans=0.125 2023-11-18 18:18:23,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350360.0, ans=0.1 2023-11-18 18:18:36,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=350426.6666666667, ans=0.0 2023-11-18 18:19:08,906 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4500, loss[loss=0.09744, simple_loss=0.1123, pruned_loss=0.02998, audio_tagging_loss=0.01129, over 15069.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1182, pruned_loss=0.03288, audio_tagging_loss=0.01127, over 3048001.92 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:19:10,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=350626.6666666667, ans=0.0 2023-11-18 18:19:16,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=350626.6666666667, ans=0.2 2023-11-18 18:19:20,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=350693.3333333333, ans=0.1 2023-11-18 18:19:38,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=350760.0, ans=0.2 2023-11-18 18:19:39,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=350760.0, ans=0.125 2023-11-18 18:19:58,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 9.243e+01 1.009e+02 1.115e+02 1.767e+02, threshold=2.018e+02, percent-clipped=0.0 2023-11-18 18:20:00,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=350893.3333333333, ans=0.125 2023-11-18 18:20:02,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=350893.3333333333, ans=0.0 2023-11-18 18:20:05,091 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4550, loss[loss=0.07261, simple_loss=0.07115, pruned_loss=0.02344, audio_tagging_loss=0.01359, over 14775.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1178, pruned_loss=0.0329, audio_tagging_loss=0.0113, over 3044844.32 frames. 
], batch size: 58, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:20:07,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=350960.0, ans=0.2 2023-11-18 18:20:19,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=351026.6666666667, ans=0.125 2023-11-18 18:20:20,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=351026.6666666667, ans=0.5 2023-11-18 18:20:38,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=351160.0, ans=0.1 2023-11-18 18:20:43,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=351160.0, ans=0.125 2023-11-18 18:20:46,543 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:20:52,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.06 vs. limit=15.0 2023-11-18 18:21:02,024 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4600, loss[loss=0.1142, simple_loss=0.1267, pruned_loss=0.03769, audio_tagging_loss=0.01317, over 15996.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1171, pruned_loss=0.03268, audio_tagging_loss=0.01142, over 3045011.45 frames. ], batch size: 60, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:21:31,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=351426.6666666667, ans=0.05 2023-11-18 18:21:51,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.307e+01 1.009e+02 1.129e+02 1.665e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 18:21:52,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=351560.0, ans=0.04949747468305833 2023-11-18 18:21:57,345 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4650, loss[loss=0.08092, simple_loss=0.0864, pruned_loss=0.02403, audio_tagging_loss=0.01368, over 14566.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1176, pruned_loss=0.03281, audio_tagging_loss=0.01146, over 3051151.87 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:21:58,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=351626.6666666667, ans=0.025 2023-11-18 18:22:30,895 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.05 vs. 
limit=15.0 2023-11-18 18:22:36,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=351826.6666666667, ans=0.125 2023-11-18 18:22:52,877 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4700, loss[loss=0.1538, simple_loss=0.1748, pruned_loss=0.05638, audio_tagging_loss=0.01001, over 15508.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1166, pruned_loss=0.03253, audio_tagging_loss=0.01172, over 3049031.70 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:23:19,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=352093.3333333333, ans=0.1 2023-11-18 18:23:22,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=352093.3333333333, ans=0.0 2023-11-18 18:23:33,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=352160.0, ans=0.0 2023-11-18 18:23:42,222 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.156e+01 9.801e+01 1.107e+02 1.485e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 18:23:49,058 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4750, loss[loss=0.1537, simple_loss=0.1875, pruned_loss=0.04992, audio_tagging_loss=0.01002, over 15629.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1167, pruned_loss=0.03247, audio_tagging_loss=0.01187, over 3038774.86 frames. ], batch size: 54, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:23:57,279 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.928e-01 2023-11-18 18:24:09,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=352360.0, ans=0.125 2023-11-18 18:24:17,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=352426.6666666667, ans=0.125 2023-11-18 18:24:17,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=352426.6666666667, ans=0.125 2023-11-18 18:24:17,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.55 vs. limit=5.0 2023-11-18 18:24:21,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=352493.3333333333, ans=0.125 2023-11-18 18:24:27,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=352493.3333333333, ans=0.09899494936611666 2023-11-18 18:24:27,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.93 vs. limit=6.0 2023-11-18 18:24:30,403 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.47 vs. limit=12.0 2023-11-18 18:24:36,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=352560.0, ans=0.1 2023-11-18 18:24:45,201 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4800, loss[loss=0.1116, simple_loss=0.1351, pruned_loss=0.03443, audio_tagging_loss=0.009605, over 14617.00 frames. 
], tot_loss[loss=0.104, simple_loss=0.1181, pruned_loss=0.03309, audio_tagging_loss=0.01182, over 3033971.75 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:24:48,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=352626.6666666667, ans=0.2 2023-11-18 18:24:51,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.19 vs. limit=15.0 2023-11-18 18:25:06,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=352760.0, ans=0.0 2023-11-18 18:25:20,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352826.6666666667, ans=0.125 2023-11-18 18:25:26,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=352826.6666666667, ans=0.0 2023-11-18 18:25:28,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=352826.6666666667, ans=0.125 2023-11-18 18:25:29,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=352893.3333333333, ans=0.1 2023-11-18 18:25:34,877 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 9.461e+01 1.064e+02 1.235e+02 1.881e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 18:25:35,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=352893.3333333333, ans=0.0 2023-11-18 18:25:41,306 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4850, loss[loss=0.04915, simple_loss=0.04467, pruned_loss=0.01068, audio_tagging_loss=0.01614, over 14522.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1187, pruned_loss=0.03327, audio_tagging_loss=0.01186, over 3036410.42 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:25:52,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=353026.6666666667, ans=0.125 2023-11-18 18:26:04,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353093.3333333333, ans=0.1 2023-11-18 18:26:06,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=353093.3333333333, ans=0.125 2023-11-18 18:26:36,694 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:26:37,599 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4900, loss[loss=0.09037, simple_loss=0.1006, pruned_loss=0.02851, audio_tagging_loss=0.01154, over 16342.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1178, pruned_loss=0.03279, audio_tagging_loss=0.01179, over 3039964.01 frames. ], batch size: 64, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:26:38,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=353293.3333333333, ans=0.1 2023-11-18 18:27:00,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. 
limit=15.0 2023-11-18 18:27:01,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.02 vs. limit=6.0 2023-11-18 18:27:12,462 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.88 vs. limit=15.0 2023-11-18 18:27:28,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.441e+01 1.051e+02 1.165e+02 1.612e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 18:27:33,416 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 4950, loss[loss=0.1077, simple_loss=0.1292, pruned_loss=0.03373, audio_tagging_loss=0.00935, over 15243.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1175, pruned_loss=0.03261, audio_tagging_loss=0.01151, over 3038156.20 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:27:44,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.77 vs. limit=12.0 2023-11-18 18:27:49,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=353693.3333333333, ans=0.1 2023-11-18 18:28:30,000 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5000, loss[loss=0.1268, simple_loss=0.1464, pruned_loss=0.04431, audio_tagging_loss=0.009311, over 16459.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1166, pruned_loss=0.03207, audio_tagging_loss=0.01135, over 3048844.00 frames. ], batch size: 59, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:28:57,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.17 vs. limit=10.0 2023-11-18 18:29:01,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.86 vs. limit=15.0 2023-11-18 18:29:03,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=354160.0, ans=0.125 2023-11-18 18:29:04,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=354160.0, ans=0.0 2023-11-18 18:29:13,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=354226.6666666667, ans=0.0 2023-11-18 18:29:19,885 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 9.416e+01 1.033e+02 1.125e+02 1.808e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 18:29:20,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=354226.6666666667, ans=0.0 2023-11-18 18:29:26,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.19 vs. limit=10.0 2023-11-18 18:29:26,420 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5050, loss[loss=0.09034, simple_loss=0.0987, pruned_loss=0.02973, audio_tagging_loss=0.01126, over 13981.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1166, pruned_loss=0.03203, audio_tagging_loss=0.01135, over 3043642.66 frames. 
], batch size: 54, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:29:31,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=354293.3333333333, ans=0.0 2023-11-18 18:29:42,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=354360.0, ans=0.125 2023-11-18 18:29:59,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=354493.3333333333, ans=0.07 2023-11-18 18:30:06,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-18 18:30:12,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=354560.0, ans=0.125 2023-11-18 18:30:21,641 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5100, loss[loss=0.1228, simple_loss=0.1296, pruned_loss=0.04398, audio_tagging_loss=0.01407, over 15100.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1176, pruned_loss=0.03236, audio_tagging_loss=0.01137, over 3044433.22 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:30:34,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=354693.3333333333, ans=0.125 2023-11-18 18:30:52,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=354760.0, ans=0.125 2023-11-18 18:31:02,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=354826.6666666667, ans=0.0 2023-11-18 18:31:03,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=354826.6666666667, ans=0.125 2023-11-18 18:31:11,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 9.700e+01 1.061e+02 1.154e+02 1.523e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 18:31:12,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=354893.3333333333, ans=0.1 2023-11-18 18:31:17,925 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5150, loss[loss=0.08471, simple_loss=0.09881, pruned_loss=0.02046, audio_tagging_loss=0.01484, over 14941.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1165, pruned_loss=0.03201, audio_tagging_loss=0.01138, over 3042224.04 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:31:25,024 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.57 vs. limit=15.0 2023-11-18 18:31:46,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355093.3333333333, ans=0.1 2023-11-18 18:32:02,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=355226.6666666667, ans=0.5 2023-11-18 18:32:13,654 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5200, loss[loss=0.1294, simple_loss=0.1485, pruned_loss=0.04513, audio_tagging_loss=0.01008, over 15441.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1171, pruned_loss=0.03222, audio_tagging_loss=0.01137, over 3044375.89 frames. 
], batch size: 56, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:32:22,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.66 vs. limit=6.0 2023-11-18 18:32:22,805 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=355293.3333333333, ans=0.125 2023-11-18 18:32:23,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=355360.0, ans=0.0 2023-11-18 18:32:28,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=355360.0, ans=0.125 2023-11-18 18:32:45,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=355426.6666666667, ans=0.2 2023-11-18 18:32:48,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=355493.3333333333, ans=0.125 2023-11-18 18:33:03,979 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.196e+01 1.027e+02 1.112e+02 1.442e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 18:33:06,541 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-18 18:33:09,310 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5250, loss[loss=0.113, simple_loss=0.1197, pruned_loss=0.04172, audio_tagging_loss=0.01149, over 15020.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1181, pruned_loss=0.03262, audio_tagging_loss=0.01138, over 3049992.76 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:33:09,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=355626.6666666667, ans=0.125 2023-11-18 18:33:22,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=355693.3333333333, ans=0.2 2023-11-18 18:33:42,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=355826.6666666667, ans=0.0 2023-11-18 18:33:42,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=355826.6666666667, ans=0.125 2023-11-18 18:33:55,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=355893.3333333333, ans=0.1 2023-11-18 18:34:04,632 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5300, loss[loss=0.1237, simple_loss=0.1482, pruned_loss=0.03962, audio_tagging_loss=0.009976, over 14540.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1185, pruned_loss=0.03278, audio_tagging_loss=0.01126, over 3044245.33 frames. 
], batch size: 54, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:34:06,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=355960.0, ans=0.2 2023-11-18 18:34:27,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=356093.3333333333, ans=0.1 2023-11-18 18:34:36,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=356093.3333333333, ans=10.0 2023-11-18 18:34:38,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=356160.0, ans=0.125 2023-11-18 18:34:39,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=356160.0, ans=0.1 2023-11-18 18:34:55,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.440e+01 1.053e+02 1.166e+02 1.714e+02, threshold=2.106e+02, percent-clipped=0.0 2023-11-18 18:35:00,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=356293.3333333333, ans=0.2 2023-11-18 18:35:00,876 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5350, loss[loss=0.1008, simple_loss=0.1201, pruned_loss=0.02977, audio_tagging_loss=0.01096, over 16043.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1193, pruned_loss=0.03289, audio_tagging_loss=0.01124, over 3049058.56 frames. ], batch size: 60, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:35:07,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=356293.3333333333, ans=0.125 2023-11-18 18:35:10,469 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.13 vs. limit=15.0 2023-11-18 18:35:17,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=356360.0, ans=0.125 2023-11-18 18:35:28,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=356426.6666666667, ans=0.125 2023-11-18 18:35:40,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=356493.3333333333, ans=0.05 2023-11-18 18:35:45,474 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.44 vs. limit=22.5 2023-11-18 18:35:55,908 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5400, loss[loss=0.09897, simple_loss=0.1046, pruned_loss=0.0326, audio_tagging_loss=0.01407, over 16460.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1202, pruned_loss=0.03325, audio_tagging_loss=0.01127, over 3055187.80 frames. ], batch size: 63, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:08,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=356693.3333333333, ans=0.125 2023-11-18 18:36:23,602 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:36:35,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.57 vs. 
limit=22.5 2023-11-18 18:36:37,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=356826.6666666667, ans=0.0 2023-11-18 18:36:46,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.854e+01 9.931e+01 1.101e+02 1.556e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-18 18:36:51,438 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5450, loss[loss=0.1217, simple_loss=0.1454, pruned_loss=0.03955, audio_tagging_loss=0.009403, over 15133.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1207, pruned_loss=0.03371, audio_tagging_loss=0.01136, over 3055383.57 frames. ], batch size: 55, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:53,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=356960.0, ans=0.07 2023-11-18 18:37:05,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=357026.6666666667, ans=0.04949747468305833 2023-11-18 18:37:08,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=357026.6666666667, ans=0.05 2023-11-18 18:37:26,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357160.0, ans=0.1 2023-11-18 18:37:46,497 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5500, loss[loss=0.1379, simple_loss=0.165, pruned_loss=0.04702, audio_tagging_loss=0.008309, over 15409.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1213, pruned_loss=0.03362, audio_tagging_loss=0.01125, over 3053465.35 frames. ], batch size: 55, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:37:55,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357293.3333333333, ans=0.1 2023-11-18 18:38:10,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-18 18:38:12,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=357426.6666666667, ans=0.125 2023-11-18 18:38:25,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=357493.3333333333, ans=0.125 2023-11-18 18:38:29,192 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-11-18 18:38:34,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.56 vs. limit=22.5 2023-11-18 18:38:38,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 9.009e+01 9.946e+01 1.101e+02 1.690e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-18 18:38:42,428 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5550, loss[loss=0.1067, simple_loss=0.13, pruned_loss=0.03066, audio_tagging_loss=0.011, over 16335.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1198, pruned_loss=0.03317, audio_tagging_loss=0.01134, over 3058539.83 frames. 
], batch size: 62, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:38:43,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.28 vs. limit=22.5 2023-11-18 18:38:52,191 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:38:53,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=357693.3333333333, ans=0.1 2023-11-18 18:38:59,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=357693.3333333333, ans=0.125 2023-11-18 18:39:30,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=357893.3333333333, ans=0.125 2023-11-18 18:39:32,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=357893.3333333333, ans=0.1 2023-11-18 18:39:37,592 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5600, loss[loss=0.1351, simple_loss=0.1597, pruned_loss=0.04422, audio_tagging_loss=0.01097, over 15409.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1194, pruned_loss=0.03302, audio_tagging_loss=0.0115, over 3054247.66 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:40:11,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=358160.0, ans=0.09899494936611666 2023-11-18 18:40:15,579 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:40:20,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=358160.0, ans=0.125 2023-11-18 18:40:22,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=358226.6666666667, ans=0.1 2023-11-18 18:40:29,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.084e+01 1.022e+02 1.205e+02 1.640e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 18:40:32,871 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5650, loss[loss=0.1161, simple_loss=0.1382, pruned_loss=0.03722, audio_tagging_loss=0.009771, over 16387.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1204, pruned_loss=0.0334, audio_tagging_loss=0.01164, over 3060158.38 frames. ], batch size: 60, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:41:09,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=15.0 2023-11-18 18:41:23,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=358560.0, ans=0.125 2023-11-18 18:41:29,428 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5700, loss[loss=0.1072, simple_loss=0.1243, pruned_loss=0.03501, audio_tagging_loss=0.01006, over 15202.00 frames. 
], tot_loss[loss=0.1046, simple_loss=0.1196, pruned_loss=0.03327, audio_tagging_loss=0.01159, over 3060374.09 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:41:58,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=358760.0, ans=0.125 2023-11-18 18:42:03,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=358826.6666666667, ans=0.07 2023-11-18 18:42:20,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=358893.3333333333, ans=0.0 2023-11-18 18:42:21,625 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.993e+01 9.865e+01 1.099e+02 1.758e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:42:22,035 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.11 vs. limit=12.0 2023-11-18 18:42:24,813 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5750, loss[loss=0.08659, simple_loss=0.1066, pruned_loss=0.023, audio_tagging_loss=0.0103, over 15972.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1185, pruned_loss=0.03289, audio_tagging_loss=0.01149, over 3048235.17 frames. ], batch size: 59, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:42:38,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=359026.6666666667, ans=0.04949747468305833 2023-11-18 18:42:44,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=359026.6666666667, ans=0.0 2023-11-18 18:42:44,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-11-18 18:42:50,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=359093.3333333333, ans=0.125 2023-11-18 18:43:02,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.40 vs. limit=15.0 2023-11-18 18:43:15,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=359226.6666666667, ans=0.0 2023-11-18 18:43:17,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=359226.6666666667, ans=0.125 2023-11-18 18:43:20,444 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5800, loss[loss=0.1267, simple_loss=0.1535, pruned_loss=0.04125, audio_tagging_loss=0.008699, over 15337.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1187, pruned_loss=0.03291, audio_tagging_loss=0.01132, over 3047347.78 frames. 
], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:43:20,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:43:22,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:43:30,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=359360.0, ans=0.2 2023-11-18 18:43:30,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=359360.0, ans=0.07 2023-11-18 18:43:45,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=359426.6666666667, ans=0.2 2023-11-18 18:44:12,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.910e+01 9.863e+01 1.080e+02 1.378e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:44:16,604 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5850, loss[loss=0.09079, simple_loss=0.09628, pruned_loss=0.03234, audio_tagging_loss=0.01031, over 16141.00 frames. ], tot_loss[loss=0.103, simple_loss=0.118, pruned_loss=0.03263, audio_tagging_loss=0.01139, over 3045375.35 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:44:17,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0 2023-11-18 18:44:17,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-18 18:44:29,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=359693.3333333333, ans=0.125 2023-11-18 18:44:38,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=359760.0, ans=0.125 2023-11-18 18:44:44,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5 2023-11-18 18:44:47,958 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.47 vs. limit=15.0 2023-11-18 18:44:56,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=359826.6666666667, ans=0.125 2023-11-18 18:45:06,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=359893.3333333333, ans=0.125 2023-11-18 18:45:11,595 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.29 vs. limit=12.0 2023-11-18 18:45:12,057 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5900, loss[loss=0.107, simple_loss=0.1317, pruned_loss=0.03139, audio_tagging_loss=0.009802, over 14424.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.118, pruned_loss=0.0324, audio_tagging_loss=0.0113, over 3047410.08 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:45:26,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=360026.6666666667, ans=0.05 2023-11-18 18:45:29,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=360026.6666666667, ans=0.0 2023-11-18 18:45:35,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:45:40,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:45:42,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:45:43,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.99 vs. limit=10.0 2023-11-18 18:46:04,525 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.214e+01 1.011e+02 1.146e+02 1.411e+02, threshold=2.022e+02, percent-clipped=0.0 2023-11-18 18:46:07,763 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 5950, loss[loss=0.1013, simple_loss=0.1086, pruned_loss=0.03538, audio_tagging_loss=0.01158, over 15493.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.119, pruned_loss=0.03259, audio_tagging_loss=0.01123, over 3051103.35 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:46:07,972 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:46:19,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=360360.0, ans=0.125 2023-11-18 18:46:32,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-18 18:46:44,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=360493.3333333333, ans=0.0 2023-11-18 18:46:44,350 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5 2023-11-18 18:46:51,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=360560.0, ans=0.04949747468305833 2023-11-18 18:47:03,806 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6000, loss[loss=0.09753, simple_loss=0.119, pruned_loss=0.02975, audio_tagging_loss=0.008267, over 14656.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1173, pruned_loss=0.03214, audio_tagging_loss=0.01125, over 3043902.45 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:47:03,807 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 18:47:34,320 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1993, 2.0748, 5.1897, 2.1745], device='cuda:2') 2023-11-18 18:47:36,971 INFO [train_asr.py:1147] (2/4) Epoch 5, validation: loss=0.0732, simple_loss=0.06039, pruned_loss=0.009139, audio_tagging_loss=0.03386, over 4681554.00 frames. 
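The validation entry just above reports each loss component (loss, simple_loss, pruned_loss, audio_tagging_loss) as an average over a fixed dev set of 4681554 frames, and the tot_loss[...] fields in the training entries are the same kind of frame-weighted running average. Below is a minimal sketch of that reduction; the compute_loss helper and the exact bookkeeping are assumptions for illustration, not the actual train_asr.py code.

```python
# Sketch only: frame-weighted averaging of per-batch loss components,
# matching the shape of the "loss=..., simple_loss=..., ... over N frames."
# entries in this log. `compute_loss` is a hypothetical helper.
from collections import defaultdict

import torch


def frame_weighted_average(stats_per_batch):
    """Each element holds a 'frames' count plus loss sums that were already
    multiplied by that batch's frame count."""
    totals = defaultdict(float)
    for stats in stats_per_batch:
        for key, value in stats.items():
            totals[key] += value
    frames = totals.pop("frames")
    return {key: value / frames for key, value in totals.items()}, frames


@torch.no_grad()
def validate(model, dev_loader, device):
    model.eval()
    stats_per_batch = []
    for batch in dev_loader:
        # compute_loss is hypothetical: returns a dict of per-component
        # loss tensors and the number of (subsampled) frames in the batch.
        losses, num_frames = compute_loss(model, batch, device)
        stats_per_batch.append(
            {"frames": num_frames,
             **{k: v.item() * num_frames for k, v in losses.items()}}
        )
    model.train()
    return frame_weighted_average(stats_per_batch)

# Example: averaged, frames = validate(model, dev_loader, torch.device("cuda:2"))
# would yield something like loss=0.0732, ..., over 4681554 frames, as logged above.
```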
2023-11-18 18:47:36,971 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 18:47:38,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=360626.6666666667, ans=0.04949747468305833 2023-11-18 18:47:40,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=360626.6666666667, ans=0.1 2023-11-18 18:47:45,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-18 18:47:46,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=360693.3333333333, ans=0.125 2023-11-18 18:47:47,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=360693.3333333333, ans=0.0 2023-11-18 18:47:51,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=360693.3333333333, ans=0.0 2023-11-18 18:48:01,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=360760.0, ans=0.125 2023-11-18 18:48:13,922 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:48:14,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=360826.6666666667, ans=0.125 2023-11-18 18:48:15,563 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.70 vs. limit=15.0 2023-11-18 18:48:28,700 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 9.155e+01 9.916e+01 1.075e+02 1.410e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-18 18:48:31,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.73 vs. limit=6.0 2023-11-18 18:48:31,983 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6050, loss[loss=0.08901, simple_loss=0.1148, pruned_loss=0.02272, audio_tagging_loss=0.008892, over 15844.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1181, pruned_loss=0.03223, audio_tagging_loss=0.01121, over 3049666.75 frames. ], batch size: 58, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:48:32,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten.whitening_limit, batch_count=360960.0, ans=15.0 2023-11-18 18:48:54,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=361093.3333333333, ans=0.025 2023-11-18 18:49:24,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.25 vs. 
limit=15.0 2023-11-18 18:49:28,168 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6100, loss[loss=0.1299, simple_loss=0.158, pruned_loss=0.04049, audio_tagging_loss=0.01044, over 14724.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1169, pruned_loss=0.03212, audio_tagging_loss=0.01122, over 3050694.15 frames. ], batch size: 52, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:49:31,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=361293.3333333333, ans=0.05 2023-11-18 18:49:38,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=361360.0, ans=0.05 2023-11-18 18:49:45,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361360.0, ans=0.1 2023-11-18 18:50:04,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.41 vs. limit=22.5 2023-11-18 18:50:07,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=361493.3333333333, ans=0.125 2023-11-18 18:50:19,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.21 vs. limit=15.0 2023-11-18 18:50:21,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.202e+01 1.052e+02 1.142e+02 1.737e+02, threshold=2.103e+02, percent-clipped=0.0 2023-11-18 18:50:23,665 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6150, loss[loss=0.1054, simple_loss=0.1138, pruned_loss=0.03614, audio_tagging_loss=0.01231, over 14941.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1169, pruned_loss=0.0322, audio_tagging_loss=0.01131, over 3048277.93 frames. ], batch size: 55, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:50:49,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=361760.0, ans=0.125 2023-11-18 18:51:00,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=361826.6666666667, ans=0.0 2023-11-18 18:51:20,188 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6200, loss[loss=0.08758, simple_loss=0.1074, pruned_loss=0.02251, audio_tagging_loss=0.0114, over 15684.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1161, pruned_loss=0.03197, audio_tagging_loss=0.01145, over 3048476.06 frames. 
], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:51:21,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=361960.0, ans=0.1 2023-11-18 18:51:27,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361960.0, ans=0.125 2023-11-18 18:51:43,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=362093.3333333333, ans=0.125 2023-11-18 18:51:48,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=362093.3333333333, ans=0.125 2023-11-18 18:51:51,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=362093.3333333333, ans=0.1 2023-11-18 18:52:09,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=362226.6666666667, ans=0.0 2023-11-18 18:52:14,243 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.346e+01 1.036e+02 1.107e+02 1.533e+02, threshold=2.072e+02, percent-clipped=0.0 2023-11-18 18:52:16,402 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6250, loss[loss=0.1053, simple_loss=0.1238, pruned_loss=0.03337, audio_tagging_loss=0.01002, over 16243.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.116, pruned_loss=0.03208, audio_tagging_loss=0.01145, over 3047487.51 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:52:19,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=362293.3333333333, ans=0.2 2023-11-18 18:52:28,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=362360.0, ans=0.04949747468305833 2023-11-18 18:52:39,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2023-11-18 18:52:52,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-11-18 18:53:10,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=362626.6666666667, ans=0.125 2023-11-18 18:53:11,488 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6300, loss[loss=0.1071, simple_loss=0.1305, pruned_loss=0.02833, audio_tagging_loss=0.01355, over 15025.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1144, pruned_loss=0.0315, audio_tagging_loss=0.01158, over 3050197.26 frames. 
], batch size: 55, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:53:21,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=362626.6666666667, ans=0.0 2023-11-18 18:53:23,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=362693.3333333333, ans=0.125 2023-11-18 18:53:25,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=362693.3333333333, ans=0.0 2023-11-18 18:53:26,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=362693.3333333333, ans=0.1 2023-11-18 18:53:30,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=362693.3333333333, ans=0.125 2023-11-18 18:53:31,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.72 vs. limit=15.0 2023-11-18 18:53:56,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=362893.3333333333, ans=0.125 2023-11-18 18:54:04,973 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 9.079e+01 9.861e+01 1.090e+02 1.541e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 18:54:07,085 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6350, loss[loss=0.1014, simple_loss=0.1284, pruned_loss=0.0274, audio_tagging_loss=0.009821, over 15469.00 frames. ], tot_loss[loss=0.09896, simple_loss=0.1132, pruned_loss=0.03081, audio_tagging_loss=0.01154, over 3052008.02 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:54:17,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=363026.6666666667, ans=0.125 2023-11-18 18:54:19,359 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-11-18 18:54:35,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=363093.3333333333, ans=0.0 2023-11-18 18:54:54,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=363226.6666666667, ans=0.0 2023-11-18 18:54:56,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=363226.6666666667, ans=0.035 2023-11-18 18:55:03,933 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6400, loss[loss=0.09544, simple_loss=0.1071, pruned_loss=0.02968, audio_tagging_loss=0.01219, over 14545.00 frames. ], tot_loss[loss=0.09982, simple_loss=0.1144, pruned_loss=0.03106, audio_tagging_loss=0.01158, over 3047952.71 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:55:26,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0 2023-11-18 18:55:32,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=363426.6666666667, ans=0.07 2023-11-18 18:55:35,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.70 vs. 
limit=15.0 2023-11-18 18:55:46,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=363493.3333333333, ans=10.0 2023-11-18 18:55:50,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=363560.0, ans=0.5 2023-11-18 18:55:56,641 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 9.321e+01 1.035e+02 1.143e+02 1.548e+02, threshold=2.069e+02, percent-clipped=0.0 2023-11-18 18:55:58,761 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6450, loss[loss=0.1226, simple_loss=0.1503, pruned_loss=0.04002, audio_tagging_loss=0.007444, over 15138.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1146, pruned_loss=0.03134, audio_tagging_loss=0.01166, over 3042978.41 frames. ], batch size: 54, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:56:19,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.62 vs. limit=15.0 2023-11-18 18:56:29,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=363760.0, ans=0.125 2023-11-18 18:56:51,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=363893.3333333333, ans=0.2 2023-11-18 18:56:54,222 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6500, loss[loss=0.09605, simple_loss=0.1087, pruned_loss=0.02842, audio_tagging_loss=0.01329, over 15380.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1146, pruned_loss=0.03131, audio_tagging_loss=0.01168, over 3049522.21 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:56:57,643 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:56:59,967 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2023-11-18 18:57:03,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=363960.0, ans=0.2 2023-11-18 18:57:27,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=364160.0, ans=0.125 2023-11-18 18:57:42,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.51 vs. limit=22.5 2023-11-18 18:57:48,371 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.392e+01 1.004e+02 1.100e+02 1.543e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-18 18:57:50,526 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6550, loss[loss=0.1313, simple_loss=0.1556, pruned_loss=0.04618, audio_tagging_loss=0.007285, over 16247.00 frames. ], tot_loss[loss=0.09929, simple_loss=0.1135, pruned_loss=0.03096, audio_tagging_loss=0.01156, over 3051274.89 frames. 
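[Annotation] The Clipping_scale lines from optim.py summarize the recent distribution of gradient norms as five quantiles (min, 25%, median, 75%, max). In every record here the reported threshold equals clipping_scale times the median, e.g. 2.0 x 1.035e+02 ~= 2.069e+02 just above. A sketch of that bookkeeping (the function below is illustrative, not icefall's actual optim.py):

import torch

def grad_norm_summary(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Five-point summary as printed in the log.
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # clipping_scale * median
    # The log's percent-clipped counts steps whose gradient norm exceeded
    # the threshold in force at that step; this recomputes it post hoc.
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped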
], batch size: 58, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:57:52,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=364293.3333333333, ans=0.125 2023-11-18 18:58:00,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=364360.0, ans=0.07 2023-11-18 18:58:46,368 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6600, loss[loss=0.09867, simple_loss=0.1182, pruned_loss=0.03034, audio_tagging_loss=0.009232, over 14885.00 frames. ], tot_loss[loss=0.09954, simple_loss=0.1138, pruned_loss=0.03112, audio_tagging_loss=0.01152, over 3041685.31 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:58:46,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=364626.6666666667, ans=0.125 2023-11-18 18:59:08,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=364760.0, ans=0.1 2023-11-18 18:59:10,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=364760.0, ans=0.1 2023-11-18 18:59:11,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=364760.0, ans=0.125 2023-11-18 18:59:39,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.947e+01 1.006e+02 1.140e+02 1.601e+02, threshold=2.013e+02, percent-clipped=0.0 2023-11-18 18:59:41,802 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6650, loss[loss=0.1032, simple_loss=0.117, pruned_loss=0.03277, audio_tagging_loss=0.01195, over 13925.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1151, pruned_loss=0.03143, audio_tagging_loss=0.01134, over 3039473.79 frames. ], batch size: 54, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:59:42,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=364960.0, ans=0.1 2023-11-18 18:59:47,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=364960.0, ans=0.125 2023-11-18 19:00:08,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=365093.3333333333, ans=0.1 2023-11-18 19:00:11,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.48 vs. limit=15.0 2023-11-18 19:00:17,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=365160.0, ans=0.0 2023-11-18 19:00:37,649 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6700, loss[loss=0.09717, simple_loss=0.1135, pruned_loss=0.02846, audio_tagging_loss=0.01196, over 14568.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1155, pruned_loss=0.03158, audio_tagging_loss=0.01139, over 3036805.74 frames. 
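[Annotation] The many ScheduledFloat records are hyperparameters that vary with training progress: each named quantity (dropout_p, skip_rate, scale_min, balancer prob, ...) is read off a schedule keyed on batch_count, and the current value is printed as ans. A minimal sketch of such a piecewise-linear schedule; the breakpoints below are invented for illustration, the real ones live in icefall's scaling.py:

class PiecewiseLinearSchedule:
    """Linearly interpolates between (batch_count, value) breakpoints and
    stays constant beyond the last one."""
    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
        return pts[-1][1]

# Hypothetical dropout annealing from 0.3 to 0.1 over the first 20k batches;
# by batch_count ~3.6e5 it would read its final value, as the ans fields here do.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(362626.7))  # -> 0.1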
], batch size: 53, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:00:45,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=365293.3333333333, ans=0.2 2023-11-18 19:00:52,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=365360.0, ans=0.0 2023-11-18 19:00:56,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=365360.0, ans=0.025 2023-11-18 19:01:14,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365493.3333333333, ans=0.1 2023-11-18 19:01:25,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=365560.0, ans=0.125 2023-11-18 19:01:32,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 9.414e+01 1.042e+02 1.183e+02 1.878e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:01:34,262 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6750, loss[loss=0.1006, simple_loss=0.0996, pruned_loss=0.03943, audio_tagging_loss=0.01132, over 15662.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1161, pruned_loss=0.03185, audio_tagging_loss=0.01137, over 3033517.80 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:01:34,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=365626.6666666667, ans=0.125 2023-11-18 19:01:42,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=365626.6666666667, ans=12.0 2023-11-18 19:01:49,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=365693.3333333333, ans=0.2 2023-11-18 19:01:52,322 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=12.0 2023-11-18 19:02:01,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=365760.0, ans=0.125 2023-11-18 19:02:09,621 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.41 vs. limit=6.0 2023-11-18 19:02:22,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.26 vs. limit=6.0 2023-11-18 19:02:29,899 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6800, loss[loss=0.07969, simple_loss=0.1006, pruned_loss=0.02293, audio_tagging_loss=0.006463, over 14762.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1158, pruned_loss=0.03174, audio_tagging_loss=0.01128, over 3030483.55 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:02:39,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-18 19:03:09,783 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. 
limit=22.5 2023-11-18 19:03:14,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=366226.6666666667, ans=0.1 2023-11-18 19:03:15,653 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:03:15,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=366226.6666666667, ans=0.125 2023-11-18 19:03:22,850 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.984e+01 9.907e+01 1.134e+02 1.555e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-18 19:03:24,932 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6850, loss[loss=0.108, simple_loss=0.1232, pruned_loss=0.03601, audio_tagging_loss=0.01036, over 15456.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1153, pruned_loss=0.03148, audio_tagging_loss=0.01131, over 3030800.27 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:03:26,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.30 vs. limit=15.0 2023-11-18 19:03:40,346 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-18 19:03:55,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.67 vs. limit=12.0 2023-11-18 19:04:18,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=366560.0, ans=0.0 2023-11-18 19:04:19,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.29 vs. limit=12.0 2023-11-18 19:04:21,394 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6900, loss[loss=0.1068, simple_loss=0.1218, pruned_loss=0.03593, audio_tagging_loss=0.009971, over 15692.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1167, pruned_loss=0.03187, audio_tagging_loss=0.01119, over 3042956.59 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:04:47,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=366760.0, ans=0.125 2023-11-18 19:04:47,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=366760.0, ans=0.025 2023-11-18 19:04:51,817 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=15.0 2023-11-18 19:04:53,162 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:05:02,956 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 19:05:15,600 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.152e+01 1.012e+02 1.130e+02 1.420e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 19:05:17,768 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 6950, loss[loss=0.1087, simple_loss=0.1299, pruned_loss=0.03429, audio_tagging_loss=0.009436, over 14719.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1158, pruned_loss=0.03164, audio_tagging_loss=0.01132, over 3036638.12 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:05:23,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=366960.0, ans=0.2 2023-11-18 19:05:38,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=367093.3333333333, ans=0.125 2023-11-18 19:05:41,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=367093.3333333333, ans=0.1 2023-11-18 19:05:51,068 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.20 vs. limit=22.5 2023-11-18 19:06:04,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.04 vs. limit=15.0 2023-11-18 19:06:12,702 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7000, loss[loss=0.08924, simple_loss=0.1063, pruned_loss=0.02451, audio_tagging_loss=0.01156, over 15657.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1156, pruned_loss=0.03161, audio_tagging_loss=0.01153, over 3031227.53 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:06:25,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.71 vs. limit=15.0 2023-11-18 19:06:45,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=367493.3333333333, ans=0.125 2023-11-18 19:06:52,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=367493.3333333333, ans=0.125 2023-11-18 19:07:06,681 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.236e+01 1.008e+02 1.142e+02 1.683e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 19:07:08,810 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7050, loss[loss=0.1468, simple_loss=0.1683, pruned_loss=0.0536, audio_tagging_loss=0.009029, over 14987.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1166, pruned_loss=0.03187, audio_tagging_loss=0.01148, over 3032221.73 frames. 
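[Annotation] The WARNING above (cut unbalanced/Xez1ffAcb0w_0.000_1.000.wav) shows why these 1-second AudioSet clips are excluded from training: their dummy transcript encodes to 24 BPE tokens, but 100 input frames survive only as 23 frames after the 4x convolutional subsampling, and the pruned transducer loss needs at least as many encoder frames as tokens. A sketch of the filter; the frame arithmetic below reproduces the numbers in the warning:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frames left after the convolutional front-end (subsampling factor 4):
    # 100 -> ((100 - 7) // 2 + 1) // 2 = 23, as logged.
    frames_after = ((num_frames - 7) // 2 + 1) // 2
    return frames_after >= num_tokens

print(keep_cut(100, 24))  # False -> "Exclude cut ... from training."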
], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:07:29,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=367693.3333333333, ans=0.125 2023-11-18 19:07:33,384 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:07:46,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=367826.6666666667, ans=0.0 2023-11-18 19:07:58,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=367893.3333333333, ans=0.0 2023-11-18 19:08:04,461 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7100, loss[loss=0.1143, simple_loss=0.1235, pruned_loss=0.03965, audio_tagging_loss=0.01294, over 15226.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1155, pruned_loss=0.03162, audio_tagging_loss=0.01156, over 3031880.33 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:08:28,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.52 vs. limit=6.0 2023-11-18 19:08:49,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=368226.6666666667, ans=0.04949747468305833 2023-11-18 19:08:57,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.190e+01 1.032e+02 1.164e+02 1.806e+02, threshold=2.063e+02, percent-clipped=0.0 2023-11-18 19:08:59,843 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7150, loss[loss=0.1057, simple_loss=0.1135, pruned_loss=0.03749, audio_tagging_loss=0.01147, over 15048.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1163, pruned_loss=0.03198, audio_tagging_loss=0.01156, over 3033673.64 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:09:08,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.03 vs. limit=15.0 2023-11-18 19:09:24,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=368426.6666666667, ans=0.125 2023-11-18 19:09:36,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. limit=15.0 2023-11-18 19:09:41,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=368493.3333333333, ans=0.0 2023-11-18 19:09:55,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=368626.6666666667, ans=0.125 2023-11-18 19:09:55,903 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7200, loss[loss=0.1044, simple_loss=0.1183, pruned_loss=0.03192, audio_tagging_loss=0.01328, over 14568.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1162, pruned_loss=0.03179, audio_tagging_loss=0.01161, over 3041336.54 frames. 
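[Annotation] The lr field decays smoothly (1.35e-02 at the start of this stretch, 1.34e-02 by here). This is consistent with icefall's Eden schedule, which discounts the base learning rate jointly in batches and in epochs; a sketch using this run's settings (base_lr=0.045, lr_batches=7500, lr_epochs=3.5), with the exact global batch index treated as unknown:

def eden_lr(batch: float, epoch: float,
            base_lr: float = 0.045,
            lr_batches: float = 7500.0,
            lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# With a global batch index in the mid tens of thousands and epoch ~5,
# this lands near the logged 1.3e-02 and keeps shrinking slowly:
print(eden_lr(46000, 5.0))  # ~1.4e-02 (order-of-magnitude check)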
], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:10:01,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=368626.6666666667, ans=0.0 2023-11-18 19:10:26,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=368760.0, ans=0.2 2023-11-18 19:10:47,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0 2023-11-18 19:10:49,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.156e+01 1.033e+02 1.136e+02 1.885e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 19:10:51,154 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7250, loss[loss=0.1033, simple_loss=0.1281, pruned_loss=0.03038, audio_tagging_loss=0.008882, over 15440.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1172, pruned_loss=0.03219, audio_tagging_loss=0.01162, over 3040235.16 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:10:51,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=368960.0, ans=0.125 2023-11-18 19:10:57,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=368960.0, ans=0.1 2023-11-18 19:11:07,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=369026.6666666667, ans=0.2 2023-11-18 19:11:23,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.08 vs. limit=12.0 2023-11-18 19:11:26,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=369160.0, ans=0.0 2023-11-18 19:11:26,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369160.0, ans=0.125 2023-11-18 19:11:28,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=369160.0, ans=0.125 2023-11-18 19:11:44,551 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:11:46,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=369293.3333333333, ans=0.125 2023-11-18 19:11:47,513 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7300, loss[loss=0.09167, simple_loss=0.09943, pruned_loss=0.02836, audio_tagging_loss=0.01359, over 14759.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1171, pruned_loss=0.03225, audio_tagging_loss=0.0115, over 3045297.46 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:12:12,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.19 vs. limit=15.0 2023-11-18 19:12:21,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.56 vs. 
limit=22.5 2023-11-18 19:12:29,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=369493.3333333333, ans=0.95 2023-11-18 19:12:41,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 9.294e+01 1.042e+02 1.203e+02 1.669e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:12:44,289 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7350, loss[loss=0.06985, simple_loss=0.07387, pruned_loss=0.01975, audio_tagging_loss=0.01316, over 14776.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1167, pruned_loss=0.03203, audio_tagging_loss=0.01137, over 3044172.90 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:12:47,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=369626.6666666667, ans=0.95 2023-11-18 19:12:50,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=369626.6666666667, ans=0.0 2023-11-18 19:13:07,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-18 19:13:13,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=369760.0, ans=0.125 2023-11-18 19:13:27,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=369893.3333333333, ans=0.125 2023-11-18 19:13:32,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=369893.3333333333, ans=0.0 2023-11-18 19:13:39,073 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7400, loss[loss=0.09373, simple_loss=0.1106, pruned_loss=0.02762, audio_tagging_loss=0.01083, over 15309.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1165, pruned_loss=0.03207, audio_tagging_loss=0.0113, over 3037994.57 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:14:10,301 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.25 vs. limit=6.0 2023-11-18 19:14:26,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=370226.6666666667, ans=0.0 2023-11-18 19:14:32,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 9.039e+01 9.551e+01 1.074e+02 1.292e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-18 19:14:32,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=370226.6666666667, ans=0.125 2023-11-18 19:14:34,623 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7450, loss[loss=0.1183, simple_loss=0.1402, pruned_loss=0.03798, audio_tagging_loss=0.01023, over 15897.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1158, pruned_loss=0.03186, audio_tagging_loss=0.0113, over 3039129.34 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:14:46,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370360.0, ans=0.1 2023-11-18 19:14:47,989 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.20 vs. 
limit=22.5 2023-11-18 19:15:06,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=370426.6666666667, ans=0.2 2023-11-18 19:15:09,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=370493.3333333333, ans=0.125 2023-11-18 19:15:19,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=370560.0, ans=0.1 2023-11-18 19:15:21,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=370560.0, ans=0.025 2023-11-18 19:15:27,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=370560.0, ans=0.02 2023-11-18 19:15:30,679 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7500, loss[loss=0.09742, simple_loss=0.1139, pruned_loss=0.03227, audio_tagging_loss=0.008226, over 16261.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1152, pruned_loss=0.03181, audio_tagging_loss=0.01141, over 3035179.56 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:15:33,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=370626.6666666667, ans=0.0 2023-11-18 19:15:44,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=370693.3333333333, ans=0.125 2023-11-18 19:15:50,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=15.0 2023-11-18 19:15:53,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=370760.0, ans=0.125 2023-11-18 19:16:03,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=370826.6666666667, ans=0.0 2023-11-18 19:16:07,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=370826.6666666667, ans=0.0 2023-11-18 19:16:24,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 9.004e+01 9.847e+01 1.087e+02 1.456e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-18 19:16:26,722 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7550, loss[loss=0.1143, simple_loss=0.1477, pruned_loss=0.03187, audio_tagging_loss=0.008597, over 15077.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1159, pruned_loss=0.03211, audio_tagging_loss=0.0113, over 3036953.46 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:16:43,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2023-11-18 19:16:48,222 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0 2023-11-18 19:17:22,447 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7600, loss[loss=0.1003, simple_loss=0.1133, pruned_loss=0.03298, audio_tagging_loss=0.01065, over 14987.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1156, pruned_loss=0.03201, audio_tagging_loss=0.0113, over 3044365.95 frames. 
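[Annotation] The balancer entries (min_positive, max_positive, min_abs, max_abs, and their prob fields) describe activation-statistics constraints: with probability prob, a regularizer pushes each channel's fraction of positive values and mean absolute value back inside the configured range. A simplified, loss-shaped sketch of which statistics are constrained; icefall's scaling.py implements this with custom autograd gradient corrections rather than an explicit penalty, and note that (x > 0) below is not differentiable as written:

import torch

def balancer_penalty(x: torch.Tensor,
                     min_positive=0.05, max_positive=0.95,
                     min_abs=0.2, max_abs=100.0) -> torch.Tensor:
    # x: (..., num_channels); statistics are per channel.
    reduce_dims = tuple(range(x.dim() - 1))
    frac_positive = (x > 0).float().mean(dim=reduce_dims)
    mean_abs = x.abs().mean(dim=reduce_dims)
    return ((min_positive - frac_positive).clamp(min=0).sum()
            + (frac_positive - max_positive).clamp(min=0).sum()
            + (min_abs - mean_abs).clamp(min=0).sum()
            + (mean_abs - max_abs).clamp(min=0).sum())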
], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:17:30,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=371293.3333333333, ans=0.0 2023-11-18 19:17:30,695 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-11-18 19:17:47,627 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:18:15,323 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.066e+01 9.750e+01 1.073e+02 2.127e+02, threshold=1.950e+02, percent-clipped=2.0 2023-11-18 19:18:18,593 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7650, loss[loss=0.1169, simple_loss=0.1351, pruned_loss=0.03708, audio_tagging_loss=0.01227, over 15045.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1148, pruned_loss=0.03201, audio_tagging_loss=0.01128, over 3034853.57 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:18:24,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=371626.6666666667, ans=0.125 2023-11-18 19:18:57,986 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=371826.6666666667, ans=0.125 2023-11-18 19:19:14,363 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7700, loss[loss=0.07785, simple_loss=0.08209, pruned_loss=0.02334, audio_tagging_loss=0.01346, over 16890.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1147, pruned_loss=0.03187, audio_tagging_loss=0.01129, over 3035698.89 frames. ], batch size: 64, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:19:34,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=372026.6666666667, ans=0.0 2023-11-18 19:19:41,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.34 vs. limit=6.0 2023-11-18 19:20:05,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=372226.6666666667, ans=0.0 2023-11-18 19:20:07,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=372226.6666666667, ans=0.04949747468305833 2023-11-18 19:20:08,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.781e+01 9.756e+01 1.085e+02 1.598e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 19:20:10,364 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7750, loss[loss=0.1263, simple_loss=0.1546, pruned_loss=0.04313, audio_tagging_loss=0.005858, over 16038.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1158, pruned_loss=0.03215, audio_tagging_loss=0.01125, over 3037325.35 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:20:13,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=372293.3333333333, ans=0.2 2023-11-18 19:20:18,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-18 19:20:19,423 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. 
limit=15.0 2023-11-18 19:20:40,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=372426.6666666667, ans=0.125 2023-11-18 19:20:52,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=372493.3333333333, ans=0.125 2023-11-18 19:21:05,694 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7800, loss[loss=0.08852, simple_loss=0.09857, pruned_loss=0.02757, audio_tagging_loss=0.01167, over 15459.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1176, pruned_loss=0.03242, audio_tagging_loss=0.01122, over 3047553.15 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:21:19,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=372693.3333333333, ans=0.125 2023-11-18 19:22:00,496 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.836e+01 9.770e+01 1.067e+02 1.448e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 19:22:02,619 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7850, loss[loss=0.06458, simple_loss=0.07622, pruned_loss=0.01446, audio_tagging_loss=0.01201, over 14487.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.118, pruned_loss=0.03218, audio_tagging_loss=0.01131, over 3051801.46 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:22:14,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=373026.6666666667, ans=0.1 2023-11-18 19:22:30,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.33 vs. limit=10.0 2023-11-18 19:22:30,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=373093.3333333333, ans=0.125 2023-11-18 19:22:52,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=373226.6666666667, ans=0.0 2023-11-18 19:22:55,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=373226.6666666667, ans=0.125 2023-11-18 19:22:58,341 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7900, loss[loss=0.1178, simple_loss=0.1353, pruned_loss=0.0412, audio_tagging_loss=0.008945, over 14977.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1169, pruned_loss=0.03196, audio_tagging_loss=0.01145, over 3044633.66 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:22:59,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=373293.3333333333, ans=0.2 2023-11-18 19:23:25,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373426.6666666667, ans=0.0 2023-11-18 19:23:33,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.62 vs. 
limit=12.0 2023-11-18 19:23:53,283 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.204e+01 9.997e+01 1.093e+02 1.252e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 19:23:55,400 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 7950, loss[loss=0.0965, simple_loss=0.1021, pruned_loss=0.03096, audio_tagging_loss=0.01446, over 14911.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1165, pruned_loss=0.03178, audio_tagging_loss=0.01164, over 3040788.18 frames. ], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:24:00,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=373626.6666666667, ans=0.125 2023-11-18 19:24:00,465 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=373626.6666666667, ans=0.02 2023-11-18 19:24:08,106 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:24:11,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=373693.3333333333, ans=0.125 2023-11-18 19:24:43,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373893.3333333333, ans=0.0 2023-11-18 19:24:47,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=373893.3333333333, ans=0.125 2023-11-18 19:24:51,735 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8000, loss[loss=0.08662, simple_loss=0.09423, pruned_loss=0.02655, audio_tagging_loss=0.01295, over 14705.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.116, pruned_loss=0.03167, audio_tagging_loss=0.01176, over 3043126.52 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:24:51,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=373960.0, ans=0.2 2023-11-18 19:25:04,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=374026.6666666667, ans=0.125 2023-11-18 19:25:39,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=374226.6666666667, ans=0.125 2023-11-18 19:25:46,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.997e+01 9.797e+01 1.056e+02 1.371e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-18 19:25:47,815 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8050, loss[loss=0.08323, simple_loss=0.08777, pruned_loss=0.02383, audio_tagging_loss=0.01552, over 15657.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1157, pruned_loss=0.03179, audio_tagging_loss=0.01179, over 3037233.87 frames. 
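[Annotation] The grad_scale field is the dynamic loss scale of mixed-precision training (this run uses fp16): it is raised after a stretch of overflow-free steps and halved whenever gradients overflow, which is why it moves between 16.0 and 32.0 through this section (32.0 at batch 8000 above, 16.0 again in the next record). The standard PyTorch pattern, as a self-contained toy example rather than this recipe's training loop:

import torch

model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(8, 10, device="cuda")).pow(2).mean()
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # step is skipped if grads hit inf/nan
    scaler.update()                # halve on overflow, grow periodically
print(scaler.get_scale())          # the quantity logged here as grad_scale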
], batch size: 63, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:25:58,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=374360.0, ans=0.125 2023-11-18 19:26:02,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=374360.0, ans=0.125 2023-11-18 19:26:15,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=374426.6666666667, ans=0.125 2023-11-18 19:26:18,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=374426.6666666667, ans=0.0 2023-11-18 19:26:34,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=374560.0, ans=0.05 2023-11-18 19:26:42,907 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8100, loss[loss=0.09654, simple_loss=0.1051, pruned_loss=0.03081, audio_tagging_loss=0.01319, over 14293.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1173, pruned_loss=0.03249, audio_tagging_loss=0.01161, over 3041932.92 frames. ], batch size: 53, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:27:08,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=374760.0, ans=0.2 2023-11-18 19:27:11,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:13,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:16,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=374826.6666666667, ans=0.125 2023-11-18 19:27:28,259 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-18 19:27:38,042 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 9.480e+01 1.051e+02 1.132e+02 1.844e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 19:27:39,094 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8150, loss[loss=0.1315, simple_loss=0.1572, pruned_loss=0.04299, audio_tagging_loss=0.009923, over 15328.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1185, pruned_loss=0.03286, audio_tagging_loss=0.01148, over 3048152.51 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:27:43,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.35 vs. limit=22.5 2023-11-18 19:28:01,811 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.84 vs. 
limit=15.0 2023-11-18 19:28:02,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375093.3333333333, ans=0.0 2023-11-18 19:28:02,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=375093.3333333333, ans=0.025 2023-11-18 19:28:28,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=375226.6666666667, ans=0.125 2023-11-18 19:28:34,130 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:28:35,153 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8200, loss[loss=0.09659, simple_loss=0.1192, pruned_loss=0.02769, audio_tagging_loss=0.009305, over 15254.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1188, pruned_loss=0.03279, audio_tagging_loss=0.01123, over 3051639.97 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:28:35,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.85 vs. limit=15.0 2023-11-18 19:28:38,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=375293.3333333333, ans=0.0 2023-11-18 19:28:43,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=375293.3333333333, ans=0.125 2023-11-18 19:28:53,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=375360.0, ans=0.0 2023-11-18 19:29:00,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=375426.6666666667, ans=0.125 2023-11-18 19:29:22,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=15.0 2023-11-18 19:29:22,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=375560.0, ans=0.0 2023-11-18 19:29:23,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375560.0, ans=0.1 2023-11-18 19:29:28,969 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.797e+01 1.057e+02 1.238e+02 1.453e+02, threshold=2.115e+02, percent-clipped=0.0 2023-11-18 19:29:30,070 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8250, loss[loss=0.09331, simple_loss=0.1029, pruned_loss=0.02801, audio_tagging_loss=0.01385, over 14171.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1186, pruned_loss=0.03275, audio_tagging_loss=0.01115, over 3049483.64 frames. ], batch size: 53, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:29:34,954 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. 
limit=15.0 2023-11-18 19:29:42,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=375693.3333333333, ans=0.0 2023-11-18 19:29:44,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=375693.3333333333, ans=0.0 2023-11-18 19:29:47,331 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.57 vs. limit=15.0 2023-11-18 19:29:55,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=375760.0, ans=0.0 2023-11-18 19:29:59,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=375760.0, ans=0.0 2023-11-18 19:30:09,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=375826.6666666667, ans=0.125 2023-11-18 19:30:15,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=375893.3333333333, ans=0.1 2023-11-18 19:30:16,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375893.3333333333, ans=0.1 2023-11-18 19:30:25,319 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8300, loss[loss=0.07147, simple_loss=0.08066, pruned_loss=0.01867, audio_tagging_loss=0.01247, over 15293.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1182, pruned_loss=0.03267, audio_tagging_loss=0.01114, over 3046705.82 frames. ], batch size: 60, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:30:29,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=375960.0, ans=0.125 2023-11-18 19:31:13,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=376226.6666666667, ans=0.1 2023-11-18 19:31:16,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=376226.6666666667, ans=0.125 2023-11-18 19:31:19,571 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 9.264e+01 1.007e+02 1.092e+02 1.530e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-18 19:31:21,241 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8350, loss[loss=0.1032, simple_loss=0.1246, pruned_loss=0.03264, audio_tagging_loss=0.008252, over 15217.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1174, pruned_loss=0.03224, audio_tagging_loss=0.01124, over 3051526.60 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:31:32,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=376360.0, ans=0.2 2023-11-18 19:31:34,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=376360.0, ans=0.125 2023-11-18 19:31:40,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=376360.0, ans=0.125 2023-11-18 19:31:42,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.13 vs. 
limit=15.0 2023-11-18 19:31:56,397 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.52 vs. limit=15.0 2023-11-18 19:32:01,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=376493.3333333333, ans=0.0 2023-11-18 19:32:04,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=376560.0, ans=0.1 2023-11-18 19:32:14,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=376560.0, ans=0.0 2023-11-18 19:32:16,829 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8400, loss[loss=0.1202, simple_loss=0.1337, pruned_loss=0.04308, audio_tagging_loss=0.01028, over 14995.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1166, pruned_loss=0.03184, audio_tagging_loss=0.01123, over 3050159.18 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:32:29,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=376693.3333333333, ans=0.2 2023-11-18 19:32:29,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-11-18 19:32:33,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=376693.3333333333, ans=0.5 2023-11-18 19:32:44,214 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. limit=12.0 2023-11-18 19:32:49,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-11-18 19:33:11,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.828e+01 9.983e+01 1.109e+02 1.398e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-18 19:33:13,024 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8450, loss[loss=0.09219, simple_loss=0.1077, pruned_loss=0.02596, audio_tagging_loss=0.01238, over 14374.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1165, pruned_loss=0.03199, audio_tagging_loss=0.01126, over 3043298.90 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:33:21,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=376960.0, ans=0.125 2023-11-18 19:33:34,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=377093.3333333333, ans=0.125 2023-11-18 19:33:50,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=377160.0, ans=0.125 2023-11-18 19:34:07,974 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8500, loss[loss=0.09001, simple_loss=0.1058, pruned_loss=0.02688, audio_tagging_loss=0.01023, over 14653.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1166, pruned_loss=0.03184, audio_tagging_loss=0.0113, over 3042900.44 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:34:49,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.90 vs. 
limit=22.5 2023-11-18 19:34:56,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=377560.0, ans=0.125 2023-11-18 19:35:02,738 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.31 vs. limit=12.0 2023-11-18 19:35:03,281 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.638e+01 9.723e+01 1.079e+02 1.527e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-18 19:35:03,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=377626.6666666667, ans=0.0 2023-11-18 19:35:04,376 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8550, loss[loss=0.07851, simple_loss=0.09021, pruned_loss=0.02082, audio_tagging_loss=0.01259, over 14297.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1168, pruned_loss=0.03184, audio_tagging_loss=0.01133, over 3043285.95 frames. ], batch size: 55, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:35:10,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.93 vs. limit=15.0 2023-11-18 19:35:29,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=377760.0, ans=0.125 2023-11-18 19:35:37,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=377826.6666666667, ans=0.0 2023-11-18 19:35:38,221 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-11-18 19:36:00,028 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8600, loss[loss=0.1506, simple_loss=0.1869, pruned_loss=0.04935, audio_tagging_loss=0.007777, over 16257.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1165, pruned_loss=0.0316, audio_tagging_loss=0.01131, over 3042747.14 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:36:07,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=377960.0, ans=0.125 2023-11-18 19:36:11,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=378026.6666666667, ans=0.125 2023-11-18 19:36:19,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=378026.6666666667, ans=0.125 2023-11-18 19:36:33,654 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:36:41,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=378160.0, ans=0.1 2023-11-18 19:36:42,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=378160.0, ans=0.125 2023-11-18 19:36:45,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. 
limit=10.0 2023-11-18 19:36:54,546 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.915e+01 9.697e+01 1.106e+02 1.523e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 19:36:55,628 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8650, loss[loss=0.09721, simple_loss=0.1129, pruned_loss=0.02854, audio_tagging_loss=0.01223, over 15796.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1177, pruned_loss=0.03175, audio_tagging_loss=0.01134, over 3047071.59 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:10,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=378360.0, ans=0.1 2023-11-18 19:37:51,207 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8700, loss[loss=0.1155, simple_loss=0.1305, pruned_loss=0.03894, audio_tagging_loss=0.0113, over 16140.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1171, pruned_loss=0.03155, audio_tagging_loss=0.01138, over 3052343.82 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:38:18,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=378760.0, ans=0.5 2023-11-18 19:38:21,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 19:38:44,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=378893.3333333333, ans=0.125 2023-11-18 19:38:46,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 9.292e+01 1.037e+02 1.144e+02 1.707e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 19:38:47,728 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8750, loss[loss=0.06549, simple_loss=0.06744, pruned_loss=0.01697, audio_tagging_loss=0.0148, over 15133.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1186, pruned_loss=0.03214, audio_tagging_loss=0.01138, over 3053146.88 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:38:48,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=378960.0, ans=0.2 2023-11-18 19:38:52,770 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=378960.0, ans=0.0 2023-11-18 19:39:00,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=379026.6666666667, ans=0.0 2023-11-18 19:39:03,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=379026.6666666667, ans=0.2 2023-11-18 19:39:34,014 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=379226.6666666667, ans=0.125 2023-11-18 19:39:42,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=379293.3333333333, ans=0.125 2023-11-18 19:39:43,197 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8800, loss[loss=0.1585, simple_loss=0.1793, pruned_loss=0.05861, audio_tagging_loss=0.0102, over 16537.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1183, pruned_loss=0.03212, audio_tagging_loss=0.01156, over 3054329.71 frames. 
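[Annotation] Entries like bypass.scale_min (ans=0.2) and bypass.skip_rate throughout this log reflect two layer-level regularizers in the Zipformer encoder: each module's output is mixed with its input through a learned per-channel scale clamped to at least scale_min, and with probability skip_rate the module is skipped outright (stochastic depth). A simplified illustrative module; the real clamping and scheduling details in the recipe differ:

import torch

class Bypass(torch.nn.Module):
    """out = x + scale * (layer(x) - x), scale clamped to [scale_min, 1]."""
    def __init__(self, layer, num_channels, scale_min=0.2, skip_rate=0.035):
        super().__init__()
        self.layer = layer
        self.scale = torch.nn.Parameter(torch.full((num_channels,), 0.5))
        self.scale_min, self.skip_rate = scale_min, skip_rate

    def forward(self, x):
        if self.training and torch.rand(()) < self.skip_rate:
            return x  # stochastic depth: skip this layer entirely
        s = self.scale.clamp(min=self.scale_min, max=1.0)
        return x + s * (self.layer(x) - x)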
], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:39:51,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=379293.3333333333, ans=0.125 2023-11-18 19:40:04,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=379426.6666666667, ans=0.125 2023-11-18 19:40:21,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=379493.3333333333, ans=0.2 2023-11-18 19:40:22,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=379493.3333333333, ans=0.2 2023-11-18 19:40:28,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=379560.0, ans=0.125 2023-11-18 19:40:34,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=379560.0, ans=0.2 2023-11-18 19:40:37,500 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.218e+01 1.050e+02 1.133e+02 1.971e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 19:40:38,553 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8850, loss[loss=0.1471, simple_loss=0.1797, pruned_loss=0.0504, audio_tagging_loss=0.006895, over 15546.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.118, pruned_loss=0.03228, audio_tagging_loss=0.01152, over 3051094.65 frames. ], batch size: 55, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:40:40,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=379626.6666666667, ans=0.07 2023-11-18 19:40:41,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=379626.6666666667, ans=0.035 2023-11-18 19:40:47,040 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:40:52,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.76 vs. limit=10.0 2023-11-18 19:40:54,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=379693.3333333333, ans=0.2 2023-11-18 19:41:00,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.64 vs. 
limit=22.5 2023-11-18 19:41:22,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=379893.3333333333, ans=0.5 2023-11-18 19:41:32,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=379960.0, ans=0.025 2023-11-18 19:41:32,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=379960.0, ans=0.125 2023-11-18 19:41:33,622 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8900, loss[loss=0.06892, simple_loss=0.07878, pruned_loss=0.01785, audio_tagging_loss=0.01168, over 14853.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1183, pruned_loss=0.03239, audio_tagging_loss=0.01132, over 3047308.79 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:41:33,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=379960.0, ans=0.125 2023-11-18 19:42:27,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=380226.6666666667, ans=0.125 2023-11-18 19:42:28,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.192e+01 1.013e+02 1.118e+02 1.605e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 19:42:28,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=380293.3333333333, ans=0.125 2023-11-18 19:42:29,786 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 8950, loss[loss=0.1025, simple_loss=0.1245, pruned_loss=0.02979, audio_tagging_loss=0.0105, over 15654.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1186, pruned_loss=0.03232, audio_tagging_loss=0.01118, over 3050340.78 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:42:29,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=380293.3333333333, ans=0.125 2023-11-18 19:42:39,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=380360.0, ans=0.125 2023-11-18 19:42:45,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=380360.0, ans=0.0 2023-11-18 19:43:00,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-18 19:43:17,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=12.0 2023-11-18 19:43:20,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=380560.0, ans=0.0 2023-11-18 19:43:23,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=380560.0, ans=0.0 2023-11-18 19:43:25,558 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9000, loss[loss=0.09941, simple_loss=0.107, pruned_loss=0.03517, audio_tagging_loss=0.01073, over 14579.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.1182, pruned_loss=0.0321, audio_tagging_loss=0.01121, over 3050115.02 frames. 
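At this stage of training the per-batch loss[...] records decompose consistently as 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss: for batch 8900 above, 0.5 * 0.07878 + 0.01785 + 0.01168 = 0.06892, exactly the logged total, and the Epoch 5 validation record below reproduces the same way (0.5 * 0.06001 + 0.008857 + 0.03446 ~= 0.07332). A sketch of that combination; the scales are inferred from the logged values, and the earliest warm-up batches weight the terms differently:

    def combine_losses(simple_loss: float,
                       pruned_loss: float,
                       audio_tagging_loss: float,
                       simple_loss_scale: float = 0.5,
                       audio_tagging_loss_scale: float = 1.0) -> float:
        # Reproduces the steady-state records, e.g. batch 8900:
        # 0.5 * 0.07878 + 0.01785 + 0.01168 = 0.06892
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

Note how the audio-tagging term dominates the validation total far more than it does the training batches.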
], batch size: 57, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:43:25,558 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 19:43:58,401 INFO [train_asr.py:1147] (2/4) Epoch 5, validation: loss=0.07332, simple_loss=0.06001, pruned_loss=0.008857, audio_tagging_loss=0.03446, over 4681554.00 frames. 2023-11-18 19:43:58,402 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 19:44:04,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=380626.6666666667, ans=0.2 2023-11-18 19:44:04,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=380626.6666666667, ans=0.0 2023-11-18 19:44:08,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=380693.3333333333, ans=0.015 2023-11-18 19:44:26,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=380760.0, ans=0.125 2023-11-18 19:44:26,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=380760.0, ans=0.1 2023-11-18 19:44:54,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.240e+01 1.024e+02 1.108e+02 1.437e+02, threshold=2.047e+02, percent-clipped=0.0 2023-11-18 19:44:54,132 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9050, loss[loss=0.08475, simple_loss=0.09797, pruned_loss=0.02728, audio_tagging_loss=0.008486, over 15119.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1185, pruned_loss=0.03222, audio_tagging_loss=0.01099, over 3056109.73 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:07,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=381026.6666666667, ans=0.0 2023-11-18 19:45:49,532 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9100, loss[loss=0.114, simple_loss=0.14, pruned_loss=0.03608, audio_tagging_loss=0.007892, over 14061.00 frames. ], tot_loss[loss=0.102, simple_loss=0.118, pruned_loss=0.03205, audio_tagging_loss=0.01091, over 3048283.59 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:55,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=381293.3333333333, ans=0.125 2023-11-18 19:45:58,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=381293.3333333333, ans=0.125 2023-11-18 19:46:05,781 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-18 19:46:28,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.70 vs. limit=6.0 2023-11-18 19:46:45,536 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.962e+01 1.000e+02 1.098e+02 1.318e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-18 19:46:45,565 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9150, loss[loss=0.1163, simple_loss=0.1286, pruned_loss=0.04027, audio_tagging_loss=0.01175, over 15248.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.117, pruned_loss=0.0315, audio_tagging_loss=0.01094, over 3050528.03 frames. 
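The grad_scale field dropping from 32.0 to 16.0 at batch 9000 and recovering to 32.0 by batch 9200 is the usual signature of dynamic loss scaling in fp16 training: the scaler halves its factor when a scaled gradient overflows and grows it back after a run of finite steps. A minimal, runnable sketch with PyTorch's torch.cuda.amp.GradScaler (the toy model, data and hyperparameters are placeholders; whether icefall uses this class directly or its own wrapper, the logged behaviour is the same):

    import torch

    model = torch.nn.Linear(80, 500).cuda()   # stand-in for the real network
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    scaler = torch.cuda.amp.GradScaler()      # halves on overflow, grows back later

    for step in range(100):
        x = torch.randn(8, 80, device="cuda")
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():       # forward in mixed precision
            loss = model(x).square().mean()
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # skipped if grads overflowed
        scaler.update()                       # adjusts the factor logged as grad_scale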
], batch size: 54, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:47:36,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=381893.3333333333, ans=0.125 2023-11-18 19:47:42,468 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9200, loss[loss=0.07896, simple_loss=0.08321, pruned_loss=0.02463, audio_tagging_loss=0.01273, over 16763.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1171, pruned_loss=0.03178, audio_tagging_loss=0.01091, over 3050784.20 frames. ], batch size: 62, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:48:00,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=382026.6666666667, ans=0.1 2023-11-18 19:48:03,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=382093.3333333333, ans=0.125 2023-11-18 19:48:08,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2023-11-18 19:48:37,890 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.282e+01 1.040e+02 1.122e+02 1.499e+02, threshold=2.080e+02, percent-clipped=0.0 2023-11-18 19:48:37,918 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9250, loss[loss=0.1062, simple_loss=0.1252, pruned_loss=0.03001, audio_tagging_loss=0.0136, over 15963.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1163, pruned_loss=0.03167, audio_tagging_loss=0.01107, over 3048950.79 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:49:28,004 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:49:33,081 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9300, loss[loss=0.08127, simple_loss=0.09097, pruned_loss=0.0241, audio_tagging_loss=0.01169, over 15013.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1153, pruned_loss=0.03149, audio_tagging_loss=0.01116, over 3050671.29 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:49:33,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=382626.6666666667, ans=0.125 2023-11-18 19:49:33,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-18 19:49:44,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-11-18 19:50:13,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=382826.6666666667, ans=0.2 2023-11-18 19:50:18,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=382893.3333333333, ans=0.125 2023-11-18 19:50:28,163 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.68 vs. 
limit=15.0 2023-11-18 19:50:28,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=382960.0, ans=0.09899494936611666 2023-11-18 19:50:29,715 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.054e+01 9.801e+01 1.113e+02 1.567e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 19:50:29,741 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9350, loss[loss=0.1035, simple_loss=0.1207, pruned_loss=0.03232, audio_tagging_loss=0.01086, over 14614.00 frames. ], tot_loss[loss=0.09953, simple_loss=0.1144, pruned_loss=0.03113, audio_tagging_loss=0.01122, over 3049707.03 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:50:40,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=383026.6666666667, ans=0.125 2023-11-18 19:50:48,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=383026.6666666667, ans=12.0 2023-11-18 19:51:10,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=383160.0, ans=0.125 2023-11-18 19:51:25,406 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9400, loss[loss=0.08242, simple_loss=0.08786, pruned_loss=0.02226, audio_tagging_loss=0.01624, over 14381.00 frames. ], tot_loss[loss=0.09976, simple_loss=0.1145, pruned_loss=0.03113, audio_tagging_loss=0.0114, over 3052234.48 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:51:25,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=383293.3333333333, ans=0.015 2023-11-18 19:51:39,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=383360.0, ans=0.125 2023-11-18 19:51:41,527 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.20 vs. limit=6.0 2023-11-18 19:51:50,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-11-18 19:51:51,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=383426.6666666667, ans=0.125 2023-11-18 19:52:11,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-18 19:52:17,530 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:52:20,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.864e+01 9.867e+01 1.096e+02 1.502e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 19:52:20,635 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9450, loss[loss=0.09201, simple_loss=0.0971, pruned_loss=0.02816, audio_tagging_loss=0.0153, over 15383.00 frames. 
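The recurring "Exclude cut" warnings are expected rather than a fault in the run: the AudioSet cuts carry a dummy transcript only so they can flow through the transducer data pipeline, and a 1-second cut yields fewer encoder frames (23) than BPE tokens (24), which the transducer loss cannot handle (it needs at least as many frames as output tokens), so the cut is skipped. The 100 -> 23 count matches the usual two-stage stride-2 convolutional frontend arithmetic; a sketch of the filter, with the formula inferred from the logged numbers rather than copied from train_asr.py:

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 convolutions after 7 frames of context:
        # 100 input frames -> 23 output frames, matching the warnings.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # The pruned transducer loss needs frames >= tokens (T >= U).
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)   # the excluded placeholder cuts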
], tot_loss[loss=0.09978, simple_loss=0.1143, pruned_loss=0.03113, audio_tagging_loss=0.01151, over 3047557.62 frames. ], batch size: 60, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:52:20,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=383626.6666666667, ans=0.0 2023-11-18 19:52:48,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=383760.0, ans=0.1 2023-11-18 19:52:50,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:53:04,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=383893.3333333333, ans=0.125 2023-11-18 19:53:07,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.40 vs. limit=15.0 2023-11-18 19:53:09,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=383893.3333333333, ans=0.125 2023-11-18 19:53:16,806 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9500, loss[loss=0.09849, simple_loss=0.1038, pruned_loss=0.03093, audio_tagging_loss=0.01567, over 15564.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1143, pruned_loss=0.03132, audio_tagging_loss=0.0116, over 3041090.31 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:53:28,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.46 vs. limit=15.0 2023-11-18 19:53:35,840 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.67 vs. limit=22.5 2023-11-18 19:53:42,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=384093.3333333333, ans=0.125 2023-11-18 19:53:55,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=384160.0, ans=0.2 2023-11-18 19:54:05,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=384226.6666666667, ans=0.125 2023-11-18 19:54:05,302 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:54:07,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=384226.6666666667, ans=0.0 2023-11-18 19:54:12,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=384293.3333333333, ans=0.0 2023-11-18 19:54:13,471 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 9.208e+01 1.015e+02 1.091e+02 1.477e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 19:54:13,497 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9550, loss[loss=0.107, simple_loss=0.1357, pruned_loss=0.03074, audio_tagging_loss=0.008381, over 15399.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1154, pruned_loss=0.03129, audio_tagging_loss=0.01164, over 3040693.61 frames. 
], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:54:28,554 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:54:38,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2023-11-18 19:55:08,364 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9600, loss[loss=0.09609, simple_loss=0.1118, pruned_loss=0.02991, audio_tagging_loss=0.01027, over 15164.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1153, pruned_loss=0.03112, audio_tagging_loss=0.0117, over 3045946.67 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:55:30,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=384760.0, ans=0.0 2023-11-18 19:55:31,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-18 19:55:32,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=384760.0, ans=0.0 2023-11-18 19:55:59,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.65 vs. limit=15.0 2023-11-18 19:56:04,712 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9650, loss[loss=0.09967, simple_loss=0.1157, pruned_loss=0.02729, audio_tagging_loss=0.01451, over 15696.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1148, pruned_loss=0.03097, audio_tagging_loss=0.01169, over 3045082.07 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:56:05,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.450e+01 8.741e+01 9.505e+01 1.064e+02 1.391e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-18 19:56:31,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=385093.3333333333, ans=0.2 2023-11-18 19:57:00,514 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9700, loss[loss=0.1037, simple_loss=0.1051, pruned_loss=0.03948, audio_tagging_loss=0.0117, over 14908.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1158, pruned_loss=0.0314, audio_tagging_loss=0.01137, over 3048663.46 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:00,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=385293.3333333333, ans=0.035 2023-11-18 19:57:11,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=385360.0, ans=0.2 2023-11-18 19:57:37,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385493.3333333333, ans=0.1 2023-11-18 19:57:47,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385560.0, ans=0.1 2023-11-18 19:57:47,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=385560.0, ans=0.125 2023-11-18 19:57:56,525 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9750, loss[loss=0.1065, simple_loss=0.1149, pruned_loss=0.03687, audio_tagging_loss=0.01215, over 16313.00 frames. 
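The scaling.py:213 lines report the current value (ans=...) of a ScheduledFloat: a scalar hyperparameter such as a dropout probability, balancer probability or skip rate that follows a piecewise-linear schedule in the global batch count instead of staying fixed. By batch_count ~ 384k most of them have settled at their end values (probabilities of 0.125, skip rates of 0.0, dropout_p of 0.1). A minimal sketch of such a schedule; the constructor shape is an assumption, and icefall's class carries considerably more machinery:

    class ScheduledFloat:
        """Scalar that varies piecewise-linearly with the global batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1)
            self.points = sorted(points)
            self.batch_count = 0.0

        def value(self) -> float:
            x, pts = self.batch_count, self.points
            if x <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x <= x1:
                    # linear interpolation between neighbouring breakpoints
                    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
            return pts[-1][1]

    sched = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))   # e.g. a dropout schedule
    sched.batch_count = 383760.0
    print(sched.value())   # 0.1, like the dropout_p records above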
], tot_loss[loss=0.101, simple_loss=0.1163, pruned_loss=0.03148, audio_tagging_loss=0.01137, over 3045798.70 frames. ], batch size: 63, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:57,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 9.015e+01 1.026e+02 1.125e+02 1.667e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-18 19:58:00,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=385626.6666666667, ans=0.0 2023-11-18 19:58:11,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=385693.3333333333, ans=0.2 2023-11-18 19:58:13,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-18 19:58:20,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=385760.0, ans=0.125 2023-11-18 19:58:37,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=385826.6666666667, ans=0.125 2023-11-18 19:58:42,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.74 vs. limit=22.5 2023-11-18 19:58:45,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=385893.3333333333, ans=0.0 2023-11-18 19:58:49,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385893.3333333333, ans=0.125 2023-11-18 19:58:52,967 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9800, loss[loss=0.08938, simple_loss=0.1031, pruned_loss=0.02413, audio_tagging_loss=0.0137, over 15123.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1158, pruned_loss=0.0314, audio_tagging_loss=0.01122, over 3040036.68 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:58:58,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=385960.0, ans=0.125 2023-11-18 19:59:09,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2023-11-18 19:59:12,198 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-18 19:59:28,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=386160.0, ans=0.125 2023-11-18 19:59:35,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=386160.0, ans=0.035 2023-11-18 19:59:40,942 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 19:59:48,945 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9850, loss[loss=0.113, simple_loss=0.1415, pruned_loss=0.03504, audio_tagging_loss=0.007244, over 15830.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1159, pruned_loss=0.03128, audio_tagging_loss=0.01128, over 3037101.98 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:59:49,998 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 9.044e+01 9.858e+01 1.082e+02 1.412e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 20:00:04,721 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:00:09,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=386360.0, ans=0.0 2023-11-18 20:00:37,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=386560.0, ans=0.1 2023-11-18 20:00:44,508 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9900, loss[loss=0.121, simple_loss=0.1361, pruned_loss=0.04128, audio_tagging_loss=0.01162, over 15225.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1157, pruned_loss=0.03107, audio_tagging_loss=0.01116, over 3034446.39 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:06,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=386760.0, ans=0.125 2023-11-18 20:01:07,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.43 vs. limit=22.5 2023-11-18 20:01:11,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=386760.0, ans=0.125 2023-11-18 20:01:32,733 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.43 vs. limit=15.0 2023-11-18 20:01:33,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=386893.3333333333, ans=0.125 2023-11-18 20:01:39,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=386893.3333333333, ans=0.125 2023-11-18 20:01:41,676 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 9950, loss[loss=0.1106, simple_loss=0.1232, pruned_loss=0.03772, audio_tagging_loss=0.01125, over 14636.00 frames. ], tot_loss[loss=0.09983, simple_loss=0.1152, pruned_loss=0.03104, audio_tagging_loss=0.0112, over 3035853.84 frames. 
], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:42,671 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.823e+01 1.146e+02 1.516e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 20:01:47,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=386960.0, ans=0.125 2023-11-18 20:01:49,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=386960.0, ans=0.0 2023-11-18 20:01:52,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=387026.6666666667, ans=0.125 2023-11-18 20:01:53,964 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-18 20:01:58,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=387026.6666666667, ans=0.1 2023-11-18 20:01:58,422 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.32 vs. limit=10.0 2023-11-18 20:02:09,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0 2023-11-18 20:02:10,919 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:02:22,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=387160.0, ans=0.035 2023-11-18 20:02:24,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=387160.0, ans=0.125 2023-11-18 20:02:34,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=387226.6666666667, ans=0.125 2023-11-18 20:02:36,725 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10000, loss[loss=0.1217, simple_loss=0.1363, pruned_loss=0.0415, audio_tagging_loss=0.01202, over 15152.00 frames. ], tot_loss[loss=0.09958, simple_loss=0.1151, pruned_loss=0.0309, audio_tagging_loss=0.01113, over 3043809.86 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:02:42,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=387293.3333333333, ans=0.125 2023-11-18 20:03:18,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=387493.3333333333, ans=0.125 2023-11-18 20:03:18,828 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0 2023-11-18 20:03:24,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.41 vs. limit=10.0 2023-11-18 20:03:27,961 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2023-11-18 20:03:32,536 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10050, loss[loss=0.1226, simple_loss=0.1438, pruned_loss=0.04243, audio_tagging_loss=0.008248, over 14552.00 frames. 
], tot_loss[loss=0.1006, simple_loss=0.1165, pruned_loss=0.03121, audio_tagging_loss=0.01115, over 3048291.88 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:03:33,529 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.098e+01 9.898e+01 1.122e+02 1.719e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 20:03:34,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=387626.6666666667, ans=0.0 2023-11-18 20:03:38,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=387626.6666666667, ans=0.0 2023-11-18 20:03:44,167 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. limit=6.0 2023-11-18 20:03:55,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=387760.0, ans=0.0 2023-11-18 20:04:01,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=387760.0, ans=0.125 2023-11-18 20:04:10,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=387826.6666666667, ans=0.125 2023-11-18 20:04:22,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387893.3333333333, ans=0.1 2023-11-18 20:04:28,973 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10100, loss[loss=0.1118, simple_loss=0.1368, pruned_loss=0.03368, audio_tagging_loss=0.009707, over 15322.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1173, pruned_loss=0.03168, audio_tagging_loss=0.01113, over 3051805.31 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:04:30,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=387960.0, ans=0.125 2023-11-18 20:04:33,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-18 20:04:35,974 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-18 20:04:36,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=387960.0, ans=0.125 2023-11-18 20:04:41,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=388026.6666666667, ans=0.0 2023-11-18 20:04:47,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388026.6666666667, ans=0.1 2023-11-18 20:04:54,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=388093.3333333333, ans=0.0 2023-11-18 20:04:54,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.01 vs. 
limit=12.0 2023-11-18 20:04:58,570 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=388093.3333333333, ans=0.125 2023-11-18 20:05:12,178 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:05:23,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=388293.3333333333, ans=0.0 2023-11-18 20:05:23,847 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10150, loss[loss=0.09955, simple_loss=0.1043, pruned_loss=0.03491, audio_tagging_loss=0.01248, over 14562.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1167, pruned_loss=0.03167, audio_tagging_loss=0.01117, over 3047574.13 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:05:24,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.203e+01 1.000e+02 1.096e+02 2.259e+02, threshold=2.001e+02, percent-clipped=1.0 2023-11-18 20:05:30,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.97 vs. limit=15.0 2023-11-18 20:05:47,616 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:06:05,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=388493.3333333333, ans=0.2 2023-11-18 20:06:19,302 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10200, loss[loss=0.089, simple_loss=0.1038, pruned_loss=0.02382, audio_tagging_loss=0.0133, over 14874.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1165, pruned_loss=0.03138, audio_tagging_loss=0.01127, over 3045449.83 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:06:31,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=388693.3333333333, ans=0.125 2023-11-18 20:06:35,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=388693.3333333333, ans=0.2 2023-11-18 20:06:40,051 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:06:49,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-11-18 20:06:51,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.47 vs. limit=15.0 2023-11-18 20:06:56,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=388826.6666666667, ans=0.2 2023-11-18 20:07:02,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=388826.6666666667, ans=0.07 2023-11-18 20:07:05,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388893.3333333333, ans=0.1 2023-11-18 20:07:14,929 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10250, loss[loss=0.09876, simple_loss=0.103, pruned_loss=0.03162, audio_tagging_loss=0.01565, over 15687.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.116, pruned_loss=0.03127, audio_tagging_loss=0.01144, over 3051681.78 frames. ], batch size: 61, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:07:15,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 9.102e+01 9.857e+01 1.065e+02 1.324e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-18 20:07:25,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389026.6666666667, ans=0.125 2023-11-18 20:07:29,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=389026.6666666667, ans=0.125 2023-11-18 20:07:34,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=389026.6666666667, ans=0.0 2023-11-18 20:07:44,723 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.42 vs. limit=22.5 2023-11-18 20:07:45,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=389093.3333333333, ans=0.125 2023-11-18 20:07:49,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=389160.0, ans=0.125 2023-11-18 20:07:50,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=389160.0, ans=0.125 2023-11-18 20:08:11,208 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10300, loss[loss=0.106, simple_loss=0.121, pruned_loss=0.03035, audio_tagging_loss=0.01519, over 15033.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1154, pruned_loss=0.03114, audio_tagging_loss=0.01154, over 3051996.83 frames. 
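The Whitening lines compare a statistic of a layer's activations against that module's limit. The metric measures, roughly, how far the (grouped) channel covariance is from a multiple of the identity: it is 1.0 for perfectly "white" activations and grows as the variance concentrates in a few directions, and when it sits well above the limit the module nudges gradients toward whiter activations. A sketch of one statistic consistent with that description; the exact formula in scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        """Ratio >= 1.0; equals 1.0 when each group's channel covariance
        is a multiple of the identity (sketch of the logged `metric`)."""
        x = x.reshape(-1, x.shape[-1])
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        cpg = num_channels // num_groups            # channels per group
        x = x.reshape(num_frames, num_groups, cpg).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)         # centre per group
        covar = torch.matmul(x.transpose(1, 2), x)  # (num_groups, cpg, cpg)
        mean_diag = covar.diagonal(dim1=1, dim2=2).mean()
        mean_sq = (covar ** 2).sum() / (num_groups * cpg)
        return mean_sq / (mean_diag ** 2 + 1e-20)

Records such as metric=14.99 vs. limit=15.0 just above show a layer sitting right at the boundary where this correction starts to engage.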
], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:08:13,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=389293.3333333333, ans=0.2 2023-11-18 20:08:16,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=389293.3333333333, ans=0.2 2023-11-18 20:08:32,508 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=389426.6666666667, ans=0.125 2023-11-18 20:09:03,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=389560.0, ans=0.125 2023-11-18 20:09:06,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=389626.6666666667, ans=0.125 2023-11-18 20:09:07,682 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10350, loss[loss=0.1008, simple_loss=0.1185, pruned_loss=0.02804, audio_tagging_loss=0.01351, over 15329.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1167, pruned_loss=0.03147, audio_tagging_loss=0.01161, over 3062874.84 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:09:08,730 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.314e+01 1.056e+02 1.157e+02 1.834e+02, threshold=2.113e+02, percent-clipped=0.0 2023-11-18 20:10:02,905 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10400, loss[loss=0.08527, simple_loss=0.08726, pruned_loss=0.02525, audio_tagging_loss=0.01639, over 14814.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1165, pruned_loss=0.03152, audio_tagging_loss=0.01167, over 3056328.16 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:10:05,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=389960.0, ans=0.0 2023-11-18 20:10:09,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=389960.0, ans=0.0 2023-11-18 20:10:11,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.77 vs. limit=15.0 2023-11-18 20:10:30,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=390093.3333333333, ans=0.125 2023-11-18 20:10:39,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=390160.0, ans=0.0 2023-11-18 20:10:42,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=390160.0, ans=0.125 2023-11-18 20:10:52,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.00 vs. limit=10.0 2023-11-18 20:10:59,427 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10450, loss[loss=0.07963, simple_loss=0.08783, pruned_loss=0.02356, audio_tagging_loss=0.01216, over 14866.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1161, pruned_loss=0.0313, audio_tagging_loss=0.01169, over 3052856.82 frames. 
], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:00,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.809e+01 9.608e+01 1.086e+02 1.646e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 20:11:21,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=390426.6666666667, ans=0.125 2023-11-18 20:11:45,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=390560.0, ans=0.125 2023-11-18 20:11:50,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=390560.0, ans=0.0 2023-11-18 20:11:55,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-11-18 20:11:55,860 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10500, loss[loss=0.07961, simple_loss=0.09077, pruned_loss=0.02182, audio_tagging_loss=0.0124, over 14861.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1154, pruned_loss=0.031, audio_tagging_loss=0.01149, over 3048000.39 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:57,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=390626.6666666667, ans=0.0 2023-11-18 20:12:00,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=390626.6666666667, ans=0.125 2023-11-18 20:12:25,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=390760.0, ans=0.125 2023-11-18 20:12:28,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=390826.6666666667, ans=0.0 2023-11-18 20:12:41,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=390893.3333333333, ans=0.0 2023-11-18 20:12:51,616 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10550, loss[loss=0.09486, simple_loss=0.1081, pruned_loss=0.02991, audio_tagging_loss=0.01089, over 15590.00 frames. ], tot_loss[loss=0.09996, simple_loss=0.1152, pruned_loss=0.03096, audio_tagging_loss=0.01142, over 3045437.45 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:52,613 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.716e+01 9.677e+01 1.046e+02 1.546e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-18 20:13:14,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=391093.3333333333, ans=0.0 2023-11-18 20:13:19,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=391093.3333333333, ans=0.125 2023-11-18 20:13:47,310 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10600, loss[loss=0.08289, simple_loss=0.09462, pruned_loss=0.02318, audio_tagging_loss=0.0124, over 15674.00 frames. ], tot_loss[loss=0.09987, simple_loss=0.1151, pruned_loss=0.03108, audio_tagging_loss=0.01124, over 3048987.80 frames. 
], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:13:48,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=391293.3333333333, ans=0.0 2023-11-18 20:13:55,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=391293.3333333333, ans=0.0 2023-11-18 20:14:04,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=391360.0, ans=0.125 2023-11-18 20:14:13,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=391426.6666666667, ans=0.2 2023-11-18 20:14:21,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=22.5 2023-11-18 20:14:36,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=391560.0, ans=0.125 2023-11-18 20:14:38,959 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:14:39,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=391560.0, ans=0.05 2023-11-18 20:14:42,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.70 vs. limit=15.0 2023-11-18 20:14:43,650 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10650, loss[loss=0.101, simple_loss=0.1148, pruned_loss=0.03163, audio_tagging_loss=0.01199, over 16641.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1161, pruned_loss=0.03153, audio_tagging_loss=0.01116, over 3044402.41 frames. ], batch size: 62, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:14:44,665 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 9.141e+01 1.015e+02 1.176e+02 1.580e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-18 20:15:07,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391760.0, ans=0.125 2023-11-18 20:15:14,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=391760.0, ans=0.2 2023-11-18 20:15:21,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=391826.6666666667, ans=0.125 2023-11-18 20:15:30,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=391893.3333333333, ans=0.07 2023-11-18 20:15:31,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.33 vs. limit=15.0 2023-11-18 20:15:32,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=391893.3333333333, ans=0.0 2023-11-18 20:15:38,738 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10700, loss[loss=0.09847, simple_loss=0.1085, pruned_loss=0.03185, audio_tagging_loss=0.01238, over 16538.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1162, pruned_loss=0.03145, audio_tagging_loss=0.01109, over 3049776.59 frames. 
], batch size: 63, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:15:41,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.31 vs. limit=15.0 2023-11-18 20:15:52,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-18 20:16:09,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=15.0 2023-11-18 20:16:13,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=392160.0, ans=0.125 2023-11-18 20:16:22,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=392160.0, ans=0.0 2023-11-18 20:16:26,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=392226.6666666667, ans=0.2 2023-11-18 20:16:28,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=392226.6666666667, ans=0.09899494936611666 2023-11-18 20:16:35,675 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10750, loss[loss=0.1054, simple_loss=0.1308, pruned_loss=0.03299, audio_tagging_loss=0.006946, over 15757.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1169, pruned_loss=0.03166, audio_tagging_loss=0.01109, over 3047693.55 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:16:36,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.086e+01 9.851e+01 1.129e+02 1.490e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 20:16:39,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=392293.3333333333, ans=0.04949747468305833 2023-11-18 20:16:41,538 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-18 20:16:43,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=392293.3333333333, ans=0.04949747468305833 2023-11-18 20:16:50,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=392360.0, ans=0.2 2023-11-18 20:16:58,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.55 vs. limit=15.0 2023-11-18 20:17:11,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=392493.3333333333, ans=0.125 2023-11-18 20:17:23,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=392560.0, ans=0.1 2023-11-18 20:17:25,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=392560.0, ans=0.1 2023-11-18 20:17:28,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=392560.0, ans=0.125 2023-11-18 20:17:31,498 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10800, loss[loss=0.1051, simple_loss=0.1269, pruned_loss=0.03197, audio_tagging_loss=0.009684, over 15673.00 frames. 
], tot_loss[loss=0.1009, simple_loss=0.1164, pruned_loss=0.03147, audio_tagging_loss=0.01124, over 3047563.96 frames. ], batch size: 61, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:17:35,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=392626.6666666667, ans=0.125 2023-11-18 20:18:01,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=392760.0, ans=0.125 2023-11-18 20:18:09,969 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.48 vs. limit=15.0 2023-11-18 20:18:18,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=392893.3333333333, ans=0.125 2023-11-18 20:18:27,574 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10850, loss[loss=0.07047, simple_loss=0.07726, pruned_loss=0.01853, audio_tagging_loss=0.01331, over 14942.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1165, pruned_loss=0.0316, audio_tagging_loss=0.0112, over 3044567.91 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:18:28,585 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 9.217e+01 1.010e+02 1.123e+02 1.956e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-18 20:18:31,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=392960.0, ans=0.1 2023-11-18 20:18:41,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=393026.6666666667, ans=0.05 2023-11-18 20:19:05,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=393160.0, ans=0.0 2023-11-18 20:19:18,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=15.0 2023-11-18 20:19:19,193 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:19:24,009 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10900, loss[loss=0.08792, simple_loss=0.1028, pruned_loss=0.02463, audio_tagging_loss=0.01187, over 14705.00 frames. ], tot_loss[loss=0.09936, simple_loss=0.1146, pruned_loss=0.03087, audio_tagging_loss=0.01121, over 3040409.22 frames. 
], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:19:26,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=393293.3333333333, ans=0.125 2023-11-18 20:19:28,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=393293.3333333333, ans=0.125 2023-11-18 20:19:51,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=393426.6666666667, ans=0.0 2023-11-18 20:19:53,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=393426.6666666667, ans=0.125 2023-11-18 20:19:57,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=393493.3333333333, ans=0.1 2023-11-18 20:20:13,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=393560.0, ans=0.2 2023-11-18 20:20:17,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=393560.0, ans=0.0 2023-11-18 20:20:20,068 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 10950, loss[loss=0.09933, simple_loss=0.1109, pruned_loss=0.03377, audio_tagging_loss=0.01013, over 16390.00 frames. ], tot_loss[loss=0.09902, simple_loss=0.1137, pruned_loss=0.0308, audio_tagging_loss=0.01136, over 3043766.13 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:20:21,116 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.174e+01 1.016e+02 1.114e+02 1.629e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 20:20:36,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=393693.3333333333, ans=0.125 2023-11-18 20:20:40,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=393760.0, ans=0.125 2023-11-18 20:21:02,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=393826.6666666667, ans=0.2 2023-11-18 20:21:06,378 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.40 vs. limit=15.0 2023-11-18 20:21:15,285 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11000, loss[loss=0.09091, simple_loss=0.09387, pruned_loss=0.02797, audio_tagging_loss=0.01601, over 15334.00 frames. ], tot_loss[loss=0.09988, simple_loss=0.1149, pruned_loss=0.03109, audio_tagging_loss=0.01137, over 3048552.03 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:21:21,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=393960.0, ans=0.0 2023-11-18 20:21:23,347 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 20:21:23,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=393960.0, ans=0.05 2023-11-18 20:21:23,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=393960.0, ans=0.07 2023-11-18 20:21:28,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.91 vs. limit=15.0 2023-11-18 20:21:30,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=394026.6666666667, ans=0.125 2023-11-18 20:21:57,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=394160.0, ans=0.125 2023-11-18 20:22:12,139 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11050, loss[loss=0.09099, simple_loss=0.09678, pruned_loss=0.02931, audio_tagging_loss=0.0133, over 15111.00 frames. ], tot_loss[loss=0.09953, simple_loss=0.1143, pruned_loss=0.03093, audio_tagging_loss=0.01145, over 3040291.35 frames. ], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:22:13,192 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.478e+01 1.012e+02 1.085e+02 1.543e+02, threshold=2.025e+02, percent-clipped=0.0 2023-11-18 20:22:33,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=394426.6666666667, ans=0.0 2023-11-18 20:22:47,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2023-11-18 20:22:50,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=394493.3333333333, ans=0.0 2023-11-18 20:22:55,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=394560.0, ans=0.0 2023-11-18 20:22:57,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=17.28 vs. limit=15.0 2023-11-18 20:23:07,218 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11100, loss[loss=0.1187, simple_loss=0.1393, pruned_loss=0.0369, audio_tagging_loss=0.01216, over 15906.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1147, pruned_loss=0.03115, audio_tagging_loss=0.01159, over 3044834.61 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:23:12,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=394626.6666666667, ans=0.0 2023-11-18 20:24:03,468 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11150, loss[loss=0.08933, simple_loss=0.1086, pruned_loss=0.02368, audio_tagging_loss=0.01136, over 16818.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1147, pruned_loss=0.03132, audio_tagging_loss=0.01162, over 3041394.05 frames. 
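
The scaling.py:1022 "Whitening" lines report a diagnostic on the covariance of a module's activations: it is 1.0 when the covariance is a multiple of the identity and grows as variance concentrates in a few directions, and values above the limit (e.g. metric=17.28 vs. limit=15.0 for nonlin_attention.whiten2 above) are what the whitening penalty pushes back on. A sketch of one such metric; this formulation is an assumption, not the one in scaling.py:

import torch

# Ratio mean(eig^2) / mean(eig)^2 of the feature covariance: 1.0 for white
# activations, larger when a few directions dominate. Assumed formulation.
def whitening_metric(x):
    # x: (num_frames, num_channels) activations
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)  # real eigenvalues, ascending
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

white = torch.randn(10000, 384)                  # covariance ~ identity
skewed = white * torch.linspace(0.1, 3.0, 384)   # uneven per-channel variance
assert whitening_metric(white) < whitening_metric(skewed)
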
], batch size: 63, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:24:03,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394960.0, ans=0.125 2023-11-18 20:24:04,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 9.395e+01 1.022e+02 1.169e+02 1.423e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 20:24:10,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=394960.0, ans=0.125 2023-11-18 20:24:10,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=394960.0, ans=0.2 2023-11-18 20:24:18,479 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=395026.6666666667, ans=0.0 2023-11-18 20:24:38,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=395160.0, ans=0.125 2023-11-18 20:24:59,122 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11200, loss[loss=0.08964, simple_loss=0.09387, pruned_loss=0.0286, audio_tagging_loss=0.0141, over 15361.00 frames. ], tot_loss[loss=0.09937, simple_loss=0.1136, pruned_loss=0.03089, audio_tagging_loss=0.01169, over 3037788.06 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:04,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0 2023-11-18 20:25:14,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=395360.0, ans=0.0 2023-11-18 20:25:15,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.78 vs. limit=15.0 2023-11-18 20:25:16,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=395360.0, ans=0.2 2023-11-18 20:25:19,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=395360.0, ans=0.1 2023-11-18 20:25:27,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=395426.6666666667, ans=0.125 2023-11-18 20:25:30,211 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.63 vs. limit=15.0 2023-11-18 20:25:34,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=395493.3333333333, ans=0.2 2023-11-18 20:25:38,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=395493.3333333333, ans=0.125 2023-11-18 20:25:40,753 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=395493.3333333333, ans=0.125 2023-11-18 20:25:55,416 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11250, loss[loss=0.1035, simple_loss=0.1131, pruned_loss=0.03358, audio_tagging_loss=0.01332, over 15133.00 frames. ], tot_loss[loss=0.09872, simple_loss=0.1131, pruned_loss=0.03063, audio_tagging_loss=0.01153, over 3038613.78 frames. 
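
In every optim.py:476 record the threshold equals Clipping_scale times the median of the recent grad-norm distribution: in the record above, 2.0 * 1.022e+02 = 2.044e+02, and the other records match to display precision. The five "quartiles" are min/25%/median/75%/max, and percent-clipped is the fraction of steps whose norm exceeded the threshold. A sketch of that bookkeeping; the window size and names are assumptions, the real logic is icefall's optim.py:

import torch

class MedianClipper:
    """Clip at clipping_scale * median of recent gradient norms."""
    def __init__(self, clipping_scale=2.0, window=1000):
        self.clipping_scale = clipping_scale
        self.window = window
        self.norms = []      # recent total gradient norms
        self.steps = 0
        self.clipped = 0     # numerator of percent-clipped

    def clip_(self, params):
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads)).item()
        self.norms = (self.norms + [norm])[-self.window:]
        threshold = self.clipping_scale * float(torch.tensor(self.norms).median())
        self.steps += 1
        if norm > threshold:
            self.clipped += 1
            for g in grads:
                g.mul_(threshold / norm)
        return threshold

    def quartiles(self):
        t = torch.tensor(self.norms)
        return [t.quantile(q).item() for q in (0.0, 0.25, 0.5, 0.75, 1.0)]
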
], batch size: 58, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:56,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 9.426e+01 1.024e+02 1.146e+02 1.822e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 20:25:56,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=395626.6666666667, ans=0.125 2023-11-18 20:26:02,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=395626.6666666667, ans=0.125 2023-11-18 20:26:07,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=395693.3333333333, ans=0.125 2023-11-18 20:26:11,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.80 vs. limit=12.0 2023-11-18 20:26:17,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=395760.0, ans=0.1 2023-11-18 20:26:19,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=395760.0, ans=0.0 2023-11-18 20:26:20,455 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=395760.0, ans=0.0 2023-11-18 20:26:20,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.40 vs. limit=15.0 2023-11-18 20:26:21,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=395760.0, ans=0.0 2023-11-18 20:26:24,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=395760.0, ans=0.125 2023-11-18 20:26:28,526 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.73 vs. limit=22.5 2023-11-18 20:26:44,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=395893.3333333333, ans=0.2 2023-11-18 20:26:45,110 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0 2023-11-18 20:26:50,727 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11300, loss[loss=0.05706, simple_loss=0.06242, pruned_loss=0.01497, audio_tagging_loss=0.01088, over 14864.00 frames. ], tot_loss[loss=0.09939, simple_loss=0.1139, pruned_loss=0.03104, audio_tagging_loss=0.01143, over 3049699.61 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:26:51,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.55 vs. 
limit=15.0 2023-11-18 20:26:54,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=395960.0, ans=0.125 2023-11-18 20:27:05,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=396026.6666666667, ans=0.5 2023-11-18 20:27:05,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=396026.6666666667, ans=0.1 2023-11-18 20:27:10,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=396026.6666666667, ans=0.125 2023-11-18 20:27:15,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=396093.3333333333, ans=0.1 2023-11-18 20:27:31,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.00 vs. limit=15.0 2023-11-18 20:27:45,782 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11350, loss[loss=0.08953, simple_loss=0.106, pruned_loss=0.02389, audio_tagging_loss=0.01264, over 15993.00 frames. ], tot_loss[loss=0.1, simple_loss=0.115, pruned_loss=0.03139, audio_tagging_loss=0.01115, over 3049956.96 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:27:46,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 9.361e+01 1.045e+02 1.135e+02 1.699e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 20:27:56,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=396360.0, ans=0.2 2023-11-18 20:28:03,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.44 vs. limit=12.0 2023-11-18 20:28:16,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=396426.6666666667, ans=0.125 2023-11-18 20:28:23,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=396493.3333333333, ans=0.2 2023-11-18 20:28:30,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=396560.0, ans=0.2 2023-11-18 20:28:41,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=396626.6666666667, ans=0.125 2023-11-18 20:28:42,014 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11400, loss[loss=0.07431, simple_loss=0.07995, pruned_loss=0.02257, audio_tagging_loss=0.01177, over 16052.00 frames. ], tot_loss[loss=0.09964, simple_loss=0.1144, pruned_loss=0.0312, audio_tagging_loss=0.01122, over 3049645.01 frames. ], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:28:52,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.61 vs. 
limit=15.0 2023-11-18 20:28:54,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:28:55,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:28:55,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:28:57,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=396693.3333333333, ans=0.2 2023-11-18 20:29:00,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=396693.3333333333, ans=0.125 2023-11-18 20:29:14,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=396826.6666666667, ans=0.04949747468305833 2023-11-18 20:29:19,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=396826.6666666667, ans=0.125 2023-11-18 20:29:24,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=396826.6666666667, ans=0.125 2023-11-18 20:29:28,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=396893.3333333333, ans=0.125 2023-11-18 20:29:28,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=396893.3333333333, ans=0.0 2023-11-18 20:29:31,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=396893.3333333333, ans=0.0 2023-11-18 20:29:33,375 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=22.5 2023-11-18 20:29:37,079 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11450, loss[loss=0.09423, simple_loss=0.1124, pruned_loss=0.02657, audio_tagging_loss=0.01149, over 16287.00 frames. ], tot_loss[loss=0.09907, simple_loss=0.1138, pruned_loss=0.03093, audio_tagging_loss=0.01124, over 3042485.15 frames. ], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:29:38,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.945e+01 1.000e+02 1.081e+02 1.401e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 20:29:39,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.82 vs. limit=10.0 2023-11-18 20:29:56,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=397026.6666666667, ans=0.125 2023-11-18 20:30:01,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. 
limit=15.0 2023-11-18 20:30:05,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=397093.3333333333, ans=0.2 2023-11-18 20:30:11,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=397160.0, ans=0.2 2023-11-18 20:30:18,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=397160.0, ans=0.2 2023-11-18 20:30:24,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=397226.6666666667, ans=0.2 2023-11-18 20:30:30,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=397226.6666666667, ans=0.1 2023-11-18 20:30:30,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=397226.6666666667, ans=0.125 2023-11-18 20:30:32,397 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11500, loss[loss=0.1087, simple_loss=0.1279, pruned_loss=0.03356, audio_tagging_loss=0.01116, over 15412.00 frames. ], tot_loss[loss=0.09931, simple_loss=0.114, pruned_loss=0.03114, audio_tagging_loss=0.01118, over 3043411.63 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:30:33,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=397293.3333333333, ans=0.0 2023-11-18 20:30:33,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=397293.3333333333, ans=0.0 2023-11-18 20:30:46,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=397360.0, ans=0.0 2023-11-18 20:30:49,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397360.0, ans=0.125 2023-11-18 20:31:12,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=397493.3333333333, ans=0.0 2023-11-18 20:31:15,585 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.31 vs. limit=22.5 2023-11-18 20:31:21,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=397560.0, ans=0.125 2023-11-18 20:31:24,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=397560.0, ans=0.125 2023-11-18 20:31:29,283 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11550, loss[loss=0.08648, simple_loss=0.08892, pruned_loss=0.02801, audio_tagging_loss=0.01402, over 15656.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1154, pruned_loss=0.03159, audio_tagging_loss=0.0111, over 3050108.02 frames. 
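
Each scaling.py:213 record prints the current value (ans) of a ScheduledFloat: a scalar such as a skip rate or balancer probability that follows a piecewise-linear schedule. The batch_count it is scheduled against advances by about 6.67 per batch (333.33 per 50-batch log interval, hence the recurring .3333/.6667 fractions), consistent with a duration-adjusted count, plausibly batch_idx_train * max_duration * world_size / ref_duration = batch_idx_train * 1000 * 4 / 600 under this run's settings. A sketch of the interpolation; the breakpoints below are illustrative, not the schedule of any particular parameter here:

# Piecewise-linear scheduled scalar in the spirit of ScheduledFloat.
def scheduled_float(points, batch_count):
    """points: [(batch_count, value), ...] in ascending order."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)  # linear interpolation
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0  # past the last breakpoint, hold the final value

# Illustrative: a rate decaying 0.3 -> 0.1 over the first 20k counts has
# long since reached its endpoint at batch_count ~ 3.96e5.
assert scheduled_float([(0.0, 0.3), (20000.0, 0.1)], 395960.0) == 0.1
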
], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:30,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.927e+01 9.792e+01 1.098e+02 1.308e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 20:31:33,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=397626.6666666667, ans=0.125 2023-11-18 20:31:38,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=397626.6666666667, ans=0.0 2023-11-18 20:31:40,834 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.017e-02 2023-11-18 20:31:55,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=397760.0, ans=0.125 2023-11-18 20:31:57,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=397760.0, ans=0.2 2023-11-18 20:32:00,543 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:32:19,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=397893.3333333333, ans=0.0 2023-11-18 20:32:23,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=397893.3333333333, ans=0.0 2023-11-18 20:32:24,907 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11600, loss[loss=0.1083, simple_loss=0.1294, pruned_loss=0.03355, audio_tagging_loss=0.01009, over 15483.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1157, pruned_loss=0.03171, audio_tagging_loss=0.01112, over 3057345.02 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:32:27,552 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.07 vs. limit=15.0 2023-11-18 20:32:58,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=398160.0, ans=0.125 2023-11-18 20:33:06,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=398160.0, ans=0.1 2023-11-18 20:33:14,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.56 vs. limit=6.0 2023-11-18 20:33:20,119 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11650, loss[loss=0.1001, simple_loss=0.1165, pruned_loss=0.03, audio_tagging_loss=0.01182, over 15517.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1168, pruned_loss=0.03178, audio_tagging_loss=0.01109, over 3051459.82 frames. 
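
The WARNING above spells out the exclusion rule: the 1-second AudioSet cut has 100 feature frames, only 23 frames after the ~4x encoder subsampling, but its placeholder transcript encodes to 24 BPE tokens, and a transducer cannot emit more symbols than it has frames, so the cut is dropped. A sketch of that check; the exact subsampling arithmetic is an assumption (one common convolutional front end with an overall factor of 4), the real filter is in train_asr.py:

# Exclusion rule implied by the warnings in this log: drop a cut whose
# subsampled frame count is smaller than its token count.
def frames_after_subsampling(num_frames):
    # assumed front-end arithmetic; maps 100 -> 23 as in the warning
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames, num_tokens):
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)   # the excluded dummy-text AudioSet cuts
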
], batch size: 59, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:33:21,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.987e+01 1.026e+02 1.150e+02 1.533e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 20:33:22,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=398293.3333333333, ans=0.0 2023-11-18 20:33:22,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=398293.3333333333, ans=0.2 2023-11-18 20:33:24,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=398293.3333333333, ans=0.125 2023-11-18 20:33:43,463 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.06 vs. limit=15.0 2023-11-18 20:33:49,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=398426.6666666667, ans=10.0 2023-11-18 20:34:03,937 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-18 20:34:12,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=398560.0, ans=0.0 2023-11-18 20:34:16,068 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11700, loss[loss=0.1155, simple_loss=0.1263, pruned_loss=0.03879, audio_tagging_loss=0.0136, over 15709.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1159, pruned_loss=0.0315, audio_tagging_loss=0.01119, over 3047576.57 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:34:32,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=398693.3333333333, ans=0.09899494936611666 2023-11-18 20:34:33,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=398693.3333333333, ans=0.0 2023-11-18 20:34:34,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=398693.3333333333, ans=0.125 2023-11-18 20:34:56,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=398826.6666666667, ans=0.0 2023-11-18 20:35:07,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=398893.3333333333, ans=0.04949747468305833 2023-11-18 20:35:10,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-18 20:35:12,935 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11750, loss[loss=0.1211, simple_loss=0.1403, pruned_loss=0.0392, audio_tagging_loss=0.01177, over 15146.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1162, pruned_loss=0.03174, audio_tagging_loss=0.01128, over 3045825.36 frames. 
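
The grad_scale column is the fp16 loss scale, and it moves the way dynamic loss scaling normally does: it doubles from 32.0 to 64.0 around batch 11600, then is halved back to 32.0 by batch 11700, presumably after an overflow in the scaled gradients. A sketch of that policy in the style of torch.cuda.amp.GradScaler; the growth interval and names are illustrative assumptions:

class LossScaler:
    """Grow the scale after a run of finite steps; halve it on overflow."""
    def __init__(self, scale=32.0, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def update(self, found_inf):
        if found_inf:
            self.scale *= self.backoff_factor   # 64.0 -> 32.0, as above
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # 32.0 -> 64.0
                self._good_steps = 0
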
], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:35:14,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=398960.0, ans=0.2 2023-11-18 20:35:15,034 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.870e+01 9.922e+01 1.106e+02 1.477e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-18 20:35:37,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399093.3333333333, ans=0.1 2023-11-18 20:35:38,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=399093.3333333333, ans=0.05 2023-11-18 20:36:00,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=399226.6666666667, ans=0.0 2023-11-18 20:36:00,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=399226.6666666667, ans=0.125 2023-11-18 20:36:03,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2023-11-18 20:36:08,085 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11800, loss[loss=0.1011, simple_loss=0.1092, pruned_loss=0.0359, audio_tagging_loss=0.01062, over 14729.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1157, pruned_loss=0.03161, audio_tagging_loss=0.01122, over 3038296.20 frames. ], batch size: 58, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:36:08,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=399293.3333333333, ans=0.09899494936611666 2023-11-18 20:36:17,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.79 vs. limit=15.0 2023-11-18 20:36:35,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=399426.6666666667, ans=0.125 2023-11-18 20:36:36,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=399426.6666666667, ans=0.0 2023-11-18 20:36:52,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=399560.0, ans=0.125 2023-11-18 20:37:04,186 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11850, loss[loss=0.1114, simple_loss=0.1244, pruned_loss=0.0376, audio_tagging_loss=0.01162, over 15038.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1162, pruned_loss=0.03179, audio_tagging_loss=0.01128, over 3040007.26 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:37:06,259 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.830e+01 9.778e+01 1.086e+02 1.428e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 20:37:37,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=399826.6666666667, ans=0.125 2023-11-18 20:37:58,854 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11900, loss[loss=0.1146, simple_loss=0.1343, pruned_loss=0.03518, audio_tagging_loss=0.01229, over 16772.00 frames. 
], tot_loss[loss=0.1014, simple_loss=0.1168, pruned_loss=0.03155, audio_tagging_loss=0.01144, over 3039858.21 frames. ], batch size: 63, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:37:59,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-18 20:38:43,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=400160.0, ans=0.125 2023-11-18 20:38:43,870 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.035e-03 2023-11-18 20:38:46,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=400226.6666666667, ans=0.125 2023-11-18 20:38:56,559 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 11950, loss[loss=0.1182, simple_loss=0.132, pruned_loss=0.04075, audio_tagging_loss=0.01147, over 16938.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1159, pruned_loss=0.0313, audio_tagging_loss=0.01159, over 3043810.39 frames. ], batch size: 62, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:58,616 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.829e+01 9.865e+01 1.129e+02 1.573e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 20:39:23,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=400426.6666666667, ans=0.125 2023-11-18 20:39:29,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=400493.3333333333, ans=0.025 2023-11-18 20:39:33,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-18 20:39:40,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=400560.0, ans=0.125 2023-11-18 20:39:50,211 INFO [train_asr.py:1115] (2/4) Epoch 5, batch 12000, loss[loss=0.1164, simple_loss=0.1376, pruned_loss=0.03774, audio_tagging_loss=0.009857, over 15477.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1169, pruned_loss=0.03155, audio_tagging_loss=0.0115, over 3047341.40 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:39:50,212 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 20:40:23,265 INFO [train_asr.py:1147] (2/4) Epoch 5, validation: loss=0.07195, simple_loss=0.05986, pruned_loss=0.008725, audio_tagging_loss=0.0333, over 4681554.00 frames. 2023-11-18 20:40:23,266 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 20:40:28,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=400626.6666666667, ans=0.125 2023-11-18 20:41:23,805 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 0, loss[loss=0.1503, simple_loss=0.1647, pruned_loss=0.04583, audio_tagging_loss=0.02205, over 15705.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.1647, pruned_loss=0.04583, audio_tagging_loss=0.02205, over 15705.00 frames. 
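
The "Maximum memory allocated so far is 25771MB" line printed at each validation pass is the CUDA allocator's high-water mark, which persists until explicitly reset. A sketch of the query; the MB rounding is an assumption:

import torch

def max_memory_mb(device=None):
    # high-water mark of allocated CUDA memory, in MB
    return int(torch.cuda.max_memory_allocated(device) // (1024 * 1024))

# torch.cuda.reset_peak_memory_stats(device) would restart the measurement;
# the constant 25771MB across validations suggests this run never resets it.
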
], batch size: 58, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:41:23,806 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 20:41:55,539 INFO [train_asr.py:1147] (2/4) Epoch 6, validation: loss=0.07069, simple_loss=0.05989, pruned_loss=0.008764, audio_tagging_loss=0.03198, over 4681554.00 frames. 2023-11-18 20:41:55,540 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 20:42:00,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=400780.0, ans=0.0 2023-11-18 20:42:17,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=400913.3333333333, ans=0.0 2023-11-18 20:42:26,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=400913.3333333333, ans=0.1 2023-11-18 20:42:27,004 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 9.356e+01 1.020e+02 1.152e+02 1.600e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 20:42:27,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=400980.0, ans=0.2 2023-11-18 20:42:29,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=400980.0, ans=0.0 2023-11-18 20:42:30,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=400980.0, ans=0.125 2023-11-18 20:42:47,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=401046.6666666667, ans=0.1 2023-11-18 20:42:50,313 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 50, loss[loss=0.09785, simple_loss=0.1146, pruned_loss=0.02396, audio_tagging_loss=0.0166, over 15852.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1139, pruned_loss=0.03069, audio_tagging_loss=0.02163, over 694438.53 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:43:00,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=401113.3333333333, ans=0.125 2023-11-18 20:43:05,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=401180.0, ans=0.125 2023-11-18 20:43:07,756 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.100e-03 2023-11-18 20:43:13,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401246.6666666667, ans=0.1 2023-11-18 20:43:35,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=401380.0, ans=0.125 2023-11-18 20:43:47,270 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 100, loss[loss=0.09819, simple_loss=0.1138, pruned_loss=0.0246, audio_tagging_loss=0.01671, over 16066.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1147, pruned_loss=0.03049, audio_tagging_loss=0.0207, over 1209505.55 frames. 
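
Note the learning rate: it creeps from 1.30e-02 to 1.29e-02 within epoch 5, then steps to 1.20e-02 at the start of epoch 6, so the schedule depends on an epoch counter as well as a step counter. That shape matches an Eden-style rule with two inverse-fourth-root factors. A shape-only sketch: lr_batches and lr_epochs below are this run's configured values, but which counters the real scheduler feeds in is not verified here, so this is not claimed to reproduce the logged numbers exactly:

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # inverse-fourth-root decay in both the step and the epoch counter
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor

# The epoch factor alone drops ~6% from epoch 5 to epoch 6, the same order
# as the logged 1.29e-02 -> 1.20e-02 step at the epoch boundary.
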
], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:44:14,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=401580.0, ans=0.2 2023-11-18 20:44:16,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=401580.0, ans=0.0 2023-11-18 20:44:19,276 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.166e+01 9.950e+01 1.092e+02 1.419e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-18 20:44:23,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.05 vs. limit=15.0 2023-11-18 20:44:26,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.18 vs. limit=12.0 2023-11-18 20:44:28,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=401646.6666666667, ans=0.125 2023-11-18 20:44:35,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=401713.3333333333, ans=0.0 2023-11-18 20:44:43,062 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 150, loss[loss=0.07507, simple_loss=0.07726, pruned_loss=0.01853, audio_tagging_loss=0.01791, over 15828.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1143, pruned_loss=0.03028, audio_tagging_loss=0.01861, over 1614269.75 frames. ], batch size: 60, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:44:59,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=401846.6666666667, ans=0.125 2023-11-18 20:45:11,629 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:45:24,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=401980.0, ans=0.0 2023-11-18 20:45:28,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-18 20:45:30,302 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.268e-02 2023-11-18 20:45:35,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=402046.6666666667, ans=0.05 2023-11-18 20:45:39,164 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 200, loss[loss=0.08212, simple_loss=0.09147, pruned_loss=0.02086, audio_tagging_loss=0.01552, over 14869.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1151, pruned_loss=0.03066, audio_tagging_loss=0.01653, over 1936031.57 frames. ], batch size: 54, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:45:43,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.88 vs. 
limit=22.5 2023-11-18 20:45:53,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=402180.0, ans=0.1 2023-11-18 20:46:01,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=402246.6666666667, ans=0.125 2023-11-18 20:46:11,549 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.007e+01 1.004e+02 1.088e+02 1.464e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 20:46:16,936 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.72 vs. limit=22.5 2023-11-18 20:46:17,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=402313.3333333333, ans=0.125 2023-11-18 20:46:19,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=402313.3333333333, ans=0.5 2023-11-18 20:46:26,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=402380.0, ans=0.0 2023-11-18 20:46:28,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=402380.0, ans=0.2 2023-11-18 20:46:31,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=402380.0, ans=0.0 2023-11-18 20:46:35,564 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 250, loss[loss=0.08825, simple_loss=0.1014, pruned_loss=0.02475, audio_tagging_loss=0.01278, over 15451.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1149, pruned_loss=0.03058, audio_tagging_loss=0.01484, over 2190952.69 frames. ], batch size: 59, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:46:35,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=402446.6666666667, ans=0.2 2023-11-18 20:47:01,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:05,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:19,840 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:47:22,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=402713.3333333333, ans=0.125 2023-11-18 20:47:31,866 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 300, loss[loss=0.09885, simple_loss=0.1199, pruned_loss=0.02976, audio_tagging_loss=0.009165, over 15518.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1154, pruned_loss=0.03066, audio_tagging_loss=0.01384, over 2386518.70 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:47:32,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=402780.0, ans=0.0 2023-11-18 20:47:50,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.86 vs. 
limit=15.0 2023-11-18 20:48:03,988 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.372e+01 1.051e+02 1.173e+02 1.706e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 20:48:23,327 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-18 20:48:26,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=403113.3333333333, ans=0.125 2023-11-18 20:48:27,625 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 350, loss[loss=0.09968, simple_loss=0.116, pruned_loss=0.03305, audio_tagging_loss=0.008622, over 15182.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1167, pruned_loss=0.03087, audio_tagging_loss=0.01288, over 2535746.52 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:48:34,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403113.3333333333, ans=0.1 2023-11-18 20:48:49,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=403246.6666666667, ans=0.125 2023-11-18 20:48:58,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=403246.6666666667, ans=0.125 2023-11-18 20:49:11,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=403380.0, ans=0.1 2023-11-18 20:49:23,918 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 400, loss[loss=0.08918, simple_loss=0.1047, pruned_loss=0.0257, audio_tagging_loss=0.01113, over 15435.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.116, pruned_loss=0.03064, audio_tagging_loss=0.01256, over 2654857.54 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:49:26,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=403446.6666666667, ans=0.125 2023-11-18 20:49:38,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403513.3333333333, ans=0.125 2023-11-18 20:49:39,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.01 vs. limit=15.0 2023-11-18 20:49:42,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=403513.3333333333, ans=0.125 2023-11-18 20:49:55,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 9.366e+01 1.079e+02 1.287e+02 1.849e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 20:50:09,814 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.68 vs. limit=15.0 2023-11-18 20:50:19,401 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 450, loss[loss=0.102, simple_loss=0.1195, pruned_loss=0.032, audio_tagging_loss=0.01022, over 15241.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1154, pruned_loss=0.03053, audio_tagging_loss=0.01222, over 2741666.74 frames. 
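
The balancer fields scattered through these records (prob, min_positive, max_positive, min_abs, max_abs; e.g. min_positive=0.05 and min_abs=0.5 above) describe per-channel constraints on activation statistics: the fraction of positive values is kept within [min_positive, max_positive] and the mean magnitude within [min_abs, max_abs], enforced stochastically with probability prob. A sketch that only measures such violations; icefall's Balancer additionally corrects them through the backward pass, which is omitted here:

import torch

def balancer_violations(x, min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0):
    # x: (num_frames, num_channels); count constraint-violating channels
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    return {
        "too_negative": int((frac_positive < min_positive).sum()),
        "too_positive": int((frac_positive > max_positive).sum()),
        "too_small": int((mean_abs < min_abs).sum()),
        "too_large": int((mean_abs > max_abs).sum()),
    }

print(balancer_violations(torch.randn(1000, 256)))
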
], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:50:40,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=403913.3333333333, ans=0.2 2023-11-18 20:50:46,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=403913.3333333333, ans=0.95 2023-11-18 20:50:52,096 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:50:57,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=403980.0, ans=0.125 2023-11-18 20:51:06,126 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.02 vs. limit=15.0 2023-11-18 20:51:09,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=404046.6666666667, ans=0.2 2023-11-18 20:51:15,204 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 500, loss[loss=0.09019, simple_loss=0.1067, pruned_loss=0.02714, audio_tagging_loss=0.00972, over 15685.00 frames. ], tot_loss[loss=0.09984, simple_loss=0.1148, pruned_loss=0.0304, audio_tagging_loss=0.01205, over 2819204.39 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:51:24,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=404113.3333333333, ans=0.05 2023-11-18 20:51:47,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.724e+01 9.545e+01 1.075e+02 1.901e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 20:51:51,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=404313.3333333333, ans=0.125 2023-11-18 20:51:52,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=404313.3333333333, ans=0.035 2023-11-18 20:52:11,357 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 550, loss[loss=0.09777, simple_loss=0.1055, pruned_loss=0.03158, audio_tagging_loss=0.01346, over 14168.00 frames. ], tot_loss[loss=0.09904, simple_loss=0.1138, pruned_loss=0.03012, audio_tagging_loss=0.01199, over 2861068.15 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:52:18,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=404446.6666666667, ans=0.0 2023-11-18 20:52:18,870 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0 2023-11-18 20:52:27,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.94 vs. limit=10.0 2023-11-18 20:52:33,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. 
limit=15.0 2023-11-18 20:52:41,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404580.0, ans=0.1 2023-11-18 20:52:47,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=404646.6666666667, ans=0.125 2023-11-18 20:53:06,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=404780.0, ans=0.125 2023-11-18 20:53:07,244 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 600, loss[loss=0.1016, simple_loss=0.1229, pruned_loss=0.02852, audio_tagging_loss=0.01158, over 16534.00 frames. ], tot_loss[loss=0.09922, simple_loss=0.1145, pruned_loss=0.03025, audio_tagging_loss=0.01172, over 2901927.86 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:53:09,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=404780.0, ans=0.125 2023-11-18 20:53:18,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=404846.6666666667, ans=0.125 2023-11-18 20:53:24,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.87 vs. limit=22.5 2023-11-18 20:53:32,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=404913.3333333333, ans=0.0 2023-11-18 20:53:35,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=404913.3333333333, ans=0.0 2023-11-18 20:53:36,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=404913.3333333333, ans=0.0 2023-11-18 20:53:40,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.597e+01 9.522e+01 1.046e+02 1.696e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 20:54:03,255 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 650, loss[loss=0.1048, simple_loss=0.1141, pruned_loss=0.03555, audio_tagging_loss=0.01215, over 15929.00 frames. ], tot_loss[loss=0.09872, simple_loss=0.114, pruned_loss=0.03007, audio_tagging_loss=0.01165, over 2933586.31 frames. ], batch size: 62, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:54:28,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.93 vs. limit=15.0 2023-11-18 20:54:45,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=405313.3333333333, ans=0.125 2023-11-18 20:54:59,362 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 700, loss[loss=0.07379, simple_loss=0.08383, pruned_loss=0.01916, audio_tagging_loss=0.01272, over 14503.00 frames. ], tot_loss[loss=0.1, simple_loss=0.116, pruned_loss=0.0305, audio_tagging_loss=0.01152, over 2960684.39 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:55:00,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.73 vs. limit=15.0 2023-11-18 20:55:05,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. 
limit=22.5 2023-11-18 20:55:10,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=405513.3333333333, ans=0.1 2023-11-18 20:55:21,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5 2023-11-18 20:55:25,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=405580.0, ans=0.0 2023-11-18 20:55:31,762 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 9.330e+01 1.028e+02 1.121e+02 2.477e+02, threshold=2.056e+02, percent-clipped=1.0 2023-11-18 20:55:39,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=405646.6666666667, ans=0.0 2023-11-18 20:55:39,430 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:55:40,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-18 20:55:55,653 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 750, loss[loss=0.1231, simple_loss=0.1479, pruned_loss=0.04055, audio_tagging_loss=0.008559, over 14755.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1168, pruned_loss=0.03085, audio_tagging_loss=0.01147, over 2982388.92 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:56:14,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=405846.6666666667, ans=0.125 2023-11-18 20:56:21,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.64 vs. limit=22.5 2023-11-18 20:56:46,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.46 vs. limit=6.0 2023-11-18 20:56:47,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=406046.6666666667, ans=0.0 2023-11-18 20:56:51,385 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 800, loss[loss=0.0585, simple_loss=0.07006, pruned_loss=0.01336, audio_tagging_loss=0.01011, over 14113.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1165, pruned_loss=0.03067, audio_tagging_loss=0.01161, over 3007344.24 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:57:02,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406180.0, ans=0.1 2023-11-18 20:57:05,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.19 vs. limit=15.0 2023-11-18 20:57:11,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=406180.0, ans=0.125 2023-11-18 20:57:17,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.89 vs. limit=22.5 2023-11-18 20:57:19,185 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. 
limit=15.0 2023-11-18 20:57:24,275 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.553e+01 1.008e+02 1.085e+02 1.896e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 20:57:46,235 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.42 vs. limit=15.0 2023-11-18 20:57:46,586 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 850, loss[loss=0.1052, simple_loss=0.1159, pruned_loss=0.0326, audio_tagging_loss=0.01464, over 15082.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1167, pruned_loss=0.03092, audio_tagging_loss=0.01168, over 3015077.35 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:02,336 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.20 vs. limit=15.0 2023-11-18 20:58:22,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=406646.6666666667, ans=0.05 2023-11-18 20:58:28,509 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.73 vs. limit=15.0 2023-11-18 20:58:43,502 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 900, loss[loss=0.0663, simple_loss=0.06577, pruned_loss=0.01894, audio_tagging_loss=0.01448, over 15365.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1165, pruned_loss=0.03087, audio_tagging_loss=0.01167, over 3024102.51 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:59:00,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=406846.6666666667, ans=0.2 2023-11-18 20:59:04,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=406913.3333333333, ans=0.1 2023-11-18 20:59:15,307 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.915e+01 9.624e+01 1.067e+02 1.384e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 20:59:15,912 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.51 vs. limit=15.0 2023-11-18 20:59:21,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=406980.0, ans=0.2 2023-11-18 20:59:24,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=406980.0, ans=0.0 2023-11-18 20:59:31,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.60 vs. limit=15.0 2023-11-18 20:59:36,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407046.6666666667, ans=0.0 2023-11-18 20:59:39,121 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 950, loss[loss=0.1199, simple_loss=0.1436, pruned_loss=0.04247, audio_tagging_loss=0.005659, over 15359.00 frames. ], tot_loss[loss=0.09994, simple_loss=0.1156, pruned_loss=0.03062, audio_tagging_loss=0.0115, over 3030293.39 frames. 
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:59:44,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.25 vs. limit=15.0 2023-11-18 20:59:45,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=407113.3333333333, ans=0.125 2023-11-18 20:59:53,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.82 vs. limit=15.0 2023-11-18 20:59:56,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=407180.0, ans=0.0 2023-11-18 21:00:07,119 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-18 21:00:11,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=407313.3333333333, ans=0.125 2023-11-18 21:00:20,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=407313.3333333333, ans=0.125 2023-11-18 21:00:22,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=407380.0, ans=0.0 2023-11-18 21:00:30,626 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.76 vs. limit=15.0 2023-11-18 21:00:34,303 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1000, loss[loss=0.09094, simple_loss=0.1065, pruned_loss=0.02905, audio_tagging_loss=0.008622, over 16045.00 frames. ], tot_loss[loss=0.09974, simple_loss=0.1161, pruned_loss=0.03051, audio_tagging_loss=0.01119, over 3036337.44 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:00:39,879 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2023-11-18 21:00:50,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407513.3333333333, ans=0.125 2023-11-18 21:00:56,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=407580.0, ans=0.125 2023-11-18 21:00:58,789 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:01:04,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407580.0, ans=0.1 2023-11-18 21:01:07,163 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.744e+01 1.004e+02 1.144e+02 1.885e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-18 21:01:08,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.52 vs. 
limit=22.5 2023-11-18 21:01:12,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=407646.6666666667, ans=0.1 2023-11-18 21:01:18,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=407713.3333333333, ans=0.2 2023-11-18 21:01:21,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=407713.3333333333, ans=0.04949747468305833 2023-11-18 21:01:26,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=407713.3333333333, ans=0.95 2023-11-18 21:01:30,883 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1050, loss[loss=0.1316, simple_loss=0.1618, pruned_loss=0.04278, audio_tagging_loss=0.007918, over 15401.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1167, pruned_loss=0.03094, audio_tagging_loss=0.01117, over 3031767.04 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:01:51,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=407846.6666666667, ans=0.125 2023-11-18 21:02:23,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=408046.6666666667, ans=0.125 2023-11-18 21:02:27,543 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1100, loss[loss=0.08327, simple_loss=0.08969, pruned_loss=0.02404, audio_tagging_loss=0.01438, over 15765.00 frames. ], tot_loss[loss=0.0999, simple_loss=0.1158, pruned_loss=0.03089, audio_tagging_loss=0.01113, over 3033398.64 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:02:29,727 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:02:30,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=408113.3333333333, ans=0.0 2023-11-18 21:02:35,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.40 vs. limit=15.0 2023-11-18 21:02:39,677 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.89 vs. 
limit=15.0 2023-11-18 21:02:47,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=408180.0, ans=0.07 2023-11-18 21:02:50,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=408246.6666666667, ans=0.2 2023-11-18 21:02:57,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=408246.6666666667, ans=0.125 2023-11-18 21:02:57,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=408246.6666666667, ans=0.125 2023-11-18 21:03:00,441 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.573e+01 9.716e+01 1.058e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-18 21:03:13,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=408380.0, ans=0.125 2023-11-18 21:03:13,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=408380.0, ans=0.125 2023-11-18 21:03:22,694 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1150, loss[loss=0.08914, simple_loss=0.09875, pruned_loss=0.02518, audio_tagging_loss=0.01459, over 15323.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.1155, pruned_loss=0.03094, audio_tagging_loss=0.01116, over 3035012.43 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:03:33,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=408513.3333333333, ans=0.125 2023-11-18 21:03:56,332 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2023-11-18 21:04:05,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=408646.6666666667, ans=0.1 2023-11-18 21:04:19,253 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1200, loss[loss=0.08095, simple_loss=0.08366, pruned_loss=0.02506, audio_tagging_loss=0.01406, over 13632.00 frames. ], tot_loss[loss=0.0995, simple_loss=0.115, pruned_loss=0.03085, audio_tagging_loss=0.01114, over 3035311.05 frames. ], batch size: 53, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:04:23,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.10 vs. limit=15.0 2023-11-18 21:04:33,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=408846.6666666667, ans=0.0 2023-11-18 21:04:35,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=408846.6666666667, ans=0.0 2023-11-18 21:04:44,005 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:04:52,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 9.018e+01 9.709e+01 1.057e+02 1.336e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-18 21:05:15,226 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1250, loss[loss=0.0754, simple_loss=0.08254, pruned_loss=0.02412, audio_tagging_loss=0.01001, over 14775.00 frames. 
], tot_loss[loss=0.0991, simple_loss=0.1148, pruned_loss=0.03066, audio_tagging_loss=0.01106, over 3034158.78 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:05:24,592 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=409113.3333333333, ans=0.125 2023-11-18 21:06:11,359 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1300, loss[loss=0.1288, simple_loss=0.1568, pruned_loss=0.04195, audio_tagging_loss=0.008456, over 15805.00 frames. ], tot_loss[loss=0.09833, simple_loss=0.1139, pruned_loss=0.03027, audio_tagging_loss=0.01111, over 3037200.61 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:06:17,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=409446.6666666667, ans=0.125 2023-11-18 21:06:19,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.41 vs. limit=15.0 2023-11-18 21:06:29,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=409513.3333333333, ans=0.0 2023-11-18 21:06:38,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=409580.0, ans=0.2 2023-11-18 21:06:45,371 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.886e+01 9.349e+01 1.016e+02 1.502e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-18 21:06:58,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=409713.3333333333, ans=0.5 2023-11-18 21:07:00,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=409713.3333333333, ans=0.0 2023-11-18 21:07:07,808 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1350, loss[loss=0.06317, simple_loss=0.07685, pruned_loss=0.01361, audio_tagging_loss=0.01115, over 15355.00 frames. ], tot_loss[loss=0.09866, simple_loss=0.1142, pruned_loss=0.03039, audio_tagging_loss=0.01115, over 3036639.93 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:07:11,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=409780.0, ans=15.0 2023-11-18 21:07:15,889 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.86 vs. limit=15.0 2023-11-18 21:07:16,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=409780.0, ans=0.04949747468305833 2023-11-18 21:07:16,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.whiten.whitening_limit, batch_count=409780.0, ans=12.0 2023-11-18 21:07:31,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. 
limit=6.0 2023-11-18 21:07:32,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=409913.3333333333, ans=0.125 2023-11-18 21:07:34,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=409913.3333333333, ans=0.0 2023-11-18 21:07:36,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=409913.3333333333, ans=0.125 2023-11-18 21:07:47,873 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:07:48,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=409980.0, ans=0.125 2023-11-18 21:07:52,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.42 vs. limit=15.0 2023-11-18 21:07:58,454 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.37 vs. limit=15.0 2023-11-18 21:08:03,669 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1400, loss[loss=0.1015, simple_loss=0.1151, pruned_loss=0.03311, audio_tagging_loss=0.01079, over 14418.00 frames. ], tot_loss[loss=0.09885, simple_loss=0.1145, pruned_loss=0.03037, audio_tagging_loss=0.01125, over 3040336.12 frames. ], batch size: 55, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:08:19,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=410180.0, ans=0.125 2023-11-18 21:08:23,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.14 vs. limit=6.0 2023-11-18 21:08:37,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.960e+01 8.879e+01 9.810e+01 1.048e+02 1.417e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 21:08:45,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=410313.3333333333, ans=0.2 2023-11-18 21:08:57,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=410380.0, ans=0.025 2023-11-18 21:08:57,986 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.75 vs. limit=10.0 2023-11-18 21:08:59,563 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1450, loss[loss=0.07077, simple_loss=0.08198, pruned_loss=0.01889, audio_tagging_loss=0.01088, over 15446.00 frames. ], tot_loss[loss=0.09933, simple_loss=0.1152, pruned_loss=0.03045, audio_tagging_loss=0.01125, over 3045963.30 frames. 
], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:09:27,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=410580.0, ans=0.0 2023-11-18 21:09:31,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=410580.0, ans=0.09899494936611666 2023-11-18 21:09:40,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.49 vs. limit=15.0 2023-11-18 21:09:46,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=410713.3333333333, ans=0.07 2023-11-18 21:09:48,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=410713.3333333333, ans=0.125 2023-11-18 21:09:50,815 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0 2023-11-18 21:09:56,061 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1500, loss[loss=0.1027, simple_loss=0.1219, pruned_loss=0.03007, audio_tagging_loss=0.01165, over 15337.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1168, pruned_loss=0.03098, audio_tagging_loss=0.01127, over 3045010.02 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:18,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-18 21:10:25,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=410913.3333333333, ans=0.125 2023-11-18 21:10:28,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=410980.0, ans=0.95 2023-11-18 21:10:29,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.852e+01 9.763e+01 1.053e+02 1.656e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 21:10:51,952 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1550, loss[loss=0.07902, simple_loss=0.09175, pruned_loss=0.0192, audio_tagging_loss=0.01394, over 14773.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1171, pruned_loss=0.03129, audio_tagging_loss=0.0114, over 3035044.54 frames. 
], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:56,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=411113.3333333333, ans=0.0 2023-11-18 21:11:27,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=411313.3333333333, ans=0.0 2023-11-18 21:11:37,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=411380.0, ans=0.125 2023-11-18 21:11:38,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=411380.0, ans=0.125 2023-11-18 21:11:38,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=411380.0, ans=0.125 2023-11-18 21:11:47,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=411446.6666666667, ans=0.0 2023-11-18 21:11:47,825 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1600, loss[loss=0.1089, simple_loss=0.1329, pruned_loss=0.03381, audio_tagging_loss=0.008628, over 15108.00 frames. ], tot_loss[loss=0.09995, simple_loss=0.1155, pruned_loss=0.03065, audio_tagging_loss=0.01154, over 3036304.08 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:12:17,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=12.0 2023-11-18 21:12:18,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=411580.0, ans=0.1 2023-11-18 21:12:19,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=411580.0, ans=0.125 2023-11-18 21:12:21,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.910e+01 9.772e+01 1.109e+02 1.512e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 21:12:44,078 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1650, loss[loss=0.09191, simple_loss=0.1011, pruned_loss=0.02798, audio_tagging_loss=0.01339, over 14755.00 frames. ], tot_loss[loss=0.09971, simple_loss=0.1153, pruned_loss=0.03047, audio_tagging_loss=0.01158, over 3034858.09 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:12:54,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=411846.6666666667, ans=0.0 2023-11-18 21:13:11,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=411913.3333333333, ans=0.125 2023-11-18 21:13:34,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.82 vs. limit=22.5 2023-11-18 21:13:39,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=412113.3333333333, ans=0.1 2023-11-18 21:13:39,902 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1700, loss[loss=0.09156, simple_loss=0.1099, pruned_loss=0.02672, audio_tagging_loss=0.009881, over 15762.00 frames. ], tot_loss[loss=0.09906, simple_loss=0.1146, pruned_loss=0.0301, audio_tagging_loss=0.01165, over 3042142.78 frames. 
], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:14:00,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=412180.0, ans=0.1 2023-11-18 21:14:07,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=412246.6666666667, ans=0.0 2023-11-18 21:14:13,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.349e+01 1.070e+02 1.315e+02 2.031e+02, threshold=2.140e+02, percent-clipped=2.0 2023-11-18 21:14:14,494 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-18 21:14:22,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=412313.3333333333, ans=0.0 2023-11-18 21:14:35,718 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1750, loss[loss=0.08909, simple_loss=0.1088, pruned_loss=0.02162, audio_tagging_loss=0.01307, over 15020.00 frames. ], tot_loss[loss=0.0988, simple_loss=0.1144, pruned_loss=0.03006, audio_tagging_loss=0.01155, over 3048007.73 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:14:44,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.84 vs. limit=22.5 2023-11-18 21:15:31,631 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1800, loss[loss=0.11, simple_loss=0.119, pruned_loss=0.04082, audio_tagging_loss=0.009624, over 14771.00 frames. ], tot_loss[loss=0.09895, simple_loss=0.1147, pruned_loss=0.03023, audio_tagging_loss=0.01134, over 3050126.31 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:15:58,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=412913.3333333333, ans=0.07 2023-11-18 21:16:06,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 9.052e+01 1.017e+02 1.096e+02 2.007e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 21:16:12,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=412980.0, ans=0.125 2023-11-18 21:16:22,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=413046.6666666667, ans=0.0 2023-11-18 21:16:27,504 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1850, loss[loss=0.1036, simple_loss=0.1162, pruned_loss=0.03255, audio_tagging_loss=0.01299, over 14477.00 frames. ], tot_loss[loss=0.09834, simple_loss=0.1142, pruned_loss=0.02992, audio_tagging_loss=0.01133, over 3042601.84 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:16:47,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413180.0, ans=0.1 2023-11-18 21:16:55,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.93 vs. 
limit=15.0 2023-11-18 21:16:58,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=413246.6666666667, ans=0.125 2023-11-18 21:17:01,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=413313.3333333333, ans=0.0 2023-11-18 21:17:04,120 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.10 vs. limit=10.0 2023-11-18 21:17:08,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=413313.3333333333, ans=0.125 2023-11-18 21:17:23,592 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1900, loss[loss=0.1083, simple_loss=0.1296, pruned_loss=0.03358, audio_tagging_loss=0.00997, over 15418.00 frames. ], tot_loss[loss=0.09705, simple_loss=0.1128, pruned_loss=0.0294, audio_tagging_loss=0.01123, over 3040960.32 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:17:58,662 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 9.160e+01 9.941e+01 1.091e+02 1.656e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:18:07,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=413713.3333333333, ans=0.95 2023-11-18 21:18:18,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413780.0, ans=0.0 2023-11-18 21:18:18,829 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.26 vs. limit=15.0 2023-11-18 21:18:19,655 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 1950, loss[loss=0.1042, simple_loss=0.1201, pruned_loss=0.03536, audio_tagging_loss=0.00885, over 14985.00 frames. ], tot_loss[loss=0.09664, simple_loss=0.1123, pruned_loss=0.02934, audio_tagging_loss=0.01115, over 3033804.54 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:18:25,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=413780.0, ans=0.125 2023-11-18 21:18:28,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=413780.0, ans=0.0 2023-11-18 21:18:31,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413846.6666666667, ans=0.1 2023-11-18 21:18:43,128 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5 2023-11-18 21:18:45,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413913.3333333333, ans=0.1 2023-11-18 21:19:00,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=413980.0, ans=0.125 2023-11-18 21:19:10,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2023-11-18 21:19:15,976 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2000, loss[loss=0.1051, simple_loss=0.1327, pruned_loss=0.03074, audio_tagging_loss=0.008061, over 14958.00 frames. 
], tot_loss[loss=0.09689, simple_loss=0.1127, pruned_loss=0.02943, audio_tagging_loss=0.01114, over 3039444.38 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:19:23,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:27,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=414180.0, ans=0.125 2023-11-18 21:19:31,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=414180.0, ans=0.125 2023-11-18 21:19:40,004 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:19:50,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 8.645e+01 9.576e+01 1.020e+02 1.190e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:19:52,411 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=22.5 2023-11-18 21:19:54,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=414313.3333333333, ans=0.125 2023-11-18 21:20:00,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=414380.0, ans=0.0 2023-11-18 21:20:03,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.91 vs. limit=22.5 2023-11-18 21:20:08,993 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.14 vs. limit=15.0 2023-11-18 21:20:11,735 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2050, loss[loss=0.08842, simple_loss=0.09931, pruned_loss=0.02672, audio_tagging_loss=0.01205, over 17062.00 frames. ], tot_loss[loss=0.09744, simple_loss=0.1134, pruned_loss=0.0298, audio_tagging_loss=0.01093, over 3039369.41 frames. ], batch size: 68, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:20:22,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=414513.3333333333, ans=0.07 2023-11-18 21:20:40,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=414580.0, ans=0.07 2023-11-18 21:20:52,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=414646.6666666667, ans=0.125 2023-11-18 21:21:01,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=414713.3333333333, ans=0.07 2023-11-18 21:21:03,429 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.79 vs. limit=15.0 2023-11-18 21:21:04,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-11-18 21:21:07,344 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2100, loss[loss=0.1086, simple_loss=0.1344, pruned_loss=0.03215, audio_tagging_loss=0.009253, over 15590.00 frames. 
], tot_loss[loss=0.09791, simple_loss=0.1138, pruned_loss=0.0301, audio_tagging_loss=0.01092, over 3035841.42 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:21:20,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:20,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=414846.6666666667, ans=0.09899494936611666 2023-11-18 21:21:23,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:28,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:35,302 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=414913.3333333333, ans=0.0 2023-11-18 21:21:36,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=414913.3333333333, ans=0.1 2023-11-18 21:21:40,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=414980.0, ans=0.04949747468305833 2023-11-18 21:21:42,504 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.113e+01 9.926e+01 1.128e+02 1.703e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-18 21:21:48,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.76 vs. limit=22.5 2023-11-18 21:21:51,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=415046.6666666667, ans=0.1 2023-11-18 21:21:55,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-18 21:22:03,817 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2150, loss[loss=0.09817, simple_loss=0.1142, pruned_loss=0.02796, audio_tagging_loss=0.01308, over 15420.00 frames. ], tot_loss[loss=0.09775, simple_loss=0.1137, pruned_loss=0.0298, audio_tagging_loss=0.01109, over 3034391.85 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:22:13,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=415113.3333333333, ans=0.0 2023-11-18 21:22:20,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=415180.0, ans=0.125 2023-11-18 21:22:24,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.88 vs. limit=15.0 2023-11-18 21:22:36,306 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 21:22:39,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=415313.3333333333, ans=0.125 2023-11-18 21:22:46,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-18 21:22:47,487 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=415380.0, ans=0.125 2023-11-18 21:22:51,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=415380.0, ans=0.125 2023-11-18 21:22:57,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=415380.0, ans=0.0 2023-11-18 21:22:57,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=415380.0, ans=0.2 2023-11-18 21:22:59,973 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2200, loss[loss=0.1018, simple_loss=0.1203, pruned_loss=0.0322, audio_tagging_loss=0.009474, over 14981.00 frames. ], tot_loss[loss=0.0987, simple_loss=0.1148, pruned_loss=0.03015, audio_tagging_loss=0.01114, over 3035075.51 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:23:03,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=415446.6666666667, ans=0.2 2023-11-18 21:23:04,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=415446.6666666667, ans=0.0 2023-11-18 21:23:13,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=415513.3333333333, ans=0.2 2023-11-18 21:23:24,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.78 vs. limit=22.5 2023-11-18 21:23:35,841 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.875e+01 9.823e+01 1.124e+02 2.816e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-18 21:23:38,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=415646.6666666667, ans=0.2 2023-11-18 21:23:47,582 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-18 21:23:51,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.82 vs. limit=15.0 2023-11-18 21:23:53,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=415713.3333333333, ans=0.125 2023-11-18 21:23:55,606 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2250, loss[loss=0.1081, simple_loss=0.1338, pruned_loss=0.03301, audio_tagging_loss=0.008221, over 14665.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1163, pruned_loss=0.03078, audio_tagging_loss=0.01111, over 3033292.44 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:24:27,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.56 vs. 
limit=10.0 2023-11-18 21:24:51,861 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2300, loss[loss=0.1121, simple_loss=0.13, pruned_loss=0.0374, audio_tagging_loss=0.009695, over 15064.00 frames. ], tot_loss[loss=0.09981, simple_loss=0.1159, pruned_loss=0.03076, audio_tagging_loss=0.01112, over 3038998.99 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:25:12,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416246.6666666667, ans=0.1 2023-11-18 21:25:20,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=416246.6666666667, ans=0.0 2023-11-18 21:25:27,509 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.781e+01 9.557e+01 1.037e+02 1.979e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-18 21:25:32,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=416313.3333333333, ans=0.1 2023-11-18 21:25:39,379 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:25:47,862 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2350, loss[loss=0.1246, simple_loss=0.1497, pruned_loss=0.03945, audio_tagging_loss=0.01034, over 16295.00 frames. ], tot_loss[loss=0.09942, simple_loss=0.115, pruned_loss=0.03057, audio_tagging_loss=0.01133, over 3039012.73 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:43,310 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2400, loss[loss=0.1023, simple_loss=0.121, pruned_loss=0.03196, audio_tagging_loss=0.00985, over 14820.00 frames. ], tot_loss[loss=0.09969, simple_loss=0.1152, pruned_loss=0.0307, audio_tagging_loss=0.01141, over 3038317.29 frames. 
], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:43,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416780.0, ans=0.1 2023-11-18 21:26:49,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=416780.0, ans=0.125 2023-11-18 21:27:01,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=416846.6666666667, ans=0.125 2023-11-18 21:27:20,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.680e+01 9.662e+01 1.129e+02 1.566e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 21:27:22,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=416980.0, ans=0.0 2023-11-18 21:27:35,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=417046.6666666667, ans=0.0 2023-11-18 21:27:36,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=417046.6666666667, ans=0.0 2023-11-18 21:27:39,626 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2450, loss[loss=0.1321, simple_loss=0.1548, pruned_loss=0.04361, audio_tagging_loss=0.01111, over 16067.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.1153, pruned_loss=0.03076, audio_tagging_loss=0.01143, over 3043623.08 frames. ], batch size: 60, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:27:50,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=417180.0, ans=0.125 2023-11-18 21:27:52,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.13 vs. limit=15.0 2023-11-18 21:28:13,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=417313.3333333333, ans=0.0 2023-11-18 21:28:19,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=417313.3333333333, ans=0.0 2023-11-18 21:28:30,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=417380.0, ans=0.1 2023-11-18 21:28:35,707 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2500, loss[loss=0.1051, simple_loss=0.1362, pruned_loss=0.02969, audio_tagging_loss=0.00728, over 15624.00 frames. ], tot_loss[loss=0.09989, simple_loss=0.1153, pruned_loss=0.03073, audio_tagging_loss=0.0115, over 3042869.66 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:28:41,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=417446.6666666667, ans=0.0 2023-11-18 21:28:55,101 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.01 vs. limit=15.0 2023-11-18 21:29:10,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.53 vs. 
limit=15.0 2023-11-18 21:29:12,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.909e+01 1.012e+02 1.108e+02 1.409e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 21:29:16,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=417646.6666666667, ans=0.1 2023-11-18 21:29:28,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=417713.3333333333, ans=0.2 2023-11-18 21:29:31,818 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2550, loss[loss=0.08952, simple_loss=0.1063, pruned_loss=0.02402, audio_tagging_loss=0.01235, over 15971.00 frames. ], tot_loss[loss=0.09926, simple_loss=0.1149, pruned_loss=0.03047, audio_tagging_loss=0.01137, over 3042523.13 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:29:35,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=417780.0, ans=0.2 2023-11-18 21:29:48,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.81 vs. limit=12.0 2023-11-18 21:29:51,165 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:29:51,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=417846.6666666667, ans=0.0 2023-11-18 21:30:26,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418046.6666666667, ans=0.1 2023-11-18 21:30:28,056 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2600, loss[loss=0.08531, simple_loss=0.1025, pruned_loss=0.02311, audio_tagging_loss=0.01096, over 15243.00 frames. ], tot_loss[loss=0.09871, simple_loss=0.1144, pruned_loss=0.03033, audio_tagging_loss=0.01119, over 3040821.43 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:51,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=418246.6666666667, ans=0.0 2023-11-18 21:31:04,866 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.916e+01 1.001e+02 1.139e+02 1.588e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 21:31:20,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=418380.0, ans=0.0 2023-11-18 21:31:22,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=418380.0, ans=0.125 2023-11-18 21:31:24,009 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2650, loss[loss=0.09812, simple_loss=0.1148, pruned_loss=0.03022, audio_tagging_loss=0.01051, over 15357.00 frames. ], tot_loss[loss=0.09863, simple_loss=0.1142, pruned_loss=0.03034, audio_tagging_loss=0.01117, over 3037685.13 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:31:55,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418580.0, ans=0.1 2023-11-18 21:32:02,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.44 vs. 
limit=15.0 2023-11-18 21:32:11,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=418713.3333333333, ans=0.125 2023-11-18 21:32:19,693 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2700, loss[loss=0.1174, simple_loss=0.1395, pruned_loss=0.03191, audio_tagging_loss=0.01579, over 15230.00 frames. ], tot_loss[loss=0.09803, simple_loss=0.1136, pruned_loss=0.03015, audio_tagging_loss=0.01107, over 3043510.73 frames. ], batch size: 54, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:32:53,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2023-11-18 21:32:56,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 9.076e+01 9.942e+01 1.068e+02 1.459e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:33:16,784 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2750, loss[loss=0.1148, simple_loss=0.1446, pruned_loss=0.03657, audio_tagging_loss=0.005971, over 15650.00 frames. ], tot_loss[loss=0.09848, simple_loss=0.1141, pruned_loss=0.03044, audio_tagging_loss=0.01097, over 3052824.76 frames. ], batch size: 53, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:33:17,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419113.3333333333, ans=0.125 2023-11-18 21:33:52,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419313.3333333333, ans=0.1 2023-11-18 21:33:55,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=419313.3333333333, ans=0.1 2023-11-18 21:34:03,605 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:34:08,528 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=15.0 2023-11-18 21:34:12,564 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2800, loss[loss=0.08659, simple_loss=0.09734, pruned_loss=0.02528, audio_tagging_loss=0.01264, over 15685.00 frames. ], tot_loss[loss=0.0985, simple_loss=0.1143, pruned_loss=0.03039, audio_tagging_loss=0.01097, over 3050278.12 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:34:30,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=419513.3333333333, ans=0.0 2023-11-18 21:34:32,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. 
limit=12.0 2023-11-18 21:34:34,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=419580.0, ans=0.0 2023-11-18 21:34:41,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=419580.0, ans=12.0 2023-11-18 21:34:47,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=419646.6666666667, ans=0.0 2023-11-18 21:34:49,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.954e+01 9.859e+01 1.088e+02 1.629e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 21:34:56,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419713.3333333333, ans=0.125 2023-11-18 21:35:01,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=419713.3333333333, ans=0.125 2023-11-18 21:35:07,684 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2850, loss[loss=0.1041, simple_loss=0.1263, pruned_loss=0.03063, audio_tagging_loss=0.01026, over 15456.00 frames. ], tot_loss[loss=0.09804, simple_loss=0.1136, pruned_loss=0.03015, audio_tagging_loss=0.01108, over 3041665.60 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:35:23,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=419846.6666666667, ans=10.0 2023-11-18 21:35:24,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419846.6666666667, ans=0.125 2023-11-18 21:35:24,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.52 vs. limit=10.0 2023-11-18 21:35:26,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=419846.6666666667, ans=0.125 2023-11-18 21:35:36,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419913.3333333333, ans=0.125 2023-11-18 21:35:37,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419913.3333333333, ans=0.1 2023-11-18 21:35:55,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=420046.6666666667, ans=0.125 2023-11-18 21:36:05,320 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2900, loss[loss=0.1098, simple_loss=0.123, pruned_loss=0.03737, audio_tagging_loss=0.01098, over 15842.00 frames. ], tot_loss[loss=0.09812, simple_loss=0.1138, pruned_loss=0.03018, audio_tagging_loss=0.01101, over 3048926.12 frames. 
2023-11-18 21:36:11,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=420113.3333333333, ans=0.125
2023-11-18 21:36:40,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.758e+01 9.574e+01 1.055e+02 1.297e+02, threshold=1.915e+02, percent-clipped=0.0
2023-11-18 21:36:44,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=420313.3333333333, ans=0.1
2023-11-18 21:36:53,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=420380.0, ans=0.0
2023-11-18 21:37:00,110 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 2950, loss[loss=0.07234, simple_loss=0.08395, pruned_loss=0.02042, audio_tagging_loss=0.009948, over 14765.00 frames. ], tot_loss[loss=0.09764, simple_loss=0.1133, pruned_loss=0.02991, audio_tagging_loss=0.01106, over 3045763.92 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:37:02,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.53 vs. limit=22.5
2023-11-18 21:37:10,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=420513.3333333333, ans=0.125
2023-11-18 21:37:23,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=420580.0, ans=0.125
2023-11-18 21:37:41,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=420646.6666666667, ans=0.2
2023-11-18 21:37:42,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=420646.6666666667, ans=0.015
2023-11-18 21:37:46,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420713.3333333333, ans=0.0
2023-11-18 21:37:49,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0
2023-11-18 21:37:55,424 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3000, loss[loss=0.1138, simple_loss=0.1276, pruned_loss=0.03603, audio_tagging_loss=0.01399, over 15106.00 frames. ], tot_loss[loss=0.0979, simple_loss=0.1134, pruned_loss=0.02994, audio_tagging_loss=0.01124, over 3052417.45 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:37:55,425 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-18 21:38:15,091 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2936, 4.9686, 4.8467, 5.1328], device='cuda:2')
2023-11-18 21:38:28,444 INFO [train_asr.py:1147] (2/4) Epoch 6, validation: loss=0.07003, simple_loss=0.05914, pruned_loss=0.008279, audio_tagging_loss=0.03218, over 4681554.00 frames.
2023-11-18 21:38:28,445 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 21:38:34,307 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0
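The loss[...] and tot_loss[...] fields decompose the training objective, and the logged totals are reproduced by a 0.5 weight on simple_loss and unit weights on pruned_loss and audio_tagging_loss (the weights are inferred from the numbers, not printed in these records). For the batch 2900 entry above: 0.5 x 0.123 + 0.03737 + 0.01098 = 0.1098, and for the epoch 6 validation record: 0.5 x 0.05914 + 0.008279 + 0.03218 = 0.07003, matching the logged loss=0.1098 and loss=0.07003 respectively.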
2023-11-18 21:38:37,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.84 vs. limit=15.0
2023-11-18 21:38:39,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=420846.6666666667, ans=0.125
2023-11-18 21:39:03,792 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.190e+01 1.009e+02 1.131e+02 1.432e+02, threshold=2.017e+02, percent-clipped=0.0
2023-11-18 21:39:06,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0
2023-11-18 21:39:23,534 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3050, loss[loss=0.1041, simple_loss=0.1287, pruned_loss=0.03068, audio_tagging_loss=0.00907, over 14101.00 frames. ], tot_loss[loss=0.09783, simple_loss=0.1134, pruned_loss=0.02984, audio_tagging_loss=0.01126, over 3051677.90 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:39:25,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=421113.3333333333, ans=0.125
2023-11-18 21:39:29,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=421113.3333333333, ans=0.125
2023-11-18 21:39:29,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=421113.3333333333, ans=0.125
2023-11-18 21:39:35,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=421180.0, ans=0.125
2023-11-18 21:39:51,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=421246.6666666667, ans=0.125
2023-11-18 21:39:55,072 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 21:39:55,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=421246.6666666667, ans=0.5
2023-11-18 21:40:07,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=421380.0, ans=0.125
2023-11-18 21:40:10,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=421380.0, ans=0.0
2023-11-18 21:40:19,079 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3100, loss[loss=0.1067, simple_loss=0.1328, pruned_loss=0.02933, audio_tagging_loss=0.01097, over 14949.00 frames. ], tot_loss[loss=0.0987, simple_loss=0.1145, pruned_loss=0.03014, audio_tagging_loss=0.01133, over 3056988.92 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:40:20,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=421446.6666666667, ans=0.125
2023-11-18 21:40:34,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.90 vs. limit=10.0
2023-11-18 21:40:52,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421646.6666666667, ans=0.1
2023-11-18 21:40:55,634 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.959e+01 9.848e+01 1.091e+02 1.372e+02, threshold=1.970e+02, percent-clipped=0.0
2023-11-18 21:41:01,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=421646.6666666667, ans=0.125
2023-11-18 21:41:07,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=421713.3333333333, ans=0.0
2023-11-18 21:41:13,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=421780.0, ans=0.1
2023-11-18 21:41:14,376 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3150, loss[loss=0.1141, simple_loss=0.1407, pruned_loss=0.0329, audio_tagging_loss=0.01081, over 16213.00 frames. ], tot_loss[loss=0.09922, simple_loss=0.1153, pruned_loss=0.03026, audio_tagging_loss=0.01133, over 3058346.30 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:41:23,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=421780.0, ans=0.0
2023-11-18 21:41:23,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=421780.0, ans=0.125
2023-11-18 21:41:59,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=422046.6666666667, ans=0.07
2023-11-18 21:42:09,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=422113.3333333333, ans=0.125
2023-11-18 21:42:10,394 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3200, loss[loss=0.1, simple_loss=0.1148, pruned_loss=0.03043, audio_tagging_loss=0.01222, over 16431.00 frames. ], tot_loss[loss=0.09877, simple_loss=0.1147, pruned_loss=0.03004, audio_tagging_loss=0.01138, over 3059752.19 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:42:20,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.16 vs. limit=15.0
2023-11-18 21:42:30,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=422246.6666666667, ans=0.0
2023-11-18 21:42:46,573 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 9.130e+01 9.753e+01 1.111e+02 1.645e+02, threshold=1.951e+02, percent-clipped=0.0
2023-11-18 21:43:05,605 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3250, loss[loss=0.08937, simple_loss=0.1025, pruned_loss=0.0265, audio_tagging_loss=0.01162, over 14558.00 frames. ], tot_loss[loss=0.09915, simple_loss=0.1151, pruned_loss=0.03023, audio_tagging_loss=0.01139, over 3054815.27 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
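The scaling.py:213 records each report the current value (ans) of a ScheduledFloat, a hyperparameter that is a function of batch_count rather than a constant: dropout rates (*.dropout_p, ans=0.1), layer and branch skip probabilities (*_skip_rate, ans=0.0), and balancer constraint parameters (*.prob, min_abs, max_abs). A minimal sketch of a batch-count-keyed piecewise-linear schedule, which is the behavior these records suggest; the breakpoints below are illustrative, and the real schedules live in scaling.py:

    import bisect

    class ScheduledFloat:
        # Piecewise-linear function of batch_count, clamped at both ends.
        def __init__(self, *points):
            # points: (batch_count, value) pairs, sorted by batch_count
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative: a dropout decaying from 0.3 to 0.1 over the first 20k
    # batches has long since reached 0.1 at the batch_counts logged above.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(421646.67) == 0.1  # cf. the ans=0.1 entries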
2023-11-18 21:43:17,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=422513.3333333333, ans=0.04949747468305833
2023-11-18 21:43:21,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=422513.3333333333, ans=0.125
2023-11-18 21:43:22,171 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.92 vs. limit=15.0
2023-11-18 21:43:25,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=422513.3333333333, ans=0.0
2023-11-18 21:43:41,231 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.82 vs. limit=15.0
2023-11-18 21:43:53,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=422713.3333333333, ans=0.125
2023-11-18 21:44:01,574 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3300, loss[loss=0.1068, simple_loss=0.1222, pruned_loss=0.03303, audio_tagging_loss=0.01263, over 16171.00 frames. ], tot_loss[loss=0.09908, simple_loss=0.1151, pruned_loss=0.03017, audio_tagging_loss=0.01137, over 3054208.33 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:44:12,919 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.325e-03
2023-11-18 21:44:38,007 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 9.126e+01 1.034e+02 1.155e+02 1.977e+02, threshold=2.069e+02, percent-clipped=1.0
2023-11-18 21:44:39,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=422980.0, ans=0.0
2023-11-18 21:44:48,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=423046.6666666667, ans=0.125
2023-11-18 21:44:57,141 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3350, loss[loss=0.09875, simple_loss=0.1149, pruned_loss=0.03147, audio_tagging_loss=0.009844, over 14140.00 frames. ], tot_loss[loss=0.09952, simple_loss=0.1158, pruned_loss=0.03036, audio_tagging_loss=0.01124, over 3052546.28 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:45:24,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.05 vs. limit=15.0
2023-11-18 21:45:41,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0
2023-11-18 21:45:46,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=423380.0, ans=0.125
2023-11-18 21:45:50,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.02 vs. limit=15.0
2023-11-18 21:45:52,881 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3400, loss[loss=0.1022, simple_loss=0.1231, pruned_loss=0.03057, audio_tagging_loss=0.01007, over 17149.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1174, pruned_loss=0.03083, audio_tagging_loss=0.01103, over 3056605.11 frames. ], batch size: 64, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:46:02,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=423513.3333333333, ans=0.125
2023-11-18 21:46:11,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=423513.3333333333, ans=0.2
2023-11-18 21:46:15,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=423580.0, ans=0.2
2023-11-18 21:46:18,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=423580.0, ans=0.0
2023-11-18 21:46:27,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=423646.6666666667, ans=0.125
2023-11-18 21:46:29,911 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.844e+01 9.699e+01 1.055e+02 1.387e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-18 21:46:30,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=423646.6666666667, ans=10.0
2023-11-18 21:46:31,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=423646.6666666667, ans=0.125
2023-11-18 21:46:38,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=423713.3333333333, ans=0.1
2023-11-18 21:46:46,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=423713.3333333333, ans=0.125
2023-11-18 21:46:47,941 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3450, loss[loss=0.09309, simple_loss=0.1107, pruned_loss=0.02651, audio_tagging_loss=0.01122, over 15126.00 frames. ], tot_loss[loss=0.09958, simple_loss=0.1161, pruned_loss=0.03049, audio_tagging_loss=0.01105, over 3059677.43 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:46:49,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=423780.0, ans=0.0
2023-11-18 21:46:54,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=423780.0, ans=0.0
2023-11-18 21:46:54,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=423780.0, ans=0.125
2023-11-18 21:46:58,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=423846.6666666667, ans=0.0
2023-11-18 21:47:17,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=423913.3333333333, ans=0.0
2023-11-18 21:47:44,698 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3500, loss[loss=0.06521, simple_loss=0.06616, pruned_loss=0.01847, audio_tagging_loss=0.01366, over 15073.00 frames. ], tot_loss[loss=0.0994, simple_loss=0.1157, pruned_loss=0.03054, audio_tagging_loss=0.01103, over 3052652.32 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:48:04,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.78 vs. limit=15.0
2023-11-18 21:48:11,280 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 21:48:21,886 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.190e+01 1.062e+02 1.231e+02 1.599e+02, threshold=2.123e+02, percent-clipped=0.0
2023-11-18 21:48:32,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=424380.0, ans=0.125
2023-11-18 21:48:41,130 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3550, loss[loss=0.09693, simple_loss=0.1136, pruned_loss=0.03039, audio_tagging_loss=0.009719, over 16910.00 frames. ], tot_loss[loss=0.09821, simple_loss=0.1142, pruned_loss=0.03003, audio_tagging_loss=0.01106, over 3048687.84 frames. ], batch size: 63, lr: 1.17e-02, grad_scale: 32.0
2023-11-18 21:48:47,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=424446.6666666667, ans=0.125
2023-11-18 21:48:54,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=424513.3333333333, ans=0.125
2023-11-18 21:49:11,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.08 vs. limit=10.0
2023-11-18 21:49:17,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=424646.6666666667, ans=0.5
2023-11-18 21:49:23,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=424646.6666666667, ans=0.05
2023-11-18 21:49:27,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0
2023-11-18 21:49:32,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=424713.3333333333, ans=0.125
2023-11-18 21:49:34,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424713.3333333333, ans=0.1
2023-11-18 21:49:35,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=424780.0, ans=0.0
2023-11-18 21:49:36,600 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3600, loss[loss=0.09742, simple_loss=0.1092, pruned_loss=0.0314, audio_tagging_loss=0.01144, over 14546.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1133, pruned_loss=0.0296, audio_tagging_loss=0.01098, over 3045776.56 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0
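The scaling.py:1022 lines fire when a module's measured whitening metric exceeds its (scheduled) limit, e.g. metric=4.08 vs. limit=10.0 above; the whiten modules penalize activations whose channel covariance is far from isotropic. One plausible whiteness statistic of the kind being reported, shown only as an illustration (the actual metric and the gradient penalty it gates are defined in scaling.py's Whiten module): the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, which equals 1.0 for perfectly white features and grows as variance concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations, one whitening group.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]                    # channel covariance
        c = cov.shape[0]
        mean_eig = cov.diagonal().sum() / c             # trace(C)/c = mean eigenvalue
        mean_sq_eig = (cov @ cov).diagonal().sum() / c  # trace(C^2)/c
        return float(mean_sq_eig / mean_eig ** 2)       # 1.0 iff cov is isotropic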
2023-11-18 21:49:44,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=424780.0, ans=0.1
2023-11-18 21:49:46,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=424780.0, ans=0.0
2023-11-18 21:50:13,743 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.290e+01 1.027e+02 1.176e+02 1.572e+02, threshold=2.055e+02, percent-clipped=0.0
2023-11-18 21:50:16,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.42 vs. limit=15.0
2023-11-18 21:50:27,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=425046.6666666667, ans=0.0
2023-11-18 21:50:33,099 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3650, loss[loss=0.09055, simple_loss=0.09413, pruned_loss=0.02785, audio_tagging_loss=0.01564, over 15035.00 frames. ], tot_loss[loss=0.09783, simple_loss=0.1137, pruned_loss=0.02997, audio_tagging_loss=0.01099, over 3045053.11 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:51:06,166 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.60 vs. limit=22.5
2023-11-18 21:51:17,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=425380.0, ans=0.125
2023-11-18 21:51:24,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=425380.0, ans=0.5
2023-11-18 21:51:24,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.77 vs. limit=15.0
2023-11-18 21:51:29,198 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3700, loss[loss=0.08477, simple_loss=0.1007, pruned_loss=0.0226, audio_tagging_loss=0.01183, over 15290.00 frames. ], tot_loss[loss=0.09797, simple_loss=0.1138, pruned_loss=0.03009, audio_tagging_loss=0.01099, over 3042239.78 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:51:30,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425446.6666666667, ans=0.125
2023-11-18 21:51:36,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=425446.6666666667, ans=0.0
2023-11-18 21:51:36,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=425446.6666666667, ans=0.0
2023-11-18 21:51:38,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=425446.6666666667, ans=0.0
2023-11-18 21:52:00,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=425580.0, ans=0.95
2023-11-18 21:52:01,893 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 21:52:06,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.991e+01 9.791e+01 1.095e+02 1.443e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-18 21:52:15,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=425713.3333333333, ans=0.0
2023-11-18 21:52:25,172 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3750, loss[loss=0.09532, simple_loss=0.1041, pruned_loss=0.03015, audio_tagging_loss=0.01314, over 16104.00 frames. ], tot_loss[loss=0.09781, simple_loss=0.1137, pruned_loss=0.03002, audio_tagging_loss=0.01093, over 3037005.98 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:52:40,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=425846.6666666667, ans=0.2
2023-11-18 21:52:46,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=425846.6666666667, ans=0.0
2023-11-18 21:53:02,237 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 21:53:21,306 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3800, loss[loss=0.09445, simple_loss=0.09893, pruned_loss=0.02983, audio_tagging_loss=0.01515, over 15513.00 frames. ], tot_loss[loss=0.09852, simple_loss=0.1146, pruned_loss=0.03022, audio_tagging_loss=0.01098, over 3044511.86 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:53:26,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426113.3333333333, ans=0.1
2023-11-18 21:53:50,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=426246.6666666667, ans=0.125
2023-11-18 21:53:57,909 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.759e+01 9.502e+01 1.058e+02 1.503e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-18 21:54:09,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=426380.0, ans=0.125
2023-11-18 21:54:10,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=426380.0, ans=0.125
2023-11-18 21:54:10,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426380.0, ans=0.125
2023-11-18 21:54:12,944 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 21:54:16,860 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3850, loss[loss=0.08086, simple_loss=0.09753, pruned_loss=0.02081, audio_tagging_loss=0.01129, over 15396.00 frames. ], tot_loss[loss=0.09856, simple_loss=0.1145, pruned_loss=0.03016, audio_tagging_loss=0.01113, over 3048000.44 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:54:38,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=426580.0, ans=0.0
2023-11-18 21:54:39,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426580.0, ans=0.1
2023-11-18 21:54:42,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=426580.0, ans=0.125
2023-11-18 21:54:44,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=426580.0, ans=0.125
2023-11-18 21:54:44,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=426580.0, ans=0.125
2023-11-18 21:54:46,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=426580.0, ans=0.0
2023-11-18 21:55:03,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=426713.3333333333, ans=0.125
2023-11-18 21:55:14,857 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3900, loss[loss=0.07919, simple_loss=0.08875, pruned_loss=0.01945, audio_tagging_loss=0.01537, over 14584.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1138, pruned_loss=0.02994, audio_tagging_loss=0.0112, over 3049474.50 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:55:23,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=426780.0, ans=0.125
2023-11-18 21:55:51,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 9.434e+01 1.040e+02 1.132e+02 1.500e+02, threshold=2.079e+02, percent-clipped=0.0
2023-11-18 21:55:59,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=427046.6666666667, ans=0.0
2023-11-18 21:56:09,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=427046.6666666667, ans=0.2
2023-11-18 21:56:10,969 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 3950, loss[loss=0.09411, simple_loss=0.09814, pruned_loss=0.03212, audio_tagging_loss=0.01292, over 14871.00 frames. ], tot_loss[loss=0.09928, simple_loss=0.1153, pruned_loss=0.03045, audio_tagging_loss=0.01118, over 3048175.79 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:56:11,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=427113.3333333333, ans=0.125
2023-11-18 21:56:23,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=427180.0, ans=0.125
2023-11-18 21:56:28,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=427180.0, ans=0.125
2023-11-18 21:56:41,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=8.30 vs. limit=15.0
2023-11-18 21:56:51,352 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. limit=15.0
2023-11-18 21:57:00,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=427380.0, ans=0.0
2023-11-18 21:57:01,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=15.0
2023-11-18 21:57:07,276 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4000, loss[loss=0.1157, simple_loss=0.1306, pruned_loss=0.03922, audio_tagging_loss=0.01122, over 15558.00 frames. ], tot_loss[loss=0.09829, simple_loss=0.1135, pruned_loss=0.03011, audio_tagging_loss=0.01142, over 3038631.54 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:57:28,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427580.0, ans=0.1
2023-11-18 21:57:30,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=427580.0, ans=0.1
2023-11-18 21:57:31,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=427580.0, ans=0.125
2023-11-18 21:57:43,816 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 9.312e+01 1.008e+02 1.147e+02 1.511e+02, threshold=2.016e+02, percent-clipped=0.0
2023-11-18 21:57:48,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=427646.6666666667, ans=0.125
2023-11-18 21:58:01,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=427780.0, ans=0.0
2023-11-18 21:58:02,413 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4050, loss[loss=0.1455, simple_loss=0.1718, pruned_loss=0.05031, audio_tagging_loss=0.009303, over 16121.00 frames. ], tot_loss[loss=0.09882, simple_loss=0.1142, pruned_loss=0.0303, audio_tagging_loss=0.0114, over 3041726.56 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:58:03,493 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 21:58:04,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=427780.0, ans=0.125
2023-11-18 21:58:20,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427846.6666666667, ans=0.1
2023-11-18 21:58:40,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=427980.0, ans=0.125
2023-11-18 21:58:59,666 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4100, loss[loss=0.0999, simple_loss=0.1204, pruned_loss=0.03209, audio_tagging_loss=0.007634, over 14088.00 frames. ], tot_loss[loss=0.09815, simple_loss=0.1136, pruned_loss=0.02994, audio_tagging_loss=0.0114, over 3040585.53 frames. ], batch size: 54, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 21:59:06,790 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0
2023-11-18 21:59:13,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.73 vs. limit=22.5
2023-11-18 21:59:22,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=428246.6666666667, ans=0.1
2023-11-18 21:59:30,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=428246.6666666667, ans=0.125
2023-11-18 21:59:35,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.951e+01 9.681e+01 1.090e+02 3.452e+02, threshold=1.936e+02, percent-clipped=1.0
2023-11-18 21:59:39,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5
2023-11-18 21:59:44,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=428380.0, ans=0.1
2023-11-18 21:59:55,347 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4150, loss[loss=0.0839, simple_loss=0.09995, pruned_loss=0.02251, audio_tagging_loss=0.01141, over 15060.00 frames. ], tot_loss[loss=0.09915, simple_loss=0.1152, pruned_loss=0.03029, audio_tagging_loss=0.01125, over 3038596.78 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 22:00:00,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=428446.6666666667, ans=0.07
2023-11-18 22:00:04,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=428446.6666666667, ans=0.125
2023-11-18 22:00:10,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=428513.3333333333, ans=0.2
2023-11-18 22:00:23,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=428580.0, ans=0.125
2023-11-18 22:00:30,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=428646.6666666667, ans=0.125
2023-11-18 22:00:34,722 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 22:00:50,553 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4200, loss[loss=0.1303, simple_loss=0.1609, pruned_loss=0.04191, audio_tagging_loss=0.007992, over 15152.00 frames. ], tot_loss[loss=0.09976, simple_loss=0.1165, pruned_loss=0.03053, audio_tagging_loss=0.01099, over 3045949.45 frames. ], batch size: 54, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 22:01:04,888 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0
2023-11-18 22:01:22,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=428913.3333333333, ans=0.125
2023-11-18 22:01:27,562 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.609e+01 9.394e+01 1.081e+02 1.374e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-18 22:01:35,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429046.6666666667, ans=0.125
2023-11-18 22:01:36,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=429046.6666666667, ans=0.125
2023-11-18 22:01:46,231 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4250, loss[loss=0.1011, simple_loss=0.1154, pruned_loss=0.03101, audio_tagging_loss=0.01239, over 15767.00 frames. ], tot_loss[loss=0.09989, simple_loss=0.1168, pruned_loss=0.03058, audio_tagging_loss=0.01092, over 3051048.93 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 22:01:46,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=429113.3333333333, ans=0.025
2023-11-18 22:02:26,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=429313.3333333333, ans=0.0
2023-11-18 22:02:29,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.28 vs. limit=15.0
2023-11-18 22:02:33,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=429380.0, ans=0.04949747468305833
2023-11-18 22:02:35,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=429380.0, ans=0.125
2023-11-18 22:02:43,274 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4300, loss[loss=0.1023, simple_loss=0.1132, pruned_loss=0.03495, audio_tagging_loss=0.0107, over 14627.00 frames. ], tot_loss[loss=0.09975, simple_loss=0.1169, pruned_loss=0.03052, audio_tagging_loss=0.01079, over 3056357.13 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 22:02:58,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=429513.3333333333, ans=0.125
2023-11-18 22:03:02,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=429513.3333333333, ans=0.0
2023-11-18 22:03:20,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 9.239e+01 1.003e+02 1.122e+02 1.597e+02, threshold=2.006e+02, percent-clipped=0.0
2023-11-18 22:03:38,791 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4350, loss[loss=0.08818, simple_loss=0.1006, pruned_loss=0.02533, audio_tagging_loss=0.01255, over 15502.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.117, pruned_loss=0.03057, audio_tagging_loss=0.01077, over 3060108.88 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0
2023-11-18 22:03:39,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0
2023-11-18 22:03:42,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=429780.0, ans=0.125
2023-11-18 22:03:52,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=429846.6666666667, ans=0.1
2023-11-18 22:03:59,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=429846.6666666667, ans=0.04949747468305833
2023-11-18 22:04:09,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5
2023-11-18 22:04:18,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0
2023-11-18 22:04:34,586 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4400, loss[loss=0.1096, simple_loss=0.1213, pruned_loss=0.04103, audio_tagging_loss=0.00794, over 15253.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1173, pruned_loss=0.03069, audio_tagging_loss=0.0107, over 3061592.77 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:05:11,341 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.875e+01 9.886e+01 1.073e+02 1.418e+02, threshold=1.977e+02, percent-clipped=0.0
2023-11-18 22:05:13,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=430313.3333333333, ans=0.125
2023-11-18 22:05:29,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=430380.0, ans=0.125
2023-11-18 22:05:31,696 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4450, loss[loss=0.07059, simple_loss=0.07878, pruned_loss=0.01896, audio_tagging_loss=0.01224, over 15600.00 frames. ], tot_loss[loss=0.09986, simple_loss=0.1168, pruned_loss=0.03069, audio_tagging_loss=0.01075, over 3064191.75 frames. ], batch size: 60, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:05:31,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=430446.6666666667, ans=0.07
2023-11-18 22:05:44,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=430513.3333333333, ans=0.05
2023-11-18 22:05:55,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5
2023-11-18 22:05:58,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=430580.0, ans=0.125
2023-11-18 22:05:59,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=430580.0, ans=0.125
2023-11-18 22:06:01,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=430580.0, ans=0.125
2023-11-18 22:06:02,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. limit=15.0
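The grad_scale field in the batch records climbs 16.0 -> 32.0 -> 64.0 across this section (and drops back to 32.0 around batch 4750 below), the signature of dynamic loss scaling in fp16 training: the scale doubles after a stretch of overflow-free steps and is backed off when an overflow occurs. The training code may implement its own variant; a standard PyTorch equivalent, with the growth interval assumed, would be:

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,       # matches the earliest grad_scale in this section
        growth_factor=2.0,     # 16 -> 32 -> 64, as logged
        backoff_factor=0.5,    # 64 -> 32 on overflow
        growth_interval=2000,  # assumed number of clean steps between doublings
    )
    # Typical step:
    #   with torch.cuda.amp.autocast():
    #       loss = compute_loss(model, batch)
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()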
2023-11-18 22:06:06,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=430646.6666666667, ans=0.0
2023-11-18 22:06:13,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=430646.6666666667, ans=0.125
2023-11-18 22:06:26,831 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4500, loss[loss=0.1091, simple_loss=0.1232, pruned_loss=0.03753, audio_tagging_loss=0.009971, over 14390.00 frames. ], tot_loss[loss=0.09924, simple_loss=0.1163, pruned_loss=0.03025, audio_tagging_loss=0.01083, over 3061817.85 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:06:30,247 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=430780.0, ans=0.125
2023-11-18 22:06:43,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=430846.6666666667, ans=0.025
2023-11-18 22:06:48,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=430913.3333333333, ans=0.125
2023-11-18 22:06:53,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.59 vs. limit=22.5
2023-11-18 22:06:54,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=430913.3333333333, ans=0.05
2023-11-18 22:07:03,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=430980.0, ans=0.125
2023-11-18 22:07:03,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 9.076e+01 9.936e+01 1.124e+02 1.630e+02, threshold=1.987e+02, percent-clipped=0.0
2023-11-18 22:07:08,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.32 vs. limit=15.0
2023-11-18 22:07:10,782 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.47 vs. limit=15.0
2023-11-18 22:07:13,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=431046.6666666667, ans=0.125
2023-11-18 22:07:22,444 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4550, loss[loss=0.09938, simple_loss=0.1186, pruned_loss=0.03008, audio_tagging_loss=0.01002, over 16224.00 frames. ], tot_loss[loss=0.09853, simple_loss=0.1153, pruned_loss=0.02998, audio_tagging_loss=0.01091, over 3055090.79 frames. ], batch size: 62, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:07:25,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=431113.3333333333, ans=0.125
2023-11-18 22:07:43,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431180.0, ans=0.1
2023-11-18 22:07:45,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=431246.6666666667, ans=0.125
2023-11-18 22:07:49,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0
2023-11-18 22:08:02,179 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 22:08:07,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0
2023-11-18 22:08:18,477 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4600, loss[loss=0.08987, simple_loss=0.09996, pruned_loss=0.02663, audio_tagging_loss=0.01327, over 14222.00 frames. ], tot_loss[loss=0.09787, simple_loss=0.1141, pruned_loss=0.02974, audio_tagging_loss=0.01106, over 3053073.97 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:08:19,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431446.6666666667, ans=0.1
2023-11-18 22:08:25,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=431446.6666666667, ans=0.125
2023-11-18 22:08:30,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5
2023-11-18 22:08:47,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=431580.0, ans=0.125
2023-11-18 22:08:54,522 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.10 vs. limit=15.0
2023-11-18 22:08:55,074 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.959e+01 9.865e+01 1.112e+02 1.512e+02, threshold=1.973e+02, percent-clipped=0.0
2023-11-18 22:09:14,276 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4650, loss[loss=0.1176, simple_loss=0.1453, pruned_loss=0.03522, audio_tagging_loss=0.009753, over 15795.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.115, pruned_loss=0.03001, audio_tagging_loss=0.0111, over 3051768.61 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:09:20,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=431780.0, ans=0.2
2023-11-18 22:09:30,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=431846.6666666667, ans=0.0
2023-11-18 22:09:57,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431980.0, ans=0.125
2023-11-18 22:10:01,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=432046.6666666667, ans=0.0
2023-11-18 22:10:04,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=432046.6666666667, ans=0.0
2023-11-18 22:10:05,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0
2023-11-18 22:10:09,896 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4700, loss[loss=0.103, simple_loss=0.1233, pruned_loss=0.0286, audio_tagging_loss=0.01278, over 15285.00 frames. ], tot_loss[loss=0.09835, simple_loss=0.1144, pruned_loss=0.02992, audio_tagging_loss=0.01122, over 3060335.54 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 64.0
2023-11-18 22:10:17,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=432113.3333333333, ans=0.0
2023-11-18 22:10:18,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=432113.3333333333, ans=0.125
2023-11-18 22:10:29,596 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 22:10:33,834 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=432246.6666666667, ans=0.2
2023-11-18 22:10:35,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=432246.6666666667, ans=0.125
2023-11-18 22:10:39,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=432246.6666666667, ans=0.125
2023-11-18 22:10:41,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=432246.6666666667, ans=0.2
2023-11-18 22:10:47,009 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.805e+01 9.825e+01 1.121e+02 1.529e+02, threshold=1.965e+02, percent-clipped=0.0
2023-11-18 22:11:03,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=432380.0, ans=0.2
2023-11-18 22:11:06,127 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4750, loss[loss=0.08897, simple_loss=0.1007, pruned_loss=0.02641, audio_tagging_loss=0.0122, over 15638.00 frames. ], tot_loss[loss=0.09722, simple_loss=0.1129, pruned_loss=0.02938, audio_tagging_loss=0.01138, over 3050211.83 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:11:21,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=432513.3333333333, ans=0.0
2023-11-18 22:11:33,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=432580.0, ans=0.125
2023-11-18 22:11:47,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=432646.6666666667, ans=0.1
2023-11-18 22:12:02,253 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4800, loss[loss=0.09775, simple_loss=0.1162, pruned_loss=0.02707, audio_tagging_loss=0.0126, over 16379.00 frames. ], tot_loss[loss=0.09767, simple_loss=0.1133, pruned_loss=0.0295, audio_tagging_loss=0.01153, over 3047568.59 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:12:22,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=432846.6666666667, ans=0.0
2023-11-18 22:12:23,314 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5
2023-11-18 22:12:32,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=432913.3333333333, ans=0.04949747468305833
2023-11-18 22:12:37,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0
2023-11-18 22:12:39,893 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.787e+01 9.594e+01 1.036e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-18 22:12:57,427 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4850, loss[loss=0.08714, simple_loss=0.09977, pruned_loss=0.02622, audio_tagging_loss=0.01104, over 14380.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1137, pruned_loss=0.0296, audio_tagging_loss=0.01144, over 3048367.80 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:13:03,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.90 vs. limit=15.0
2023-11-18 22:13:12,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=433180.0, ans=0.2
2023-11-18 22:13:14,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.51 vs. limit=22.5
2023-11-18 22:13:29,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=433246.6666666667, ans=0.0
2023-11-18 22:13:54,000 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4900, loss[loss=0.09263, simple_loss=0.1194, pruned_loss=0.02455, audio_tagging_loss=0.008394, over 15775.00 frames. ], tot_loss[loss=0.09722, simple_loss=0.1132, pruned_loss=0.02925, audio_tagging_loss=0.01134, over 3052155.27 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:13:54,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.15 vs. limit=22.5
2023-11-18 22:13:55,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=433446.6666666667, ans=0.2
2023-11-18 22:14:17,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=433580.0, ans=0.05
2023-11-18 22:14:32,279 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.759e+01 9.245e+01 1.025e+02 1.316e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-18 22:14:40,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=433713.3333333333, ans=0.125
2023-11-18 22:14:41,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=433713.3333333333, ans=0.125
2023-11-18 22:14:49,981 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 4950, loss[loss=0.09092, simple_loss=0.09641, pruned_loss=0.03051, audio_tagging_loss=0.01221, over 14893.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1132, pruned_loss=0.0293, audio_tagging_loss=0.01116, over 3051257.36 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:14:54,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=433780.0, ans=0.1
2023-11-18 22:15:45,753 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5000, loss[loss=0.09255, simple_loss=0.1141, pruned_loss=0.02455, audio_tagging_loss=0.01097, over 16240.00 frames. ], tot_loss[loss=0.09694, simple_loss=0.1131, pruned_loss=0.02941, audio_tagging_loss=0.01097, over 3050631.16 frames. ], batch size: 61, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:16:09,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0
2023-11-18 22:16:23,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.790e+01 9.696e+01 1.074e+02 1.675e+02, threshold=1.939e+02, percent-clipped=0.0
2023-11-18 22:16:28,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=434313.3333333333, ans=0.125
2023-11-18 22:16:41,998 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5050, loss[loss=0.0726, simple_loss=0.07931, pruned_loss=0.01833, audio_tagging_loss=0.01463, over 14967.00 frames. ], tot_loss[loss=0.09701, simple_loss=0.1133, pruned_loss=0.02942, audio_tagging_loss=0.01093, over 3042784.60 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:17:17,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434646.6666666667, ans=0.125
2023-11-18 22:17:28,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434713.3333333333, ans=0.1
2023-11-18 22:17:38,163 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5100, loss[loss=0.1243, simple_loss=0.1477, pruned_loss=0.04127, audio_tagging_loss=0.009184, over 17787.00 frames. ], tot_loss[loss=0.09717, simple_loss=0.1134, pruned_loss=0.02959, audio_tagging_loss=0.0109, over 3042378.44 frames. ], batch size: 66, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:17:46,937 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 22:17:47,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=434846.6666666667, ans=0.0
2023-11-18 22:18:13,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0
2023-11-18 22:18:16,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.735e+01 9.607e+01 1.051e+02 1.879e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-18 22:18:19,991 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=434980.0, ans=0.125
2023-11-18 22:18:20,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=434980.0, ans=0.125
2023-11-18 22:18:33,464 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5150, loss[loss=0.09016, simple_loss=0.1068, pruned_loss=0.02425, audio_tagging_loss=0.01249, over 14827.00 frames. ], tot_loss[loss=0.0967, simple_loss=0.1129, pruned_loss=0.02929, audio_tagging_loss=0.01094, over 3041119.12 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:18:37,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0
2023-11-18 22:18:39,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=435113.3333333333, ans=0.2
2023-11-18 22:19:00,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=435246.6666666667, ans=0.2
2023-11-18 22:19:09,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=435313.3333333333, ans=0.0
2023-11-18 22:19:14,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=435313.3333333333, ans=0.125
2023-11-18 22:19:30,433 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5200, loss[loss=0.08693, simple_loss=0.09776, pruned_loss=0.02339, audio_tagging_loss=0.01467, over 14735.00 frames. ], tot_loss[loss=0.0967, simple_loss=0.1129, pruned_loss=0.02924, audio_tagging_loss=0.011, over 3045393.00 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 32.0
2023-11-18 22:19:52,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=435580.0, ans=0.125
2023-11-18 22:19:53,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=435580.0, ans=0.125
2023-11-18 22:20:01,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=435580.0, ans=0.125
2023-11-18 22:20:02,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.61 vs.
limit=6.0 2023-11-18 22:20:07,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=435646.6666666667, ans=0.125 2023-11-18 22:20:09,690 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 8.948e+01 9.784e+01 1.083e+02 1.629e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:20:09,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=435646.6666666667, ans=0.125 2023-11-18 22:20:17,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=435713.3333333333, ans=0.07 2023-11-18 22:20:22,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=435713.3333333333, ans=0.125 2023-11-18 22:20:25,526 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5250, loss[loss=0.1204, simple_loss=0.1368, pruned_loss=0.04142, audio_tagging_loss=0.01054, over 15620.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1135, pruned_loss=0.02949, audio_tagging_loss=0.01099, over 3033588.56 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:20:42,702 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=435846.6666666667, ans=0.0 2023-11-18 22:21:03,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=435980.0, ans=0.2 2023-11-18 22:21:10,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.85 vs. limit=10.0 2023-11-18 22:21:19,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=436046.6666666667, ans=0.04949747468305833 2023-11-18 22:21:21,100 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5300, loss[loss=0.08754, simple_loss=0.1036, pruned_loss=0.02422, audio_tagging_loss=0.01152, over 14741.00 frames. ], tot_loss[loss=0.09774, simple_loss=0.1139, pruned_loss=0.02981, audio_tagging_loss=0.01098, over 3034453.63 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:21:30,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=436113.3333333333, ans=0.125 2023-11-18 22:22:01,695 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.659e+01 9.451e+01 1.050e+02 1.358e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:22:10,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=436380.0, ans=0.125 2023-11-18 22:22:17,478 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5350, loss[loss=0.08357, simple_loss=0.1026, pruned_loss=0.02212, audio_tagging_loss=0.01017, over 15613.00 frames. ], tot_loss[loss=0.09889, simple_loss=0.1157, pruned_loss=0.03006, audio_tagging_loss=0.01097, over 3032587.05 frames. 
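
The optim.py:476 lines above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=...") report five quantiles (min/25%/50%/75%/max) of recently observed gradient norms, and in every such record the threshold is exactly Clipping_scale times the middle value, e.g. 2.0 * 9.784e+01 = 1.957e+02. A hedged sketch of that bookkeeping (how the norms are collected and applied is an assumption; only the threshold = scale * median relationship is visible in the log):

    import torch

    def clipping_threshold(recent_grad_norms, clipping_scale: float = 2.0):
        # threshold = clipping_scale * median of recent gradient norms,
        # matching the optim.py records above; the rest is illustrative.
        norms = torch.as_tensor(recent_grad_norms, dtype=torch.float32)
        q = torch.quantile(norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2], q  # (threshold, quartiles for logging)
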
], batch size: 58, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:22:25,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=436446.6666666667, ans=0.0 2023-11-18 22:22:35,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=436513.3333333333, ans=0.2 2023-11-18 22:22:56,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=436646.6666666667, ans=0.125 2023-11-18 22:23:12,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=436780.0, ans=0.125 2023-11-18 22:23:13,373 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5400, loss[loss=0.08818, simple_loss=0.1048, pruned_loss=0.02599, audio_tagging_loss=0.009781, over 14213.00 frames. ], tot_loss[loss=0.09773, simple_loss=0.1141, pruned_loss=0.0296, audio_tagging_loss=0.01107, over 3035200.51 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:23:50,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=436980.0, ans=0.035 2023-11-18 22:23:53,556 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.112e+01 1.017e+02 1.141e+02 1.585e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 22:24:07,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437113.3333333333, ans=0.125 2023-11-18 22:24:08,344 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5450, loss[loss=0.1107, simple_loss=0.1211, pruned_loss=0.03782, audio_tagging_loss=0.01237, over 14578.00 frames. ], tot_loss[loss=0.09946, simple_loss=0.1161, pruned_loss=0.03041, audio_tagging_loss=0.01098, over 3038616.48 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:25:04,301 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5500, loss[loss=0.103, simple_loss=0.1333, pruned_loss=0.02888, audio_tagging_loss=0.007433, over 14642.00 frames. ], tot_loss[loss=0.09923, simple_loss=0.1157, pruned_loss=0.03037, audio_tagging_loss=0.011, over 3036554.00 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:25:16,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=437513.3333333333, ans=0.95 2023-11-18 22:25:17,915 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.04 vs. 
limit=15.0 2023-11-18 22:25:18,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437513.3333333333, ans=0.125 2023-11-18 22:25:25,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=437513.3333333333, ans=0.0 2023-11-18 22:25:27,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=437580.0, ans=0.2 2023-11-18 22:25:33,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437580.0, ans=0.1 2023-11-18 22:25:43,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=437646.6666666667, ans=0.125 2023-11-18 22:25:44,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.778e+01 9.490e+01 1.043e+02 1.354e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 22:25:50,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=437713.3333333333, ans=0.125 2023-11-18 22:26:00,858 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5550, loss[loss=0.09401, simple_loss=0.1122, pruned_loss=0.02439, audio_tagging_loss=0.01352, over 14330.00 frames. ], tot_loss[loss=0.09937, simple_loss=0.1158, pruned_loss=0.0304, audio_tagging_loss=0.01107, over 3033847.37 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:26:04,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-18 22:26:09,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=437780.0, ans=0.1 2023-11-18 22:26:26,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=437913.3333333333, ans=0.5 2023-11-18 22:26:30,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=437913.3333333333, ans=0.125 2023-11-18 22:26:39,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=437980.0, ans=0.125 2023-11-18 22:26:51,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=438046.6666666667, ans=0.125 2023-11-18 22:26:55,684 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5600, loss[loss=0.08625, simple_loss=0.09891, pruned_loss=0.024, audio_tagging_loss=0.0128, over 15653.00 frames. ], tot_loss[loss=0.09926, simple_loss=0.1156, pruned_loss=0.03025, audio_tagging_loss=0.0112, over 3042296.77 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:11,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-18 22:27:24,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=438246.6666666667, ans=0.0 2023-11-18 22:27:34,841 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:27:35,842 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 9.319e+01 9.867e+01 1.109e+02 1.388e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:27:40,834 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=12.0 2023-11-18 22:27:51,224 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5650, loss[loss=0.09934, simple_loss=0.1258, pruned_loss=0.02666, audio_tagging_loss=0.009755, over 14754.00 frames. ], tot_loss[loss=0.09909, simple_loss=0.1155, pruned_loss=0.03012, audio_tagging_loss=0.01119, over 3044429.63 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:58,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=438446.6666666667, ans=0.2 2023-11-18 22:27:58,881 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.87 vs. limit=22.5 2023-11-18 22:28:06,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=438513.3333333333, ans=0.0 2023-11-18 22:28:15,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=438580.0, ans=0.0 2023-11-18 22:28:19,683 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=438580.0, ans=0.125 2023-11-18 22:28:20,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=438580.0, ans=0.125 2023-11-18 22:28:24,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=438646.6666666667, ans=0.125 2023-11-18 22:28:30,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=438646.6666666667, ans=0.0 2023-11-18 22:28:45,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=438713.3333333333, ans=15.0 2023-11-18 22:28:46,901 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5700, loss[loss=0.07479, simple_loss=0.07625, pruned_loss=0.02203, audio_tagging_loss=0.01463, over 12896.00 frames. ], tot_loss[loss=0.09801, simple_loss=0.1142, pruned_loss=0.02965, audio_tagging_loss=0.01127, over 3039719.55 frames. 
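
The WARNING above excludes an AudioSet cut whose 100 input frames shrink to 23 after subsampling while its dummy transcript tokenizes to 24 tokens; a transducer alignment needs at least as many frames as tokens, so such a cut cannot be trained on. One plausible reconstruction of the check is below; the (T - 7) // 4 front-end arithmetic is an assumption that happens to reproduce the logged 100 -> 23:

    def frames_after_subsampling(num_frames: int, factor: int = 4) -> int:
        # Assumed convolutional front-end arithmetic: (100 - 7) // 4 == 23,
        # matching the before/after counts in the warning above.
        return (num_frames - 7) // factor

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(keep_cut(100, 24))  # False -> cut is excluded from training
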
], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:28:53,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=438780.0, ans=0.2 2023-11-18 22:29:00,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=438846.6666666667, ans=0.125 2023-11-18 22:29:09,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=438913.3333333333, ans=0.2 2023-11-18 22:29:09,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-18 22:29:27,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.890e+01 9.546e+01 1.053e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 22:29:33,825 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.795e-01 2023-11-18 22:29:43,000 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5750, loss[loss=0.09765, simple_loss=0.1218, pruned_loss=0.02392, audio_tagging_loss=0.01284, over 14516.00 frames. ], tot_loss[loss=0.09746, simple_loss=0.1135, pruned_loss=0.02947, audio_tagging_loss=0.01125, over 3046748.87 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:29:57,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=439180.0, ans=0.125 2023-11-18 22:30:11,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=439246.6666666667, ans=0.125 2023-11-18 22:30:16,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=439313.3333333333, ans=0.125 2023-11-18 22:30:28,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=439380.0, ans=0.125 2023-11-18 22:30:30,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=439380.0, ans=0.125 2023-11-18 22:30:37,476 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5800, loss[loss=0.1449, simple_loss=0.1563, pruned_loss=0.05887, audio_tagging_loss=0.007928, over 15616.00 frames. ], tot_loss[loss=0.09747, simple_loss=0.1135, pruned_loss=0.02963, audio_tagging_loss=0.01109, over 3045139.10 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:30:46,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.29 vs. limit=22.5 2023-11-18 22:30:57,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=439513.3333333333, ans=0.125 2023-11-18 22:31:13,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439646.6666666667, ans=0.1 2023-11-18 22:31:17,839 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.421e+01 9.751e+01 1.061e+02 1.528e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-18 22:31:33,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.74 vs. 
limit=15.0 2023-11-18 22:31:33,830 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5850, loss[loss=0.08894, simple_loss=0.09424, pruned_loss=0.03076, audio_tagging_loss=0.01106, over 17159.00 frames. ], tot_loss[loss=0.09733, simple_loss=0.1133, pruned_loss=0.02971, audio_tagging_loss=0.01097, over 3045572.27 frames. ], batch size: 67, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:31:55,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.71 vs. limit=15.0 2023-11-18 22:31:58,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=439913.3333333333, ans=0.0 2023-11-18 22:32:08,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439980.0, ans=0.1 2023-11-18 22:32:13,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=439980.0, ans=0.2 2023-11-18 22:32:23,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=440046.6666666667, ans=0.125 2023-11-18 22:32:27,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=440046.6666666667, ans=0.0 2023-11-18 22:32:29,692 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5900, loss[loss=0.07923, simple_loss=0.1009, pruned_loss=0.01997, audio_tagging_loss=0.008817, over 15221.00 frames. ], tot_loss[loss=0.09755, simple_loss=0.1138, pruned_loss=0.02969, audio_tagging_loss=0.01095, over 3058398.51 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:32:34,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=440113.3333333333, ans=0.1 2023-11-18 22:33:06,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=440313.3333333333, ans=0.125 2023-11-18 22:33:06,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=440313.3333333333, ans=0.0 2023-11-18 22:33:09,421 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.634e+01 9.404e+01 1.031e+02 1.635e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-18 22:33:16,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=440380.0, ans=0.125 2023-11-18 22:33:25,042 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 5950, loss[loss=0.1369, simple_loss=0.1579, pruned_loss=0.04835, audio_tagging_loss=0.009654, over 15164.00 frames. ], tot_loss[loss=0.09744, simple_loss=0.1138, pruned_loss=0.0296, audio_tagging_loss=0.01094, over 3062453.75 frames. 
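
Most of the scaling.py:213 records above are ScheduledFloat values: module parameters (skip rates, balancer probabilities, dropout_p, scale_min) whose current value ("ans") is a function of batch_count. A minimal sketch of a piecewise-linear schedule that would produce this kind of record (the schedule shape and breakpoints here are assumptions, not scaling.py's implementation):

    def scheduled_float(batch_count: float, points) -> float:
        # points: list of (batch_count, value) breakpoints, ascending.
        (x0, y0) = points[0]
        if batch_count <= x0:
            return y0
        for (x1, y1) in points[1:]:
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
            x0, y0 = x1, y1
        return y0

    # e.g. a skip-rate annealed to zero early in training keeps logging
    # ans=0.0 by batch_count ~4.4e5, as in many records above:
    print(scheduled_float(440046.0, [(0.0, 0.5), (4000.0, 0.0)]))  # 0.0
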
], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:33:38,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440513.3333333333, ans=0.1 2023-11-18 22:33:39,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=440513.3333333333, ans=0.2 2023-11-18 22:33:45,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=440513.3333333333, ans=0.07 2023-11-18 22:33:48,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=440580.0, ans=0.125 2023-11-18 22:33:59,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=440646.6666666667, ans=0.2 2023-11-18 22:34:21,299 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6000, loss[loss=0.1015, simple_loss=0.1086, pruned_loss=0.03437, audio_tagging_loss=0.01281, over 14774.00 frames. ], tot_loss[loss=0.0969, simple_loss=0.1134, pruned_loss=0.02924, audio_tagging_loss=0.01097, over 3055650.96 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:34:21,300 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 22:34:52,027 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3697, 3.5181, 1.8458, 3.6582], device='cuda:2') 2023-11-18 22:34:54,504 INFO [train_asr.py:1147] (2/4) Epoch 6, validation: loss=0.07034, simple_loss=0.0589, pruned_loss=0.008199, audio_tagging_loss=0.03269, over 4681554.00 frames. 2023-11-18 22:34:54,505 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-18 22:34:56,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=440780.0, ans=0.125 2023-11-18 22:35:12,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.48 vs. limit=15.0 2023-11-18 22:35:33,139 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:35:34,149 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.702e+01 9.372e+01 1.021e+02 1.628e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-18 22:35:50,037 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6050, loss[loss=0.1001, simple_loss=0.1178, pruned_loss=0.03003, audio_tagging_loss=0.01118, over 15729.00 frames. ], tot_loss[loss=0.09715, simple_loss=0.1137, pruned_loss=0.02943, audio_tagging_loss=0.01086, over 3057098.87 frames. ], batch size: 60, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:35:53,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=441113.3333333333, ans=0.125 2023-11-18 22:35:54,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.37 vs. 
limit=15.0 2023-11-18 22:36:02,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.59 vs. limit=15.0 2023-11-18 22:36:08,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=441180.0, ans=0.1 2023-11-18 22:36:17,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=441246.6666666667, ans=0.125 2023-11-18 22:36:29,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.78 vs. limit=15.0 2023-11-18 22:36:41,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=441380.0, ans=0.015 2023-11-18 22:36:43,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=441380.0, ans=0.125 2023-11-18 22:36:46,621 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6100, loss[loss=0.08922, simple_loss=0.09967, pruned_loss=0.02768, audio_tagging_loss=0.0117, over 16671.00 frames. ], tot_loss[loss=0.09769, simple_loss=0.1143, pruned_loss=0.02968, audio_tagging_loss=0.01084, over 3061319.40 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:46,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=441446.6666666667, ans=0.125 2023-11-18 22:37:23,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=441646.6666666667, ans=10.0 2023-11-18 22:37:26,320 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.924e+01 9.623e+01 1.090e+02 1.421e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 22:37:41,753 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6150, loss[loss=0.1143, simple_loss=0.1349, pruned_loss=0.03657, audio_tagging_loss=0.01031, over 15927.00 frames. ], tot_loss[loss=0.0976, simple_loss=0.1139, pruned_loss=0.02971, audio_tagging_loss=0.01094, over 3052665.77 frames. ], batch size: 60, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:37:44,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.51 vs. limit=15.0 2023-11-18 22:37:49,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.99 vs. limit=15.0 2023-11-18 22:37:52,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=441846.6666666667, ans=0.025 2023-11-18 22:37:54,138 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=441846.6666666667, ans=0.125 2023-11-18 22:38:37,169 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6200, loss[loss=0.1052, simple_loss=0.1183, pruned_loss=0.03244, audio_tagging_loss=0.01364, over 15995.00 frames. ], tot_loss[loss=0.09604, simple_loss=0.1117, pruned_loss=0.02911, audio_tagging_loss=0.01108, over 3045974.33 frames. 
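
The batch 6000 records above show the periodic validation pass: training pauses, a validation loss is accumulated over the full dev set (4681554 frames), and peak memory is reported. A rough sketch of that step, assuming a conventional eval/no-grad loop (the names here are illustrative, not train_asr.py's API):

    import torch

    def run_validation(model, valid_dl, compute_loss) -> float:
        model.eval()
        tot, frames = 0.0, 0
        with torch.no_grad():
            for batch in valid_dl:
                loss, num_frames = compute_loss(model, batch)
                tot += loss.item() * num_frames
                frames += num_frames
        model.train()
        return tot / frames  # frame-weighted validation loss
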
], batch size: 63, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:38:42,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff3.min_abs, batch_count=442113.3333333333, ans=0.2 2023-11-18 22:38:49,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.57 vs. limit=10.0 2023-11-18 22:39:17,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.898e+01 9.637e+01 1.062e+02 1.709e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-18 22:39:19,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.64 vs. limit=15.0 2023-11-18 22:39:33,421 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6250, loss[loss=0.0991, simple_loss=0.1074, pruned_loss=0.0344, audio_tagging_loss=0.01099, over 14253.00 frames. ], tot_loss[loss=0.09655, simple_loss=0.112, pruned_loss=0.02936, audio_tagging_loss=0.0112, over 3044101.28 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:39:35,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=442446.6666666667, ans=0.0 2023-11-18 22:39:38,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=442446.6666666667, ans=0.125 2023-11-18 22:40:15,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=442646.6666666667, ans=0.2 2023-11-18 22:40:19,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=442713.3333333333, ans=0.125 2023-11-18 22:40:19,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=442713.3333333333, ans=0.0 2023-11-18 22:40:23,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-11-18 22:40:24,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=442713.3333333333, ans=0.125 2023-11-18 22:40:26,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.49 vs. limit=22.5 2023-11-18 22:40:29,553 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6300, loss[loss=0.09559, simple_loss=0.1063, pruned_loss=0.03131, audio_tagging_loss=0.01111, over 15708.00 frames. ], tot_loss[loss=0.0961, simple_loss=0.1113, pruned_loss=0.02917, audio_tagging_loss=0.01128, over 3044352.14 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:40:32,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=442780.0, ans=0.125 2023-11-18 22:40:41,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.87 vs. 
limit=15.0 2023-11-18 22:40:42,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:43,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.82 vs. limit=10.0 2023-11-18 22:40:51,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.23 vs. limit=15.0 2023-11-18 22:40:51,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442913.3333333333, ans=0.1 2023-11-18 22:40:52,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=442913.3333333333, ans=0.0 2023-11-18 22:41:08,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-11-18 22:41:09,551 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.985e+01 9.817e+01 1.039e+02 1.348e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-18 22:41:24,954 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6350, loss[loss=0.07617, simple_loss=0.08521, pruned_loss=0.02127, audio_tagging_loss=0.01229, over 13535.00 frames. ], tot_loss[loss=0.09561, simple_loss=0.1106, pruned_loss=0.02897, audio_tagging_loss=0.01133, over 3042776.47 frames. ], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:41:32,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=443113.3333333333, ans=0.1 2023-11-18 22:41:58,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=443313.3333333333, ans=0.1 2023-11-18 22:42:00,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=443313.3333333333, ans=0.025 2023-11-18 22:42:03,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=443313.3333333333, ans=0.125 2023-11-18 22:42:13,645 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.61 vs. limit=15.0 2023-11-18 22:42:15,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=443380.0, ans=0.125 2023-11-18 22:42:21,095 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6400, loss[loss=0.101, simple_loss=0.111, pruned_loss=0.03197, audio_tagging_loss=0.0135, over 14532.00 frames. ], tot_loss[loss=0.09546, simple_loss=0.1107, pruned_loss=0.02877, audio_tagging_loss=0.01136, over 3040495.23 frames. 
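
The scaling.py:1022 "Whitening" records above compare a per-module statistic of the activations against a limit (e.g. metric=20.66 vs. limit=22.5 a few lines up); when the metric exceeds its limit, the whitening module pushes the features back toward a white (decorrelated, equal-variance) covariance. The exact statistic is not shown in the log; the sketch below uses one plausible normalized eigenvalue-spread measure that equals 1.0 for perfectly white features and grows as the covariance becomes uneven:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # Hedged sketch; the real formula in scaling.py may differ.
        x = x.reshape(-1, x.shape[-1])        # (frames, channels)
        cov = (x.T @ x) / x.shape[0]          # channel covariance
        eigs = torch.linalg.eigvalsh(cov)
        # == 1.0 when all eigenvalues are equal (white), larger otherwise.
        return x.shape[-1] * (eigs ** 2).sum() / eigs.sum() ** 2
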
], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:42:22,339 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:42:27,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=443446.6666666667, ans=0.07 2023-11-18 22:42:40,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=443513.3333333333, ans=0.2 2023-11-18 22:43:01,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.672e+01 9.519e+01 1.064e+02 1.432e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 22:43:12,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=443713.3333333333, ans=0.125 2023-11-18 22:43:14,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=443713.3333333333, ans=0.125 2023-11-18 22:43:16,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=443780.0, ans=0.0 2023-11-18 22:43:16,888 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6450, loss[loss=0.07753, simple_loss=0.09123, pruned_loss=0.02139, audio_tagging_loss=0.01053, over 14640.00 frames. ], tot_loss[loss=0.09625, simple_loss=0.1118, pruned_loss=0.02902, audio_tagging_loss=0.01134, over 3043576.48 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:43:29,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=443846.6666666667, ans=0.125 2023-11-18 22:43:33,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.89 vs. limit=15.0 2023-11-18 22:43:51,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443980.0, ans=0.1 2023-11-18 22:44:05,227 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:44:12,487 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6500, loss[loss=0.1021, simple_loss=0.1258, pruned_loss=0.0301, audio_tagging_loss=0.009129, over 16058.00 frames. ], tot_loss[loss=0.0954, simple_loss=0.111, pruned_loss=0.02871, audio_tagging_loss=0.01121, over 3048013.39 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:44:17,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=444113.3333333333, ans=0.125 2023-11-18 22:44:19,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.99 vs. limit=12.0 2023-11-18 22:44:25,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=444180.0, ans=0.0 2023-11-18 22:44:25,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.75 vs. limit=12.0 2023-11-18 22:44:29,468 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.49 vs. 
limit=12.0 2023-11-18 22:44:49,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-18 22:44:50,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.27 vs. limit=12.0 2023-11-18 22:44:52,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.587e+01 9.452e+01 1.044e+02 1.613e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:45:09,097 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6550, loss[loss=0.08539, simple_loss=0.09907, pruned_loss=0.02749, audio_tagging_loss=0.008366, over 15119.00 frames. ], tot_loss[loss=0.09618, simple_loss=0.112, pruned_loss=0.02902, audio_tagging_loss=0.01115, over 3054519.40 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:45:12,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=444446.6666666667, ans=0.125 2023-11-18 22:45:46,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=444646.6666666667, ans=0.0 2023-11-18 22:45:52,841 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=22.5 2023-11-18 22:46:04,535 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6600, loss[loss=0.1203, simple_loss=0.137, pruned_loss=0.03935, audio_tagging_loss=0.01247, over 15768.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1129, pruned_loss=0.02924, audio_tagging_loss=0.01095, over 3045475.86 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:46:14,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444846.6666666667, ans=0.1 2023-11-18 22:46:25,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2023-11-18 22:46:28,951 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.29 vs. limit=22.5 2023-11-18 22:46:44,722 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 9.179e+01 9.898e+01 1.109e+02 1.412e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 22:46:53,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0 2023-11-18 22:46:59,490 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6650, loss[loss=0.1008, simple_loss=0.1254, pruned_loss=0.02952, audio_tagging_loss=0.008552, over 16556.00 frames. ], tot_loss[loss=0.09648, simple_loss=0.1126, pruned_loss=0.02923, audio_tagging_loss=0.01093, over 3046321.73 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:01,050 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-11-18 22:47:06,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. 
limit=15.0 2023-11-18 22:47:32,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:37,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:39,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:42,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.45 vs. limit=12.0 2023-11-18 22:47:46,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=12.0 2023-11-18 22:47:54,893 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6700, loss[loss=0.08228, simple_loss=0.0959, pruned_loss=0.02225, audio_tagging_loss=0.01208, over 15160.00 frames. ], tot_loss[loss=0.09615, simple_loss=0.1123, pruned_loss=0.02906, audio_tagging_loss=0.01094, over 3047068.35 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:48:07,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=445513.3333333333, ans=0.07 2023-11-18 22:48:33,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=445646.6666666667, ans=0.2 2023-11-18 22:48:36,167 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.188e+01 9.958e+01 1.118e+02 1.458e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 22:48:49,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.05 vs. limit=15.0 2023-11-18 22:48:51,578 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6750, loss[loss=0.08547, simple_loss=0.1052, pruned_loss=0.02228, audio_tagging_loss=0.01061, over 16009.00 frames. ], tot_loss[loss=0.096, simple_loss=0.1119, pruned_loss=0.0291, audio_tagging_loss=0.01094, over 3042567.75 frames. ], batch size: 60, lr: 1.14e-02, grad_scale: 16.0 2023-11-18 22:48:51,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=445780.0, ans=0.2 2023-11-18 22:49:03,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=445846.6666666667, ans=0.125 2023-11-18 22:49:07,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=445846.6666666667, ans=0.2 2023-11-18 22:49:10,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=445846.6666666667, ans=0.1 2023-11-18 22:49:35,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=446046.6666666667, ans=0.0 2023-11-18 22:49:42,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0 2023-11-18 22:49:46,735 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6800, loss[loss=0.07506, simple_loss=0.0821, pruned_loss=0.0192, audio_tagging_loss=0.01481, over 15657.00 frames. 
], tot_loss[loss=0.09663, simple_loss=0.1127, pruned_loss=0.02937, audio_tagging_loss=0.01091, over 3038677.08 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:49:54,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5 2023-11-18 22:49:55,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.81 vs. limit=15.0 2023-11-18 22:50:02,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=446180.0, ans=0.125 2023-11-18 22:50:13,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=446246.6666666667, ans=0.125 2023-11-18 22:50:27,944 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.920e+01 9.995e+01 1.137e+02 1.788e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 22:50:32,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=446380.0, ans=0.125 2023-11-18 22:50:34,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=446380.0, ans=0.0 2023-11-18 22:50:35,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=446380.0, ans=0.125 2023-11-18 22:50:42,349 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6850, loss[loss=0.0801, simple_loss=0.0898, pruned_loss=0.02347, audio_tagging_loss=0.01173, over 14702.00 frames. ], tot_loss[loss=0.09636, simple_loss=0.1126, pruned_loss=0.02914, audio_tagging_loss=0.01093, over 3044712.37 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:50:43,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=446446.6666666667, ans=0.05 2023-11-18 22:50:53,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=446513.3333333333, ans=0.0 2023-11-18 22:51:17,016 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:51:17,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=446646.6666666667, ans=0.2 2023-11-18 22:51:30,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=446713.3333333333, ans=0.0 2023-11-18 22:51:36,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2023-11-18 22:51:39,182 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6900, loss[loss=0.09012, simple_loss=0.1142, pruned_loss=0.02461, audio_tagging_loss=0.008409, over 14761.00 frames. ], tot_loss[loss=0.09564, simple_loss=0.112, pruned_loss=0.02879, audio_tagging_loss=0.01087, over 3046515.10 frames. 
], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:51:43,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=446780.0, ans=0.125 2023-11-18 22:51:44,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446780.0, ans=0.1 2023-11-18 22:51:45,106 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.23 vs. limit=22.5 2023-11-18 22:51:45,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=446780.0, ans=0.0 2023-11-18 22:51:51,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=446846.6666666667, ans=0.125 2023-11-18 22:51:53,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=446846.6666666667, ans=0.125 2023-11-18 22:51:55,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=446846.6666666667, ans=0.2 2023-11-18 22:52:19,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.182e+01 9.955e+01 1.058e+02 1.430e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 22:52:20,927 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:52:21,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=446980.0, ans=0.125 2023-11-18 22:52:34,141 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 6950, loss[loss=0.1031, simple_loss=0.1247, pruned_loss=0.02913, audio_tagging_loss=0.01157, over 15153.00 frames. ], tot_loss[loss=0.09563, simple_loss=0.1121, pruned_loss=0.02869, audio_tagging_loss=0.01088, over 3048155.06 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:52:38,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=447113.3333333333, ans=0.125 2023-11-18 22:52:42,820 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:53:12,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=447313.3333333333, ans=0.125 2023-11-18 22:53:13,843 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.26 vs. limit=15.0 2023-11-18 22:53:22,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=447380.0, ans=0.0 2023-11-18 22:53:29,857 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7000, loss[loss=0.1032, simple_loss=0.1184, pruned_loss=0.03431, audio_tagging_loss=0.009688, over 15269.00 frames. 
], tot_loss[loss=0.09655, simple_loss=0.1129, pruned_loss=0.02905, audio_tagging_loss=0.01103, over 3050528.79 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:53:59,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=447580.0, ans=0.2 2023-11-18 22:54:01,706 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.88 vs. limit=15.0 2023-11-18 22:54:10,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.734e+01 9.498e+01 1.045e+02 1.881e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 22:54:20,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=447713.3333333333, ans=0.2 2023-11-18 22:54:25,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=447780.0, ans=0.0 2023-11-18 22:54:25,864 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7050, loss[loss=0.1035, simple_loss=0.1164, pruned_loss=0.03378, audio_tagging_loss=0.01147, over 16562.00 frames. ], tot_loss[loss=0.09748, simple_loss=0.1141, pruned_loss=0.02945, audio_tagging_loss=0.011, over 3050293.07 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:54:31,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=15.0 2023-11-18 22:54:56,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=447913.3333333333, ans=0.0 2023-11-18 22:55:19,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.37 vs. limit=15.0 2023-11-18 22:55:21,693 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7100, loss[loss=0.06801, simple_loss=0.07083, pruned_loss=0.02097, audio_tagging_loss=0.01163, over 15033.00 frames. ], tot_loss[loss=0.09662, simple_loss=0.1129, pruned_loss=0.02908, audio_tagging_loss=0.01108, over 3057810.16 frames. ], batch size: 61, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:55:33,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=448180.0, ans=0.2 2023-11-18 22:55:36,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=448180.0, ans=0.125 2023-11-18 22:55:51,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-18 22:56:02,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.011e+01 9.786e+01 1.101e+02 1.464e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:56:03,253 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=448313.3333333333, ans=0.125 2023-11-18 22:56:15,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=448446.6666666667, ans=0.1 2023-11-18 22:56:16,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. 
limit=22.5 2023-11-18 22:56:16,826 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7150, loss[loss=0.08086, simple_loss=0.1042, pruned_loss=0.0196, audio_tagging_loss=0.009176, over 15059.00 frames. ], tot_loss[loss=0.0973, simple_loss=0.1134, pruned_loss=0.02943, audio_tagging_loss=0.01116, over 3050120.36 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:56:18,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=448446.6666666667, ans=0.0 2023-11-18 22:56:31,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=22.5 2023-11-18 22:56:33,501 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:56:35,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=448513.3333333333, ans=0.0 2023-11-18 22:56:40,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=448580.0, ans=0.125 2023-11-18 22:56:46,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448580.0, ans=0.1 2023-11-18 22:57:00,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=448646.6666666667, ans=0.95 2023-11-18 22:57:13,663 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7200, loss[loss=0.07817, simple_loss=0.09064, pruned_loss=0.02104, audio_tagging_loss=0.01181, over 15962.00 frames. ], tot_loss[loss=0.09791, simple_loss=0.1142, pruned_loss=0.02958, audio_tagging_loss=0.01122, over 3050673.66 frames. ], batch size: 63, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:57:38,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=448913.3333333333, ans=0.2 2023-11-18 22:57:54,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.744e+01 9.483e+01 1.054e+02 1.266e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-18 22:57:55,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. limit=15.0 2023-11-18 22:58:02,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=449046.6666666667, ans=0.125 2023-11-18 22:58:04,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=449046.6666666667, ans=0.125 2023-11-18 22:58:05,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=449046.6666666667, ans=0.0 2023-11-18 22:58:08,832 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7250, loss[loss=0.09964, simple_loss=0.1213, pruned_loss=0.02795, audio_tagging_loss=0.01104, over 15235.00 frames. ], tot_loss[loss=0.09725, simple_loss=0.1132, pruned_loss=0.02928, audio_tagging_loss=0.01137, over 3038843.24 frames. 
], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:58:19,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=449180.0, ans=0.125 2023-11-18 22:58:50,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=449313.3333333333, ans=0.0 2023-11-18 22:59:04,605 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7300, loss[loss=0.1001, simple_loss=0.1135, pruned_loss=0.03086, audio_tagging_loss=0.0125, over 14764.00 frames. ], tot_loss[loss=0.09815, simple_loss=0.1145, pruned_loss=0.02983, audio_tagging_loss=0.01109, over 3049252.25 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:59:05,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449446.6666666667, ans=0.1 2023-11-18 22:59:06,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449446.6666666667, ans=0.1 2023-11-18 22:59:08,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449446.6666666667, ans=0.1 2023-11-18 22:59:20,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=449513.3333333333, ans=0.125 2023-11-18 22:59:22,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=449513.3333333333, ans=0.0 2023-11-18 22:59:22,651 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.57 vs. limit=15.0 2023-11-18 22:59:23,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=449513.3333333333, ans=0.02 2023-11-18 22:59:27,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=449580.0, ans=0.0 2023-11-18 22:59:32,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=449580.0, ans=0.125 2023-11-18 22:59:45,848 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.690e+01 9.830e+01 1.104e+02 1.354e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 22:59:49,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=449713.3333333333, ans=0.0 2023-11-18 22:59:50,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=449713.3333333333, ans=0.0 2023-11-18 22:59:57,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=449713.3333333333, ans=0.0 2023-11-18 22:59:59,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=449780.0, ans=0.125 2023-11-18 23:00:00,636 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7350, loss[loss=0.06892, simple_loss=0.07767, pruned_loss=0.01913, audio_tagging_loss=0.01095, over 14873.00 frames. ], tot_loss[loss=0.09748, simple_loss=0.1139, pruned_loss=0.02954, audio_tagging_loss=0.011, over 3053025.28 frames. 
], batch size: 59, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:00:04,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0 2023-11-18 23:00:10,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=449846.6666666667, ans=0.0 2023-11-18 23:00:13,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=449846.6666666667, ans=0.0 2023-11-18 23:00:15,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=449846.6666666667, ans=0.125 2023-11-18 23:00:18,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=449846.6666666667, ans=0.5 2023-11-18 23:00:36,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=449980.0, ans=0.0 2023-11-18 23:00:50,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=450046.6666666667, ans=0.0 2023-11-18 23:00:54,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=450046.6666666667, ans=0.125 2023-11-18 23:00:55,968 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7400, loss[loss=0.06896, simple_loss=0.07515, pruned_loss=0.01539, audio_tagging_loss=0.01599, over 15561.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1147, pruned_loss=0.0296, audio_tagging_loss=0.0109, over 3053138.36 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:00:58,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-18 23:01:08,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=450180.0, ans=0.0 2023-11-18 23:01:12,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=450180.0, ans=0.2 2023-11-18 23:01:12,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=450180.0, ans=0.125 2023-11-18 23:01:15,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=450180.0, ans=0.0 2023-11-18 23:01:27,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=450246.6666666667, ans=0.125 2023-11-18 23:01:29,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.whiten.whitening_limit, batch_count=450313.3333333333, ans=12.0 2023-11-18 23:01:37,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.086e+01 1.005e+02 1.143e+02 1.555e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 23:01:51,873 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7450, loss[loss=0.1177, simple_loss=0.1377, pruned_loss=0.03983, audio_tagging_loss=0.009062, over 15282.00 frames. ], tot_loss[loss=0.09781, simple_loss=0.1146, pruned_loss=0.02961, audio_tagging_loss=0.01088, over 3053150.11 frames. 
], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:01:58,094 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.863e-02 2023-11-18 23:02:04,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=450513.3333333333, ans=0.2 2023-11-18 23:02:20,716 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.19 vs. limit=6.0 2023-11-18 23:02:23,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=450580.0, ans=0.2 2023-11-18 23:02:31,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=450646.6666666667, ans=0.125 2023-11-18 23:02:36,268 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.68 vs. limit=15.0 2023-11-18 23:02:44,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=450713.3333333333, ans=0.1 2023-11-18 23:02:47,588 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7500, loss[loss=0.121, simple_loss=0.1422, pruned_loss=0.04213, audio_tagging_loss=0.007756, over 14597.00 frames. ], tot_loss[loss=0.09717, simple_loss=0.1138, pruned_loss=0.0295, audio_tagging_loss=0.01078, over 3049842.65 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:03:06,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=450846.6666666667, ans=0.0 2023-11-18 23:03:15,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=450913.3333333333, ans=0.0 2023-11-18 23:03:30,364 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.742e+01 9.569e+01 1.067e+02 1.631e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-18 23:03:34,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=12.0 2023-11-18 23:03:37,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=451046.6666666667, ans=0.2 2023-11-18 23:03:41,254 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.31 vs. limit=15.0 2023-11-18 23:03:43,630 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7550, loss[loss=0.147, simple_loss=0.1707, pruned_loss=0.05099, audio_tagging_loss=0.01071, over 15373.00 frames. ], tot_loss[loss=0.09679, simple_loss=0.1133, pruned_loss=0.02929, audio_tagging_loss=0.01084, over 3056009.22 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:03:55,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=451180.0, ans=0.0 2023-11-18 23:04:00,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=451180.0, ans=0.125 2023-11-18 23:04:12,736 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.35 vs. 
limit=15.0 2023-11-18 23:04:27,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=451380.0, ans=0.125 2023-11-18 23:04:29,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=451380.0, ans=0.1 2023-11-18 23:04:35,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=451380.0, ans=0.125 2023-11-18 23:04:38,065 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7600, loss[loss=0.1091, simple_loss=0.121, pruned_loss=0.03309, audio_tagging_loss=0.01548, over 15009.00 frames. ], tot_loss[loss=0.09598, simple_loss=0.1121, pruned_loss=0.02892, audio_tagging_loss=0.01099, over 3052352.33 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:05:04,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=451580.0, ans=0.0 2023-11-18 23:05:13,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451646.6666666667, ans=0.1 2023-11-18 23:05:20,678 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.942e+01 9.830e+01 1.127e+02 1.912e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 23:05:33,287 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7650, loss[loss=0.1041, simple_loss=0.1342, pruned_loss=0.03038, audio_tagging_loss=0.006611, over 15066.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.112, pruned_loss=0.02883, audio_tagging_loss=0.01097, over 3050893.15 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:06:03,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=451913.3333333333, ans=0.1 2023-11-18 23:06:11,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-11-18 23:06:16,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=451980.0, ans=0.125 2023-11-18 23:06:22,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=452046.6666666667, ans=0.5 2023-11-18 23:06:28,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=452113.3333333333, ans=0.1 2023-11-18 23:06:29,822 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7700, loss[loss=0.0904, simple_loss=0.1013, pruned_loss=0.02708, audio_tagging_loss=0.01266, over 15218.00 frames. ], tot_loss[loss=0.09593, simple_loss=0.1122, pruned_loss=0.0288, audio_tagging_loss=0.01101, over 3046488.81 frames. 
], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:07:13,150 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.646e+01 9.561e+01 1.050e+02 1.437e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-18 23:07:17,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=452380.0, ans=0.5 2023-11-18 23:07:17,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=452380.0, ans=0.125 2023-11-18 23:07:21,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=452380.0, ans=0.125 2023-11-18 23:07:24,980 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7750, loss[loss=0.0974, simple_loss=0.1108, pruned_loss=0.03107, audio_tagging_loss=0.01095, over 15469.00 frames. ], tot_loss[loss=0.09568, simple_loss=0.1118, pruned_loss=0.0286, audio_tagging_loss=0.01115, over 3048659.47 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:07:46,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.83 vs. limit=22.5 2023-11-18 23:07:55,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=452580.0, ans=0.04949747468305833 2023-11-18 23:08:00,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=452646.6666666667, ans=0.125 2023-11-18 23:08:01,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=452646.6666666667, ans=0.125 2023-11-18 23:08:16,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452713.3333333333, ans=0.1 2023-11-18 23:08:19,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=452780.0, ans=0.1 2023-11-18 23:08:20,991 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7800, loss[loss=0.09902, simple_loss=0.1081, pruned_loss=0.03072, audio_tagging_loss=0.01423, over 14409.00 frames. ], tot_loss[loss=0.09627, simple_loss=0.1123, pruned_loss=0.02895, audio_tagging_loss=0.01118, over 3041555.00 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:08:23,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452780.0, ans=0.1 2023-11-18 23:08:43,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=452913.3333333333, ans=0.2 2023-11-18 23:08:58,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=452980.0, ans=0.125 2023-11-18 23:09:00,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.06 vs. 
limit=15.0 2023-11-18 23:09:04,314 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.766e+01 9.659e+01 1.074e+02 1.731e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 23:09:14,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=453046.6666666667, ans=0.2 2023-11-18 23:09:17,037 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7850, loss[loss=0.08614, simple_loss=0.104, pruned_loss=0.02448, audio_tagging_loss=0.009661, over 15785.00 frames. ], tot_loss[loss=0.09636, simple_loss=0.1124, pruned_loss=0.02893, audio_tagging_loss=0.01123, over 3053700.40 frames. ], batch size: 63, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:09:26,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.75 vs. limit=15.0 2023-11-18 23:09:30,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=453180.0, ans=0.0 2023-11-18 23:09:50,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=453313.3333333333, ans=15.0 2023-11-18 23:10:14,386 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7900, loss[loss=0.07515, simple_loss=0.08565, pruned_loss=0.01841, audio_tagging_loss=0.01392, over 15387.00 frames. ], tot_loss[loss=0.09683, simple_loss=0.1128, pruned_loss=0.02901, audio_tagging_loss=0.0114, over 3048274.77 frames. ], batch size: 60, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:10:28,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=453513.3333333333, ans=0.125 2023-11-18 23:10:31,253 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.99 vs. limit=22.5 2023-11-18 23:10:37,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453580.0, ans=0.1 2023-11-18 23:10:43,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.39 vs. limit=6.0 2023-11-18 23:10:46,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=453580.0, ans=0.0 2023-11-18 23:10:49,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=453646.6666666667, ans=0.125 2023-11-18 23:10:50,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=453646.6666666667, ans=0.125 2023-11-18 23:10:57,621 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.743e+01 9.470e+01 1.071e+02 1.653e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-18 23:10:57,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=453713.3333333333, ans=0.0 2023-11-18 23:11:05,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. 
limit=15.0 2023-11-18 23:11:09,215 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.91 vs. limit=10.0 2023-11-18 23:11:09,813 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 7950, loss[loss=0.08772, simple_loss=0.1014, pruned_loss=0.02347, audio_tagging_loss=0.01355, over 15326.00 frames. ], tot_loss[loss=0.09682, simple_loss=0.1127, pruned_loss=0.02896, audio_tagging_loss=0.0115, over 3048933.79 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:11:22,622 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:11:27,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-18 23:11:30,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=453846.6666666667, ans=0.1 2023-11-18 23:12:00,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=454046.6666666667, ans=0.125 2023-11-18 23:12:05,805 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8000, loss[loss=0.08131, simple_loss=0.09851, pruned_loss=0.02232, audio_tagging_loss=0.009735, over 15335.00 frames. ], tot_loss[loss=0.09566, simple_loss=0.1114, pruned_loss=0.02846, audio_tagging_loss=0.01152, over 3048152.77 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:12:13,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=454113.3333333333, ans=0.1 2023-11-18 23:12:28,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=454246.6666666667, ans=0.0 2023-11-18 23:12:29,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=454246.6666666667, ans=0.0 2023-11-18 23:12:44,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=454313.3333333333, ans=0.0 2023-11-18 23:12:48,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=454313.3333333333, ans=0.125 2023-11-18 23:12:48,897 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.670e+01 9.475e+01 1.059e+02 1.539e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-18 23:13:00,541 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8050, loss[loss=0.0713, simple_loss=0.08826, pruned_loss=0.01633, audio_tagging_loss=0.01084, over 15459.00 frames. ], tot_loss[loss=0.09564, simple_loss=0.1116, pruned_loss=0.02844, audio_tagging_loss=0.01142, over 3049137.89 frames. 
], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:13:00,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=454446.6666666667, ans=0.0 2023-11-18 23:13:29,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=454580.0, ans=0.5 2023-11-18 23:13:43,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=454646.6666666667, ans=0.0 2023-11-18 23:13:47,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=454713.3333333333, ans=0.0 2023-11-18 23:13:56,072 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8100, loss[loss=0.1047, simple_loss=0.1182, pruned_loss=0.03575, audio_tagging_loss=0.009833, over 15009.00 frames. ], tot_loss[loss=0.09679, simple_loss=0.1132, pruned_loss=0.02897, audio_tagging_loss=0.01122, over 3052045.15 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:14:03,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=454780.0, ans=0.125 2023-11-18 23:14:04,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=454780.0, ans=0.2 2023-11-18 23:14:05,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=454780.0, ans=0.125 2023-11-18 23:14:11,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=454846.6666666667, ans=0.07 2023-11-18 23:14:24,073 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.23 vs. limit=8.0 2023-11-18 23:14:32,904 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:14:39,583 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 9.117e+01 9.858e+01 1.091e+02 1.353e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 23:14:42,419 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.24 vs. limit=15.0 2023-11-18 23:14:45,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=455046.6666666667, ans=0.07 2023-11-18 23:14:47,765 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.66 vs. limit=10.0 2023-11-18 23:14:52,353 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8150, loss[loss=0.1143, simple_loss=0.147, pruned_loss=0.03331, audio_tagging_loss=0.00744, over 15265.00 frames. ], tot_loss[loss=0.09703, simple_loss=0.1137, pruned_loss=0.02906, audio_tagging_loss=0.01112, over 3052685.81 frames. 
], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:14:59,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=455113.3333333333, ans=0.0 2023-11-18 23:14:59,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=455113.3333333333, ans=0.09899494936611666 2023-11-18 23:15:00,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=455113.3333333333, ans=0.0 2023-11-18 23:15:07,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=455180.0, ans=0.125 2023-11-18 23:15:09,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=455180.0, ans=0.07 2023-11-18 23:15:14,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=455246.6666666667, ans=0.015 2023-11-18 23:15:21,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.65 vs. limit=10.0 2023-11-18 23:15:47,071 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:15:48,096 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8200, loss[loss=0.08686, simple_loss=0.09814, pruned_loss=0.02446, audio_tagging_loss=0.01334, over 14663.00 frames. ], tot_loss[loss=0.09613, simple_loss=0.1126, pruned_loss=0.02878, audio_tagging_loss=0.01104, over 3050290.28 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:15:48,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.04 vs. limit=10.0 2023-11-18 23:16:01,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2023-11-18 23:16:22,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=455646.6666666667, ans=0.0 2023-11-18 23:16:27,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=455646.6666666667, ans=0.125 2023-11-18 23:16:30,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.905e+01 9.848e+01 1.096e+02 1.904e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 23:16:33,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455713.3333333333, ans=0.1 2023-11-18 23:16:42,988 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8250, loss[loss=0.1084, simple_loss=0.1356, pruned_loss=0.03158, audio_tagging_loss=0.008994, over 14896.00 frames. ], tot_loss[loss=0.09476, simple_loss=0.1107, pruned_loss=0.02832, audio_tagging_loss=0.01109, over 3042534.23 frames. 
], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:16:44,773 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-18 23:16:57,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=455846.6666666667, ans=0.05 2023-11-18 23:16:57,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=455846.6666666667, ans=0.125 2023-11-18 23:17:10,005 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.97 vs. limit=12.0 2023-11-18 23:17:19,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.60 vs. limit=15.0 2023-11-18 23:17:20,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=455980.0, ans=0.0 2023-11-18 23:17:38,213 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8300, loss[loss=0.1367, simple_loss=0.174, pruned_loss=0.04007, audio_tagging_loss=0.009702, over 15976.00 frames. ], tot_loss[loss=0.09614, simple_loss=0.1127, pruned_loss=0.02876, audio_tagging_loss=0.01103, over 3048489.88 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:18:02,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=456246.6666666667, ans=0.2 2023-11-18 23:18:11,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.74 vs. limit=15.0 2023-11-18 23:18:21,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.810e+01 9.840e+01 1.082e+02 1.589e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-18 23:18:33,336 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8350, loss[loss=0.07805, simple_loss=0.09508, pruned_loss=0.02157, audio_tagging_loss=0.00894, over 16202.00 frames. ], tot_loss[loss=0.09655, simple_loss=0.113, pruned_loss=0.02903, audio_tagging_loss=0.01099, over 3047359.89 frames. ], batch size: 63, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:18:34,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=456446.6666666667, ans=0.2 2023-11-18 23:18:48,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=456513.3333333333, ans=0.0 2023-11-18 23:18:48,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=456513.3333333333, ans=0.2 2023-11-18 23:19:14,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=456646.6666666667, ans=0.125 2023-11-18 23:19:23,161 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:19:28,801 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8400, loss[loss=0.1104, simple_loss=0.1213, pruned_loss=0.04014, audio_tagging_loss=0.009558, over 14079.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1121, pruned_loss=0.02885, audio_tagging_loss=0.01091, over 3041447.41 frames. 
], batch size: 54, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:19:34,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=456780.0, ans=0.125 2023-11-18 23:19:56,760 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.44 vs. limit=10.0 2023-11-18 23:19:59,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. limit=15.0 2023-11-18 23:20:12,050 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.924e+01 9.865e+01 1.104e+02 3.626e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-18 23:20:13,603 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2023-11-18 23:20:19,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=457046.6666666667, ans=0.125 2023-11-18 23:20:24,655 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8450, loss[loss=0.1082, simple_loss=0.1271, pruned_loss=0.03182, audio_tagging_loss=0.01286, over 15370.00 frames. ], tot_loss[loss=0.09635, simple_loss=0.1128, pruned_loss=0.02901, audio_tagging_loss=0.01096, over 3042384.61 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:20:30,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=457113.3333333333, ans=0.0 2023-11-18 23:20:51,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.96 vs. limit=10.0 2023-11-18 23:21:19,996 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8500, loss[loss=0.07823, simple_loss=0.09624, pruned_loss=0.01981, audio_tagging_loss=0.0103, over 15501.00 frames. ], tot_loss[loss=0.09572, simple_loss=0.1119, pruned_loss=0.02872, audio_tagging_loss=0.01105, over 3038321.24 frames. 
], batch size: 60, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:21:21,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=457446.6666666667, ans=0.125 2023-11-18 23:21:38,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=457513.3333333333, ans=0.125 2023-11-18 23:21:42,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=457580.0, ans=0.0 2023-11-18 23:22:04,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.613e+01 9.508e+01 1.037e+02 1.516e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-18 23:22:06,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=457713.3333333333, ans=0.2 2023-11-18 23:22:10,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=457713.3333333333, ans=0.0 2023-11-18 23:22:14,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=457780.0, ans=0.125 2023-11-18 23:22:15,588 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8550, loss[loss=0.09237, simple_loss=0.1135, pruned_loss=0.02442, audio_tagging_loss=0.01119, over 14560.00 frames. ], tot_loss[loss=0.09589, simple_loss=0.1122, pruned_loss=0.02874, audio_tagging_loss=0.01106, over 3044212.93 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:22:15,844 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:22:20,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=457780.0, ans=0.125 2023-11-18 23:22:39,409 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=457913.3333333333, ans=0.0 2023-11-18 23:23:01,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=458046.6666666667, ans=0.1 2023-11-18 23:23:11,596 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8600, loss[loss=0.09274, simple_loss=0.1106, pruned_loss=0.02541, audio_tagging_loss=0.01205, over 15874.00 frames. ], tot_loss[loss=0.09645, simple_loss=0.1128, pruned_loss=0.02893, audio_tagging_loss=0.01113, over 3036367.90 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:23:12,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=458113.3333333333, ans=0.0 2023-11-18 23:23:26,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=458180.0, ans=0.0 2023-11-18 23:23:29,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.47 vs. 
limit=15.0 2023-11-18 23:23:42,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=458246.6666666667, ans=0.025 2023-11-18 23:23:56,308 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.729e+01 9.610e+01 1.061e+02 1.458e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 23:24:06,890 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8650, loss[loss=0.09011, simple_loss=0.1042, pruned_loss=0.02661, audio_tagging_loss=0.01138, over 15013.00 frames. ], tot_loss[loss=0.09681, simple_loss=0.1133, pruned_loss=0.02904, audio_tagging_loss=0.01111, over 3042167.66 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:24:33,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=458580.0, ans=0.125 2023-11-18 23:24:33,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=458580.0, ans=0.125 2023-11-18 23:24:37,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458580.0, ans=0.1 2023-11-18 23:24:45,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=458646.6666666667, ans=0.2 2023-11-18 23:24:47,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.92 vs. limit=22.5 2023-11-18 23:24:50,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458713.3333333333, ans=0.1 2023-11-18 23:24:56,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=458713.3333333333, ans=0.1 2023-11-18 23:25:02,850 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8700, loss[loss=0.07185, simple_loss=0.06242, pruned_loss=0.02344, audio_tagging_loss=0.0172, over 13837.00 frames. ], tot_loss[loss=0.096, simple_loss=0.1121, pruned_loss=0.02872, audio_tagging_loss=0.01122, over 3032690.92 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:25:17,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0 2023-11-18 23:25:47,299 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.228e+01 9.960e+01 1.094e+02 1.937e+02, threshold=1.992e+02, percent-clipped=1.0 2023-11-18 23:25:52,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=459046.6666666667, ans=0.2 2023-11-18 23:25:58,427 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8750, loss[loss=0.1156, simple_loss=0.1367, pruned_loss=0.0362, audio_tagging_loss=0.01103, over 15437.00 frames. ], tot_loss[loss=0.09519, simple_loss=0.1108, pruned_loss=0.02841, audio_tagging_loss=0.01139, over 3039252.83 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:26:02,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=459113.3333333333, ans=0.125 2023-11-18 23:26:07,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.85 vs. 
limit=10.0 2023-11-18 23:26:19,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=459246.6666666667, ans=0.125 2023-11-18 23:26:25,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=459246.6666666667, ans=0.0 2023-11-18 23:26:54,463 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8800, loss[loss=0.08264, simple_loss=0.08521, pruned_loss=0.02445, audio_tagging_loss=0.01559, over 15214.00 frames. ], tot_loss[loss=0.09715, simple_loss=0.1132, pruned_loss=0.02917, audio_tagging_loss=0.01138, over 3044701.07 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:26:59,371 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.39 vs. limit=22.5 2023-11-18 23:26:59,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=459446.6666666667, ans=0.0 2023-11-18 23:27:14,569 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-18 23:27:38,832 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.792e+01 9.740e+01 1.071e+02 1.410e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-18 23:27:48,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459780.0, ans=0.125 2023-11-18 23:27:49,271 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8850, loss[loss=0.1293, simple_loss=0.1481, pruned_loss=0.04534, audio_tagging_loss=0.0099, over 15234.00 frames. ], tot_loss[loss=0.09663, simple_loss=0.1123, pruned_loss=0.029, audio_tagging_loss=0.01148, over 3043892.55 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:27:50,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=459780.0, ans=0.2 2023-11-18 23:27:51,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.78 vs. limit=15.0 2023-11-18 23:27:58,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=15.0 2023-11-18 23:27:58,827 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 23:28:02,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=459846.6666666667, ans=0.0 2023-11-18 23:28:04,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=459846.6666666667, ans=15.0 2023-11-18 23:28:07,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=459846.6666666667, ans=0.125 2023-11-18 23:28:21,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=459913.3333333333, ans=0.1 2023-11-18 23:28:41,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=460046.6666666667, ans=0.125 2023-11-18 23:28:43,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=460046.6666666667, ans=10.0 2023-11-18 23:28:45,823 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8900, loss[loss=0.1049, simple_loss=0.1195, pruned_loss=0.03462, audio_tagging_loss=0.01054, over 15038.00 frames. ], tot_loss[loss=0.09605, simple_loss=0.1121, pruned_loss=0.02869, audio_tagging_loss=0.01132, over 3050159.66 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:29:30,093 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.040e+01 1.017e+02 1.156e+02 1.581e+02, threshold=2.035e+02, percent-clipped=0.0 2023-11-18 23:29:41,800 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 8950, loss[loss=0.09642, simple_loss=0.116, pruned_loss=0.0306, audio_tagging_loss=0.007818, over 15362.00 frames. ], tot_loss[loss=0.09605, simple_loss=0.1125, pruned_loss=0.02875, audio_tagging_loss=0.01103, over 3049821.02 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:30:00,306 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.03 vs. limit=22.5 2023-11-18 23:30:07,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=460580.0, ans=0.1 2023-11-18 23:30:19,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=460646.6666666667, ans=0.0 2023-11-18 23:30:33,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=460713.3333333333, ans=0.025 2023-11-18 23:30:36,561 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9000, loss[loss=0.1114, simple_loss=0.136, pruned_loss=0.03302, audio_tagging_loss=0.01036, over 14516.00 frames. ], tot_loss[loss=0.09602, simple_loss=0.1126, pruned_loss=0.02871, audio_tagging_loss=0.01103, over 3045543.45 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:30:36,561 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-18 23:31:09,111 INFO [train_asr.py:1147] (2/4) Epoch 6, validation: loss=0.07051, simple_loss=0.05865, pruned_loss=0.008039, audio_tagging_loss=0.03315, over 4681554.00 frames. 
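[Editorial note] Two relationships in these records can be checked directly from the printed numbers. First, every logged total is consistent with loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; e.g. the validation record just above gives 0.5 * 0.05865 + 0.008039 + 0.03315 ≈ 0.07051, matching the printed loss. Second, each "Clipping_scale=2.0" record prints a threshold equal to 2.0 times the middle grad-norm quartile, e.g. 2.0 * 9.955e+01 = 1.991e+02. Below is a minimal Python sketch reproducing both; the helper names are hypothetical, the quartile layout [min, Q1, median, Q3, max] is inferred from the log, and the real icefall training code may schedule the 0.5 simple-loss scale during warm-up rather than holding it fixed as shown here.

# Minimal sketch (hypothetical helpers): reproduce the totals printed in this
# log from their components. Not the icefall source itself.

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_scale: float = 0.5,
                   audio_tagging_scale: float = 1.0) -> float:
    """Recombine per-record components into the printed total.

    Assumption (verified against every record in this section to the
    printed precision):
        loss = simple_scale * simple_loss + pruned_loss
               + audio_tagging_scale * audio_tagging_loss
    """
    return (simple_scale * simple_loss + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)


def clip_threshold(grad_norm_quartiles: list[float],
                   clipping_scale: float = 2.0) -> float:
    """Reproduce the `threshold=` value in the Clipping_scale records.

    Observation from this log: threshold == clipping_scale * median,
    where the five printed values appear to be [min, Q1, median, Q3, max].
    """
    return clipping_scale * grad_norm_quartiles[2]


if __name__ == "__main__":
    # Validation record above: loss=0.07051, simple_loss=0.05865,
    # pruned_loss=0.008039, audio_tagging_loss=0.03315
    assert abs(combine_losses(0.05865, 0.008039, 0.03315) - 0.07051) < 1e-4

    # Earlier clipping record: quartiles 7.499e+01 9.182e+01 9.955e+01
    # 1.058e+02 1.430e+02, threshold=1.991e+02
    assert abs(clip_threshold([74.99, 91.82, 99.55, 105.8, 143.0]) - 199.1) < 1e-9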
2023-11-18 23:31:09,112 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-18 23:31:26,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=460846.6666666667, ans=0.0
2023-11-18 23:31:49,368 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.19 vs. limit=10.0
2023-11-18 23:31:53,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 8.670e+01 9.700e+01 1.069e+02 1.408e+02, threshold=1.940e+02, percent-clipped=0.0
2023-11-18 23:32:02,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=461046.6666666667, ans=0.125
2023-11-18 23:32:04,554 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9050, loss[loss=0.1148, simple_loss=0.1395, pruned_loss=0.03596, audio_tagging_loss=0.009086, over 15303.00 frames. ], tot_loss[loss=0.09728, simple_loss=0.1138, pruned_loss=0.02937, audio_tagging_loss=0.01099, over 3054019.39 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:32:20,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=461180.0, ans=0.0
2023-11-18 23:32:20,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=461180.0, ans=0.125
2023-11-18 23:32:23,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461180.0, ans=0.0
2023-11-18 23:32:38,246 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.602e-01
2023-11-18 23:32:59,355 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9100, loss[loss=0.0982, simple_loss=0.1164, pruned_loss=0.03102, audio_tagging_loss=0.008987, over 15256.00 frames. ], tot_loss[loss=0.09737, simple_loss=0.1143, pruned_loss=0.02945, audio_tagging_loss=0.01076, over 3055074.31 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:33:16,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461513.3333333333, ans=0.1
2023-11-18 23:33:21,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=461580.0, ans=0.125
2023-11-18 23:33:22,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=461580.0, ans=0.0
2023-11-18 23:33:43,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.817e+01 9.477e+01 1.033e+02 1.344e+02, threshold=1.895e+02, percent-clipped=0.0
2023-11-18 23:33:46,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=461713.3333333333, ans=0.125
2023-11-18 23:33:48,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=461713.3333333333, ans=0.125
2023-11-18 23:33:54,710 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9150, loss[loss=0.1105, simple_loss=0.1349, pruned_loss=0.03095, audio_tagging_loss=0.01207, over 15926.00 frames. ], tot_loss[loss=0.09665, simple_loss=0.1136, pruned_loss=0.029, audio_tagging_loss=0.01085, over 3058254.33 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:33:54,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=461780.0, ans=0.125
2023-11-18 23:33:57,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461780.0, ans=0.0
2023-11-18 23:34:20,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=461913.3333333333, ans=0.125
2023-11-18 23:34:35,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=461980.0, ans=0.0
2023-11-18 23:34:50,578 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9200, loss[loss=0.09363, simple_loss=0.09683, pruned_loss=0.03087, audio_tagging_loss=0.01435, over 14731.00 frames. ], tot_loss[loss=0.09652, simple_loss=0.1133, pruned_loss=0.02902, audio_tagging_loss=0.01084, over 3052284.87 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:34:51,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.83 vs. limit=15.0
2023-11-18 23:35:08,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.50 vs. limit=15.0
2023-11-18 23:35:28,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=462313.3333333333, ans=0.1
2023-11-18 23:35:33,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.725e+01 9.517e+01 1.069e+02 1.464e+02, threshold=1.903e+02, percent-clipped=0.0
2023-11-18 23:35:35,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=462380.0, ans=0.125
2023-11-18 23:35:39,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=462380.0, ans=0.125
2023-11-18 23:35:44,339 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9250, loss[loss=0.1097, simple_loss=0.129, pruned_loss=0.03438, audio_tagging_loss=0.01085, over 15637.00 frames. ], tot_loss[loss=0.09646, simple_loss=0.1132, pruned_loss=0.02898, audio_tagging_loss=0.01087, over 3052478.86 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:35:47,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=462446.6666666667, ans=0.125
2023-11-18 23:36:08,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=462580.0, ans=0.05
2023-11-18 23:36:16,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=462580.0, ans=0.125
2023-11-18 23:36:39,674 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9300, loss[loss=0.1024, simple_loss=0.125, pruned_loss=0.03201, audio_tagging_loss=0.007855, over 15256.00 frames. ], tot_loss[loss=0.09689, simple_loss=0.1135, pruned_loss=0.02912, audio_tagging_loss=0.01099, over 3057290.21 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:36:57,399 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:37:23,846 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.607e+01 9.427e+01 1.041e+02 1.346e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-18 23:37:29,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463046.6666666667, ans=0.125
2023-11-18 23:37:35,918 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9350, loss[loss=0.1126, simple_loss=0.143, pruned_loss=0.03173, audio_tagging_loss=0.009406, over 16089.00 frames. ], tot_loss[loss=0.09668, simple_loss=0.1132, pruned_loss=0.02903, audio_tagging_loss=0.01105, over 3062053.80 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:37:39,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0
2023-11-18 23:37:50,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=463180.0, ans=0.0
2023-11-18 23:38:18,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=463313.3333333333, ans=0.0
2023-11-18 23:38:18,682 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.45 vs. limit=15.0
2023-11-18 23:38:19,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=463380.0, ans=0.125
2023-11-18 23:38:24,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.12 vs. limit=6.0
2023-11-18 23:38:25,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.40 vs. limit=15.0
2023-11-18 23:38:27,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.44 vs. limit=15.0
2023-11-18 23:38:30,377 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9400, loss[loss=0.1062, simple_loss=0.1354, pruned_loss=0.03321, audio_tagging_loss=0.005304, over 14342.00 frames. ], tot_loss[loss=0.09638, simple_loss=0.1129, pruned_loss=0.02882, audio_tagging_loss=0.01109, over 3061232.36 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 32.0
2023-11-18 23:38:35,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=463446.6666666667, ans=0.035
2023-11-18 23:38:36,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=463446.6666666667, ans=0.2
2023-11-18 23:38:57,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463580.0, ans=0.1
2023-11-18 23:39:07,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=463646.6666666667, ans=0.0
2023-11-18 23:39:15,428 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.967e+01 9.961e+01 1.049e+02 1.500e+02, threshold=1.992e+02, percent-clipped=0.0
2023-11-18 23:39:20,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=463713.3333333333, ans=0.125
2023-11-18 23:39:21,675 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:39:25,419 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9450, loss[loss=0.0864, simple_loss=0.1045, pruned_loss=0.02342, audio_tagging_loss=0.01072, over 15527.00 frames. ], tot_loss[loss=0.0964, simple_loss=0.1127, pruned_loss=0.02884, audio_tagging_loss=0.0112, over 3061737.62 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:39:38,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=463846.6666666667, ans=0.125
2023-11-18 23:39:42,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=463846.6666666667, ans=0.125
2023-11-18 23:39:55,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=463913.3333333333, ans=0.125
2023-11-18 23:39:57,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.20 vs. limit=22.5
2023-11-18 23:40:03,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.59 vs. limit=22.5
2023-11-18 23:40:09,524 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.26 vs. limit=22.5
2023-11-18 23:40:13,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=464046.6666666667, ans=0.09899494936611666
2023-11-18 23:40:19,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=464046.6666666667, ans=0.0
2023-11-18 23:40:20,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=464113.3333333333, ans=0.125
2023-11-18 23:40:21,506 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9500, loss[loss=0.08527, simple_loss=0.09819, pruned_loss=0.02756, audio_tagging_loss=0.008614, over 14210.00 frames. ], tot_loss[loss=0.09667, simple_loss=0.1132, pruned_loss=0.02886, audio_tagging_loss=0.0112, over 3057784.72 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 16.0
2023-11-18 23:40:25,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=10.0
2023-11-18 23:40:46,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=464246.6666666667, ans=0.125
2023-11-18 23:40:55,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0
2023-11-18 23:41:05,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.26 vs. limit=22.5
2023-11-18 23:41:07,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.036e+01 9.803e+01 1.077e+02 1.985e+02, threshold=1.961e+02, percent-clipped=0.0
2023-11-18 23:41:08,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=464380.0, ans=0.125
2023-11-18 23:41:13,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=464380.0, ans=0.2
2023-11-18 23:41:17,273 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9550, loss[loss=0.08603, simple_loss=0.1014, pruned_loss=0.02418, audio_tagging_loss=0.01116, over 14569.00 frames. ], tot_loss[loss=0.09678, simple_loss=0.1134, pruned_loss=0.02888, audio_tagging_loss=0.01119, over 3055465.55 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:41:19,922 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.68 vs. limit=15.0
2023-11-18 23:42:12,593 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9600, loss[loss=0.06408, simple_loss=0.0748, pruned_loss=0.01513, audio_tagging_loss=0.01155, over 14972.00 frames. ], tot_loss[loss=0.0964, simple_loss=0.1128, pruned_loss=0.02871, audio_tagging_loss=0.01128, over 3052937.81 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:42:25,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5
2023-11-18 23:42:28,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=464846.6666666667, ans=0.0
2023-11-18 23:42:28,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=464846.6666666667, ans=0.0
2023-11-18 23:42:35,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=464913.3333333333, ans=0.2
2023-11-18 23:42:44,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464913.3333333333, ans=0.1
2023-11-18 23:42:58,488 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.742e+01 9.765e+01 1.051e+02 1.389e+02, threshold=1.953e+02, percent-clipped=0.0
2023-11-18 23:43:02,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=465046.6666666667, ans=0.125
2023-11-18 23:43:06,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=465046.6666666667, ans=0.125
2023-11-18 23:43:09,306 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9650, loss[loss=0.1224, simple_loss=0.1543, pruned_loss=0.03525, audio_tagging_loss=0.01005, over 15279.00 frames. ], tot_loss[loss=0.09692, simple_loss=0.1137, pruned_loss=0.02889, audio_tagging_loss=0.01119, over 3049461.46 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:43:34,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465246.6666666667, ans=0.1
2023-11-18 23:43:41,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=465313.3333333333, ans=0.125
2023-11-18 23:43:47,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=465313.3333333333, ans=0.125
2023-11-18 23:43:58,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=465380.0, ans=0.125
2023-11-18 23:43:58,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=465380.0, ans=0.125
2023-11-18 23:44:01,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=465380.0, ans=0.2
2023-11-18 23:44:01,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=465380.0, ans=0.125
2023-11-18 23:44:04,658 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9700, loss[loss=0.07828, simple_loss=0.08986, pruned_loss=0.02387, audio_tagging_loss=0.009477, over 15197.00 frames. ], tot_loss[loss=0.09748, simple_loss=0.1145, pruned_loss=0.02922, audio_tagging_loss=0.01103, over 3050952.46 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:44:22,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465513.3333333333, ans=0.125
2023-11-18 23:44:23,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=465513.3333333333, ans=0.2
2023-11-18 23:44:24,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465513.3333333333, ans=0.125
2023-11-18 23:44:50,714 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.590e+01 9.606e+01 1.115e+02 1.456e+02, threshold=1.921e+02, percent-clipped=0.0
2023-11-18 23:44:52,037 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.554e-01
2023-11-18 23:45:00,258 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9750, loss[loss=0.08186, simple_loss=0.09584, pruned_loss=0.02291, audio_tagging_loss=0.01104, over 15716.00 frames. ], tot_loss[loss=0.09667, simple_loss=0.1136, pruned_loss=0.0289, audio_tagging_loss=0.01094, over 3052526.12 frames. ], batch size: 63, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:45:01,539 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.003e-02
2023-11-18 23:45:40,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=465980.0, ans=0.0
2023-11-18 23:45:46,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=466046.6666666667, ans=0.0
2023-11-18 23:45:57,146 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9800, loss[loss=0.1027, simple_loss=0.1217, pruned_loss=0.0336, audio_tagging_loss=0.008297, over 15059.00 frames. ], tot_loss[loss=0.09636, simple_loss=0.1131, pruned_loss=0.02889, audio_tagging_loss=0.01093, over 3051758.34 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:46:02,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=466113.3333333333, ans=0.125
2023-11-18 23:46:04,274 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0
2023-11-18 23:46:11,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=466180.0, ans=0.1
2023-11-18 23:46:26,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=466246.6666666667, ans=0.2
2023-11-18 23:46:36,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=466313.3333333333, ans=0.125
2023-11-18 23:46:38,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.44 vs. limit=22.5
2023-11-18 23:46:42,479 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.589e+01 9.720e+01 1.056e+02 1.437e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-18 23:46:42,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=15.0
2023-11-18 23:46:44,613 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:46:52,053 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9850, loss[loss=0.08763, simple_loss=0.1091, pruned_loss=0.02153, audio_tagging_loss=0.01154, over 14517.00 frames. ], tot_loss[loss=0.09708, simple_loss=0.114, pruned_loss=0.02931, audio_tagging_loss=0.01079, over 3046245.41 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:47:33,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=466646.6666666667, ans=0.2
2023-11-18 23:47:47,557 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9900, loss[loss=0.08327, simple_loss=0.09385, pruned_loss=0.02614, audio_tagging_loss=0.01021, over 16410.00 frames. ], tot_loss[loss=0.09686, simple_loss=0.1136, pruned_loss=0.02918, audio_tagging_loss=0.01086, over 3045174.64 frames. ], batch size: 64, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:47:57,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=466846.6666666667, ans=0.125
2023-11-18 23:47:59,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466846.6666666667, ans=0.1
2023-11-18 23:48:02,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=466846.6666666667, ans=0.07
2023-11-18 23:48:10,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0
2023-11-18 23:48:15,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466913.3333333333, ans=0.1
2023-11-18 23:48:19,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=466980.0, ans=0.0
2023-11-18 23:48:26,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.70 vs. limit=15.0
2023-11-18 23:48:29,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.60 vs. limit=10.0
2023-11-18 23:48:32,822 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.839e+01 9.469e+01 1.066e+02 1.468e+02, threshold=1.894e+02, percent-clipped=0.0
2023-11-18 23:48:43,395 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 9950, loss[loss=0.09552, simple_loss=0.111, pruned_loss=0.02769, audio_tagging_loss=0.01235, over 14997.00 frames. ], tot_loss[loss=0.09615, simple_loss=0.1127, pruned_loss=0.02896, audio_tagging_loss=0.01085, over 3044524.52 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:48:46,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=467113.3333333333, ans=0.2
2023-11-18 23:48:50,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=467113.3333333333, ans=0.125
2023-11-18 23:48:58,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=467180.0, ans=0.125
2023-11-18 23:49:10,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=467246.6666666667, ans=0.0
2023-11-18 23:49:38,709 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10000, loss[loss=0.0931, simple_loss=0.1106, pruned_loss=0.02377, audio_tagging_loss=0.01405, over 15303.00 frames. ], tot_loss[loss=0.09531, simple_loss=0.1118, pruned_loss=0.0286, audio_tagging_loss=0.01083, over 3045156.50 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:49:55,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=467513.3333333333, ans=0.95
2023-11-18 23:50:03,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=467580.0, ans=0.0
2023-11-18 23:50:10,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=467580.0, ans=0.125
2023-11-18 23:50:11,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=467646.6666666667, ans=15.0
2023-11-18 23:50:13,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=22.5
2023-11-18 23:50:17,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=467646.6666666667, ans=0.1
2023-11-18 23:50:23,829 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 8.943e+01 9.834e+01 1.077e+02 1.357e+02, threshold=1.967e+02, percent-clipped=0.0
2023-11-18 23:50:27,717 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5
2023-11-18 23:50:33,369 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10050, loss[loss=0.08347, simple_loss=0.08773, pruned_loss=0.02476, audio_tagging_loss=0.01484, over 14219.00 frames. ], tot_loss[loss=0.09512, simple_loss=0.1116, pruned_loss=0.02841, audio_tagging_loss=0.0109, over 3041687.55 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:50:40,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=467780.0, ans=0.125
2023-11-18 23:51:23,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=468046.6666666667, ans=0.125
2023-11-18 23:51:28,608 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10100, loss[loss=0.09947, simple_loss=0.1118, pruned_loss=0.02867, audio_tagging_loss=0.01489, over 15252.00 frames. ], tot_loss[loss=0.0959, simple_loss=0.1125, pruned_loss=0.02872, audio_tagging_loss=0.01093, over 3051632.94 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:52:02,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=468313.3333333333, ans=0.0
2023-11-18 23:52:08,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=468313.3333333333, ans=0.0
2023-11-18 23:52:08,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.39 vs. limit=12.0
2023-11-18 23:52:09,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468313.3333333333, ans=0.0
2023-11-18 23:52:10,326 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:52:15,594 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 8.875e+01 9.595e+01 1.083e+02 1.455e+02, threshold=1.919e+02, percent-clipped=0.0
2023-11-18 23:52:17,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=468380.0, ans=0.125
2023-11-18 23:52:17,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=468380.0, ans=0.125
2023-11-18 23:52:21,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=468380.0, ans=0.0
2023-11-18 23:52:24,028 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10150, loss[loss=0.1013, simple_loss=0.1084, pruned_loss=0.03625, audio_tagging_loss=0.01081, over 15684.00 frames. ], tot_loss[loss=0.09635, simple_loss=0.1128, pruned_loss=0.02896, audio_tagging_loss=0.01098, over 3058253.78 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:52:26,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0
2023-11-18 23:52:33,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.40 vs. limit=15.0
2023-11-18 23:52:36,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=468513.3333333333, ans=0.1
2023-11-18 23:52:40,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468513.3333333333, ans=0.125
2023-11-18 23:52:46,783 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:52:47,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=468580.0, ans=0.0
2023-11-18 23:53:05,243 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=468646.6666666667, ans=0.125
2023-11-18 23:53:09,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=468713.3333333333, ans=0.0
2023-11-18 23:53:13,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=468713.3333333333, ans=0.0
2023-11-18 23:53:16,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=468713.3333333333, ans=0.07
2023-11-18 23:53:18,611 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10200, loss[loss=0.1037, simple_loss=0.1283, pruned_loss=0.02749, audio_tagging_loss=0.01209, over 15950.00 frames. ], tot_loss[loss=0.09642, simple_loss=0.1127, pruned_loss=0.02887, audio_tagging_loss=0.01119, over 3066783.70 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:53:30,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=468846.6666666667, ans=0.0
2023-11-18 23:53:37,351 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 23:53:39,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=468846.6666666667, ans=0.0
2023-11-18 23:53:43,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=468913.3333333333, ans=0.0
2023-11-18 23:53:48,234 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:53:53,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=468980.0, ans=0.125
2023-11-18 23:54:01,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=469046.6666666667, ans=0.125
2023-11-18 23:54:04,272 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=469046.6666666667, ans=10.0
2023-11-18 23:54:04,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.998e+01 1.003e+02 1.100e+02 1.354e+02, threshold=2.006e+02, percent-clipped=0.0
2023-11-18 23:54:07,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=469046.6666666667, ans=0.2
2023-11-18 23:54:08,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=469046.6666666667, ans=0.125
2023-11-18 23:54:10,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=469046.6666666667, ans=0.125
2023-11-18 23:54:13,816 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10250, loss[loss=0.07839, simple_loss=0.08984, pruned_loss=0.02157, audio_tagging_loss=0.01189, over 14797.00 frames. ], tot_loss[loss=0.09595, simple_loss=0.1124, pruned_loss=0.0286, audio_tagging_loss=0.01114, over 3055234.55 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:54:15,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=469113.3333333333, ans=0.125
2023-11-18 23:54:19,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=469113.3333333333, ans=0.95
2023-11-18 23:54:22,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=469113.3333333333, ans=15.0
2023-11-18 23:54:34,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=469180.0, ans=0.125
2023-11-18 23:54:38,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=469246.6666666667, ans=0.125
2023-11-18 23:54:54,338 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:55:03,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=469380.0, ans=0.0
2023-11-18 23:55:10,096 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10300, loss[loss=0.1032, simple_loss=0.1169, pruned_loss=0.03201, audio_tagging_loss=0.01274, over 15418.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1123, pruned_loss=0.0284, audio_tagging_loss=0.01118, over 3057174.17 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:55:37,823 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=12.0
2023-11-18 23:55:56,572 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.104e+01 1.005e+02 1.145e+02 1.607e+02, threshold=2.009e+02, percent-clipped=0.0
2023-11-18 23:55:58,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=469713.3333333333, ans=0.125
2023-11-18 23:56:05,045 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10350, loss[loss=0.1156, simple_loss=0.1359, pruned_loss=0.03878, audio_tagging_loss=0.008917, over 15161.00 frames. ], tot_loss[loss=0.09602, simple_loss=0.1123, pruned_loss=0.02854, audio_tagging_loss=0.01133, over 3056235.64 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:56:19,756 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5
2023-11-18 23:56:31,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=469913.3333333333, ans=0.2
2023-11-18 23:56:37,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=469980.0, ans=0.125
2023-11-18 23:56:39,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=469980.0, ans=0.125
2023-11-18 23:56:40,242 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0
2023-11-18 23:56:43,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=469980.0, ans=0.04949747468305833
2023-11-18 23:56:54,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470046.6666666667, ans=0.1
2023-11-18 23:57:00,327 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10400, loss[loss=0.105, simple_loss=0.1372, pruned_loss=0.02291, audio_tagging_loss=0.0135, over 16818.00 frames. ], tot_loss[loss=0.09588, simple_loss=0.1123, pruned_loss=0.02826, audio_tagging_loss=0.01145, over 3059549.97 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:57:05,205 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:57:22,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=470246.6666666667, ans=0.0
2023-11-18 23:57:26,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=470246.6666666667, ans=0.0
2023-11-18 23:57:38,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=470313.3333333333, ans=0.2
2023-11-18 23:57:41,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=12.0
2023-11-18 23:57:47,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.610e+01 9.344e+01 1.013e+02 2.407e+02, threshold=1.869e+02, percent-clipped=1.0
2023-11-18 23:57:51,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=470380.0, ans=0.125
2023-11-18 23:57:53,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=470380.0, ans=0.125
2023-11-18 23:57:56,729 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10450, loss[loss=0.1224, simple_loss=0.1486, pruned_loss=0.04058, audio_tagging_loss=0.00749, over 15701.00 frames. ], tot_loss[loss=0.09589, simple_loss=0.1122, pruned_loss=0.02845, audio_tagging_loss=0.01133, over 3052275.43 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0
2023-11-18 23:58:41,481 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 23:58:41,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.02 vs. limit=6.0
2023-11-18 23:58:51,740 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10500, loss[loss=0.09003, simple_loss=0.1075, pruned_loss=0.0265, audio_tagging_loss=0.009792, over 14796.00 frames. ], tot_loss[loss=0.09541, simple_loss=0.1117, pruned_loss=0.02845, audio_tagging_loss=0.01112, over 3053884.13 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:58:52,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=470780.0, ans=0.0
2023-11-18 23:58:54,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=470780.0, ans=0.125
2023-11-18 23:59:07,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=470846.6666666667, ans=0.2
2023-11-18 23:59:24,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=470980.0, ans=0.2
2023-11-18 23:59:33,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=470980.0, ans=0.0
2023-11-18 23:59:38,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.568e+01 9.489e+01 1.065e+02 1.523e+02, threshold=1.898e+02, percent-clipped=0.0
2023-11-18 23:59:46,879 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10550, loss[loss=0.1035, simple_loss=0.1246, pruned_loss=0.03288, audio_tagging_loss=0.008355, over 15744.00 frames. ], tot_loss[loss=0.09494, simple_loss=0.1112, pruned_loss=0.0283, audio_tagging_loss=0.01102, over 3052018.35 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0
2023-11-18 23:59:56,953 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.64 vs. limit=12.0
2023-11-19 00:00:14,365 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.46 vs. limit=15.0
2023-11-19 00:00:27,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=471313.3333333333, ans=0.0
2023-11-19 00:00:43,093 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10600, loss[loss=0.08363, simple_loss=0.09478, pruned_loss=0.02621, audio_tagging_loss=0.01002, over 13101.00 frames. ], tot_loss[loss=0.09483, simple_loss=0.1113, pruned_loss=0.02814, audio_tagging_loss=0.01104, over 3049290.25 frames. ], batch size: 53, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:00:45,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=471446.6666666667, ans=0.125
2023-11-19 00:00:45,747 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0
2023-11-19 00:01:15,280 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=471646.6666666667, ans=0.2
2023-11-19 00:01:31,015 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.038e+01 9.665e+01 1.088e+02 1.655e+02, threshold=1.933e+02, percent-clipped=0.0
2023-11-19 00:01:36,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=471713.3333333333, ans=0.0
2023-11-19 00:01:38,981 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10650, loss[loss=0.1018, simple_loss=0.1216, pruned_loss=0.03291, audio_tagging_loss=0.008037, over 15603.00 frames. ], tot_loss[loss=0.09477, simple_loss=0.1114, pruned_loss=0.02806, audio_tagging_loss=0.01101, over 3043352.78 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:01:46,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=471780.0, ans=0.1
2023-11-19 00:01:53,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.39 vs. limit=22.5
2023-11-19 00:02:16,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5
2023-11-19 00:02:17,239 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.729e-02
2023-11-19 00:02:19,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=471980.0, ans=0.125
2023-11-19 00:02:30,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472046.6666666667, ans=0.1
2023-11-19 00:02:32,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=472046.6666666667, ans=0.125
2023-11-19 00:02:32,389 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:02:34,788 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10700, loss[loss=0.08882, simple_loss=0.1054, pruned_loss=0.02917, audio_tagging_loss=0.006947, over 14949.00 frames. ], tot_loss[loss=0.09529, simple_loss=0.1121, pruned_loss=0.02826, audio_tagging_loss=0.01097, over 3047078.92 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:02:35,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=472113.3333333333, ans=0.0
2023-11-19 00:02:37,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=472113.3333333333, ans=0.125
2023-11-19 00:02:41,697 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.11 vs. limit=15.0
2023-11-19 00:02:47,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.95 vs. limit=15.0
2023-11-19 00:02:51,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=472180.0, ans=0.125
2023-11-19 00:02:56,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=472246.6666666667, ans=0.125
2023-11-19 00:02:57,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=472246.6666666667, ans=0.2
2023-11-19 00:03:03,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=472246.6666666667, ans=0.0
2023-11-19 00:03:22,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.951e+01 9.625e+01 1.080e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0
2023-11-19 00:03:22,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=472380.0, ans=0.1
2023-11-19 00:03:30,897 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10750, loss[loss=0.09705, simple_loss=0.1104, pruned_loss=0.03328, audio_tagging_loss=0.008587, over 14680.00 frames. ], tot_loss[loss=0.09582, simple_loss=0.1129, pruned_loss=0.02849, audio_tagging_loss=0.01087, over 3048567.86 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0
2023-11-19 00:03:32,287 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=472446.6666666667, ans=0.125
2023-11-19 00:03:49,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=472513.3333333333, ans=0.125
2023-11-19 00:04:01,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=472580.0, ans=0.125
2023-11-19 00:04:20,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=472713.3333333333, ans=0.125
2023-11-19 00:04:25,553 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10800, loss[loss=0.08015, simple_loss=0.09194, pruned_loss=0.02391, audio_tagging_loss=0.01027, over 14909.00 frames. ], tot_loss[loss=0.0953, simple_loss=0.1122, pruned_loss=0.02825, audio_tagging_loss=0.01095, over 3048319.59 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0
2023-11-19 00:04:41,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.39 vs. limit=22.5
2023-11-19 00:05:04,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=472980.0, ans=0.0
2023-11-19 00:05:13,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.608e+01 9.354e+01 1.065e+02 1.440e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-19 00:05:15,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=473046.6666666667, ans=0.125
2023-11-19 00:05:20,959 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10850, loss[loss=0.1347, simple_loss=0.1597, pruned_loss=0.04727, audio_tagging_loss=0.007632, over 15945.00 frames. ], tot_loss[loss=0.09602, simple_loss=0.113, pruned_loss=0.02858, audio_tagging_loss=0.01095, over 3048847.38 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:05:36,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.93 vs. limit=22.5
2023-11-19 00:05:40,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473180.0, ans=0.1
2023-11-19 00:05:42,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=473246.6666666667, ans=0.125
2023-11-19 00:06:11,161 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 00:06:17,539 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10900, loss[loss=0.1079, simple_loss=0.1246, pruned_loss=0.03394, audio_tagging_loss=0.01161, over 14948.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1126, pruned_loss=0.02849, audio_tagging_loss=0.011, over 3043859.26 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:06:44,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.38 vs. limit=15.0
2023-11-19 00:06:45,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=473580.0, ans=0.035
2023-11-19 00:06:54,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=473646.6666666667, ans=0.2
2023-11-19 00:07:01,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=473713.3333333333, ans=0.0
2023-11-19 00:07:05,078 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.580e+01 9.550e+01 1.089e+02 1.595e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-19 00:07:07,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.83 vs. limit=15.0
2023-11-19 00:07:12,538 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 10950, loss[loss=0.07504, simple_loss=0.08783, pruned_loss=0.01894, audio_tagging_loss=0.01219, over 15520.00 frames. ], tot_loss[loss=0.09699, simple_loss=0.1137, pruned_loss=0.02906, audio_tagging_loss=0.01111, over 3047715.89 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 16.0
2023-11-19 00:07:14,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=473780.0, ans=0.0
2023-11-19 00:07:15,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=473780.0, ans=0.125
2023-11-19 00:07:35,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=473913.3333333333, ans=0.0
2023-11-19 00:07:42,224 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.409e-02
2023-11-19 00:07:42,711 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.87 vs. limit=15.0
2023-11-19 00:07:51,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=473980.0, ans=0.125
2023-11-19 00:07:55,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=474046.6666666667, ans=0.0
2023-11-19 00:07:58,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=474046.6666666667, ans=0.0
2023-11-19 00:08:01,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=474046.6666666667, ans=0.2
2023-11-19 00:08:02,013 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=474046.6666666667, ans=0.1
2023-11-19 00:08:07,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=15.0
2023-11-19 00:08:07,624 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11000, loss[loss=0.09268, simple_loss=0.1091, pruned_loss=0.02387, audio_tagging_loss=0.01425, over 15245.00 frames. ], tot_loss[loss=0.09697, simple_loss=0.1136, pruned_loss=0.02906, audio_tagging_loss=0.0111, over 3041400.41 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 16.0
2023-11-19 00:08:11,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474113.3333333333, ans=0.1
2023-11-19 00:08:14,388 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 00:08:29,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=474246.6666666667, ans=0.125
2023-11-19 00:08:35,066 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.06 vs. limit=22.5
2023-11-19 00:08:46,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=474313.3333333333, ans=0.2
2023-11-19 00:08:52,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=474380.0, ans=0.0
2023-11-19 00:08:56,476 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.044e+01 9.862e+01 1.100e+02 1.802e+02, threshold=1.972e+02, percent-clipped=0.0
2023-11-19 00:09:03,357 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11050, loss[loss=0.09766, simple_loss=0.1004, pruned_loss=0.0263, audio_tagging_loss=0.02116, over 14919.00 frames. ], tot_loss[loss=0.09623, simple_loss=0.1123, pruned_loss=0.02876, audio_tagging_loss=0.01131, over 3040722.61 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0
2023-11-19 00:09:07,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=474446.6666666667, ans=0.0
2023-11-19 00:09:37,856 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:09:42,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=474646.6666666667, ans=0.1
2023-11-19 00:09:43,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=474646.6666666667, ans=0.0
2023-11-19 00:09:43,663 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.78 vs. limit=15.0
2023-11-19 00:09:47,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=474713.3333333333, ans=0.0
2023-11-19 00:09:59,069 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11100, loss[loss=0.09261, simple_loss=0.1142, pruned_loss=0.02498, audio_tagging_loss=0.01051, over 15919.00 frames. ], tot_loss[loss=0.09661, simple_loss=0.1129, pruned_loss=0.02884, audio_tagging_loss=0.01134, over 3047453.91 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 16.0
2023-11-19 00:10:05,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=474780.0, ans=0.125
2023-11-19 00:10:19,941 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.26 vs. limit=15.0
2023-11-19 00:10:20,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=474913.3333333333, ans=0.0
2023-11-19 00:10:32,156 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=474980.0, ans=0.1
2023-11-19 00:10:35,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff3.min_abs, batch_count=474980.0, ans=0.2
2023-11-19 00:10:42,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475046.6666666667, ans=0.125
2023-11-19 00:10:47,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.698e+01 9.641e+01 1.040e+02 1.445e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-19 00:10:54,145 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11150, loss[loss=0.1021, simple_loss=0.1163, pruned_loss=0.03339, audio_tagging_loss=0.0106, over 14701.00 frames. ], tot_loss[loss=0.09631, simple_loss=0.1122, pruned_loss=0.02868, audio_tagging_loss=0.01155, over 3040875.23 frames. ], batch size: 53, lr: 1.10e-02, grad_scale: 16.0
2023-11-19 00:11:00,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=475113.3333333333, ans=0.0
2023-11-19 00:11:25,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475246.6666666667, ans=0.1
2023-11-19 00:11:47,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=475380.0, ans=0.125
2023-11-19 00:11:49,596 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11200, loss[loss=0.1001, simple_loss=0.1135, pruned_loss=0.03188, audio_tagging_loss=0.01143, over 15397.00 frames. ], tot_loss[loss=0.09625, simple_loss=0.1122, pruned_loss=0.02851, audio_tagging_loss=0.01165, over 3049495.37 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:11:49,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475446.6666666667, ans=0.125
2023-11-19 00:12:10,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=475513.3333333333, ans=0.125
2023-11-19 00:12:10,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5
2023-11-19 00:12:22,736 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 00:12:39,480 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.628e+01 9.761e+01 1.045e+02 1.473e+02, threshold=1.952e+02, percent-clipped=0.0
2023-11-19 00:12:45,836 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11250, loss[loss=0.08019, simple_loss=0.08648, pruned_loss=0.02503, audio_tagging_loss=0.01191, over 14314.00 frames. ], tot_loss[loss=0.09663, simple_loss=0.1127, pruned_loss=0.02877, audio_tagging_loss=0.01148, over 3051606.52 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:12:52,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=475780.0, ans=0.1
2023-11-19 00:13:16,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=475913.3333333333, ans=0.125
2023-11-19 00:13:18,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=475980.0, ans=0.125
2023-11-19 00:13:30,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0
2023-11-19 00:13:41,020 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11300, loss[loss=0.1249, simple_loss=0.1521, pruned_loss=0.03995, audio_tagging_loss=0.008868, over 15882.00 frames. ], tot_loss[loss=0.09574, simple_loss=0.1113, pruned_loss=0.02871, audio_tagging_loss=0.01138, over 3049126.60 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:14:04,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=476246.6666666667, ans=0.125
2023-11-19 00:14:29,494 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.711e+01 9.512e+01 1.035e+02 1.315e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 00:14:36,319 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11350, loss[loss=0.09559, simple_loss=0.1152, pruned_loss=0.02907, audio_tagging_loss=0.008944, over 16488.00 frames. ], tot_loss[loss=0.0955, simple_loss=0.111, pruned_loss=0.02872, audio_tagging_loss=0.01127, over 3042104.71 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:14:47,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=476513.3333333333, ans=0.125
2023-11-19 00:15:10,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=476646.6666666667, ans=0.125
2023-11-19 00:15:11,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=476646.6666666667, ans=0.0
2023-11-19 00:15:21,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=476713.3333333333, ans=0.0
2023-11-19 00:15:32,730 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11400, loss[loss=0.1012, simple_loss=0.1153, pruned_loss=0.03071, audio_tagging_loss=0.01279, over 15096.00 frames. ], tot_loss[loss=0.09543, simple_loss=0.1114, pruned_loss=0.02867, audio_tagging_loss=0.01106, over 3036342.75 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:15:43,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=476846.6666666667, ans=0.125
2023-11-19 00:15:43,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.16 vs. limit=22.5
2023-11-19 00:15:50,380 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0
2023-11-19 00:15:51,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=476846.6666666667, ans=0.0
2023-11-19 00:15:57,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=476913.3333333333, ans=0.015
2023-11-19 00:16:08,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=476980.0, ans=0.125
2023-11-19 00:16:12,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=476980.0, ans=0.125
2023-11-19 00:16:12,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=476980.0, ans=0.025
2023-11-19 00:16:20,945 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.841e+01 9.746e+01 1.056e+02 1.411e+02, threshold=1.949e+02, percent-clipped=0.0
2023-11-19 00:16:27,296 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11450, loss[loss=0.124, simple_loss=0.1562, pruned_loss=0.03567, audio_tagging_loss=0.01016, over 15775.00 frames. ], tot_loss[loss=0.09474, simple_loss=0.1109, pruned_loss=0.02825, audio_tagging_loss=0.01106, over 3031662.89 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:16:40,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0
2023-11-19 00:16:51,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=477246.6666666667, ans=0.125
2023-11-19 00:17:15,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=477380.0, ans=0.125
2023-11-19 00:17:18,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=477380.0, ans=0.125
2023-11-19 00:17:22,950 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11500, loss[loss=0.1054, simple_loss=0.1302, pruned_loss=0.03111, audio_tagging_loss=0.009186, over 15002.00 frames. ], tot_loss[loss=0.0948, simple_loss=0.111, pruned_loss=0.02827, audio_tagging_loss=0.01104, over 3033330.52 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0
2023-11-19 00:17:37,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5
2023-11-19 00:17:49,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=477580.0, ans=0.0
2023-11-19 00:17:49,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=477580.0, ans=0.125
2023-11-19 00:18:00,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=477646.6666666667, ans=0.035
2023-11-19 00:18:03,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.03 vs.
limit=6.0 2023-11-19 00:18:11,941 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.969e+01 9.661e+01 1.076e+02 1.537e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-19 00:18:19,411 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11550, loss[loss=0.1034, simple_loss=0.1149, pruned_loss=0.03405, audio_tagging_loss=0.01193, over 15627.00 frames. ], tot_loss[loss=0.09528, simple_loss=0.1117, pruned_loss=0.0284, audio_tagging_loss=0.01101, over 3040229.09 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:18:31,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=477846.6666666667, ans=0.125 2023-11-19 00:18:39,722 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:18:49,479 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:19:05,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=478046.6666666667, ans=0.0 2023-11-19 00:19:14,383 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11600, loss[loss=0.08341, simple_loss=0.09268, pruned_loss=0.02456, audio_tagging_loss=0.01251, over 14268.00 frames. ], tot_loss[loss=0.09511, simple_loss=0.1117, pruned_loss=0.0282, audio_tagging_loss=0.01104, over 3043672.75 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:19:48,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=478313.3333333333, ans=0.125 2023-11-19 00:20:02,995 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.994e+01 9.981e+01 1.100e+02 1.554e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-19 00:20:03,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=478380.0, ans=0.0 2023-11-19 00:20:07,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=478380.0, ans=0.0 2023-11-19 00:20:09,877 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11650, loss[loss=0.1097, simple_loss=0.1327, pruned_loss=0.03261, audio_tagging_loss=0.01075, over 16475.00 frames. ], tot_loss[loss=0.09523, simple_loss=0.1119, pruned_loss=0.02825, audio_tagging_loss=0.01104, over 3042203.21 frames. 
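The WARNING at 00:18:49 above shows the guard that drops degenerate AudioSet cuts from ASR training: a 100-frame (roughly 1 s) cut keeps only 23 encoder frames after subsampling, fewer than the 24 BPE tokens of its dummy transcript, and a transducer cannot align more non-blank tokens than it has encoder frames. Below is a sketch of such a filter; the formula (T - 7) // 4 is inferred from the logged 100 -> 23 pair and may not be the front end's exact arithmetic.

# Sketch of the cut filter implied by the "Exclude cut with ID ..." warnings.
import logging

def frames_after_subsampling(num_frames: int) -> int:
    # Inferred from the log (100 -> 23); an assumption, not the exact formula.
    return (num_frames - 7) // 4

def keep_cut(cut_id: str, num_frames: int, tokens: list) -> bool:
    t = frames_after_subsampling(num_frames)
    if t < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut_id} from training. "
            f"Frames (before subsampling): {num_frames}. "
            f"Frames (after subsampling): {t}. Tokens: {len(tokens)}"
        )
        return False
    return True

print(keep_cut("unbalanced/NeYOsnhOi4k_0.000_1.000.wav", 100, ["x"] * 24))  # False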
], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:20:12,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=478446.6666666667, ans=0.0 2023-11-19 00:20:27,780 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.883e-02 2023-11-19 00:20:35,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=478580.0, ans=0.5 2023-11-19 00:20:50,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=478646.6666666667, ans=0.125 2023-11-19 00:21:00,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=478713.3333333333, ans=0.125 2023-11-19 00:21:01,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=478713.3333333333, ans=0.0 2023-11-19 00:21:06,351 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11700, loss[loss=0.07847, simple_loss=0.08748, pruned_loss=0.01971, audio_tagging_loss=0.01502, over 15314.00 frames. ], tot_loss[loss=0.09544, simple_loss=0.112, pruned_loss=0.0284, audio_tagging_loss=0.01103, over 3043878.59 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:21:18,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.31 vs. limit=22.5 2023-11-19 00:21:23,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=478846.6666666667, ans=0.0 2023-11-19 00:21:24,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0 2023-11-19 00:21:33,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=478913.3333333333, ans=0.125 2023-11-19 00:21:37,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=478913.3333333333, ans=0.1 2023-11-19 00:21:50,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.99 vs. limit=15.0 2023-11-19 00:21:55,332 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.970e+01 9.668e+01 1.084e+02 1.454e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 00:22:01,684 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11750, loss[loss=0.09762, simple_loss=0.1071, pruned_loss=0.03273, audio_tagging_loss=0.01134, over 15567.00 frames. ], tot_loss[loss=0.09535, simple_loss=0.1117, pruned_loss=0.02839, audio_tagging_loss=0.01109, over 3050557.79 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:22:02,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=479113.3333333333, ans=0.09899494936611666 2023-11-19 00:22:13,921 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.11 vs. limit=10.0 2023-11-19 00:22:22,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.90 vs. 
limit=22.5 2023-11-19 00:22:25,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=479246.6666666667, ans=0.125 2023-11-19 00:22:56,855 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11800, loss[loss=0.09363, simple_loss=0.1069, pruned_loss=0.02941, audio_tagging_loss=0.01078, over 15107.00 frames. ], tot_loss[loss=0.09485, simple_loss=0.1108, pruned_loss=0.02826, audio_tagging_loss=0.01121, over 3046702.70 frames. ], batch size: 56, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:07,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=479513.3333333333, ans=0.125 2023-11-19 00:23:11,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=479513.3333333333, ans=0.0 2023-11-19 00:23:13,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=479513.3333333333, ans=0.125 2023-11-19 00:23:19,139 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.80 vs. limit=22.5 2023-11-19 00:23:37,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=479646.6666666667, ans=0.0 2023-11-19 00:23:48,152 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.014e+01 9.704e+01 1.070e+02 1.627e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 00:23:53,376 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11850, loss[loss=0.07729, simple_loss=0.08922, pruned_loss=0.01985, audio_tagging_loss=0.01282, over 14904.00 frames. ], tot_loss[loss=0.09491, simple_loss=0.1109, pruned_loss=0.02813, audio_tagging_loss=0.01131, over 3047943.17 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:58,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=479780.0, ans=0.1 2023-11-19 00:24:15,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=479913.3333333333, ans=0.1 2023-11-19 00:24:23,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0 2023-11-19 00:24:28,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=479980.0, ans=0.125 2023-11-19 00:24:44,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=480046.6666666667, ans=0.0 2023-11-19 00:24:50,971 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11900, loss[loss=0.08036, simple_loss=0.08539, pruned_loss=0.02184, audio_tagging_loss=0.01582, over 16355.00 frames. ], tot_loss[loss=0.09543, simple_loss=0.1111, pruned_loss=0.02846, audio_tagging_loss=0.01143, over 3047713.73 frames. 
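Each optim.py:476 line prints a five-number summary (min, quartiles, max) of recent gradient norms next to the active clipping threshold, and in every entry in this excerpt the threshold equals Clipping_scale (2.0) times the logged median, e.g. 2.0 * 9.704e+01 = 1.941e+02 at 00:23:48. The clipping level therefore tracks a running median rather than a fixed constant. A minimal sketch of that idea follows; the history length and update cadence are arbitrary choices for illustration.

# Sketch of median-tracking gradient clipping consistent with the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." lines.
from collections import deque
import torch

class MedianClipper:
    def __init__(self, clipping_scale: float = 2.0, history: int = 400):
        self.scale = clipping_scale
        self.norms = deque(maxlen=history)  # recent total gradient norms

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.sqrt(sum((p.grad ** 2).sum() for p in params)).item()
        self.norms.append(norm)
        threshold = self.scale * torch.tensor(list(self.norms)).median().item()
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale instead of dropping
        return threshold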
], batch size: 62, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:25:13,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=480246.6666666667, ans=0.5 2023-11-19 00:25:24,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=480313.3333333333, ans=0.125 2023-11-19 00:25:30,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.46 vs. limit=22.5 2023-11-19 00:25:33,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480313.3333333333, ans=0.1 2023-11-19 00:25:37,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=480380.0, ans=0.125 2023-11-19 00:25:40,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=480380.0, ans=0.2 2023-11-19 00:25:41,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1.whitening_limit, batch_count=480380.0, ans=10.0 2023-11-19 00:25:41,558 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.633e+01 9.352e+01 1.050e+02 1.397e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 00:25:45,904 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 11950, loss[loss=0.1091, simple_loss=0.1185, pruned_loss=0.03556, audio_tagging_loss=0.01428, over 16081.00 frames. ], tot_loss[loss=0.09539, simple_loss=0.1111, pruned_loss=0.02846, audio_tagging_loss=0.01138, over 3049544.08 frames. ], batch size: 59, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:25:46,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480446.6666666667, ans=0.1 2023-11-19 00:25:49,840 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:25:54,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.94 vs. limit=22.5 2023-11-19 00:26:05,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=480513.3333333333, ans=0.125 2023-11-19 00:26:10,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-11-19 00:26:24,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=480646.6666666667, ans=0.0 2023-11-19 00:26:27,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=480646.6666666667, ans=0.125 2023-11-19 00:26:28,742 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=480713.3333333333, ans=0.2 2023-11-19 00:26:39,766 INFO [train_asr.py:1115] (2/4) Epoch 6, batch 12000, loss[loss=0.1168, simple_loss=0.126, pruned_loss=0.04067, audio_tagging_loss=0.01313, over 15689.00 frames. ], tot_loss[loss=0.09697, simple_loss=0.1131, pruned_loss=0.02909, audio_tagging_loss=0.01132, over 3051417.78 frames. 
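The train_asr.py:1115 lines decompose the objective into simple_loss, pruned_loss and audio_tagging_loss, and throughout this excerpt the logged totals are consistent with tot = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (for the Epoch 6, batch 12000 entry just above: 0.5 * 0.1131 + 0.02909 + 0.01132 = 0.0970). The sketch below restates that arithmetic; the 0.5 and 1.0 weights are read off the logged numbers, not taken from the code.

# How the logged tot_loss appears to decompose (weights inferred from
# the logged values themselves, as checked in the comment below).
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Epoch 6, batch 12000: logged tot loss=0.09697
print(combined_loss(0.1131, 0.02909, 0.01132))  # 0.09696...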
], batch size: 60, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:26:39,766 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 00:27:12,320 INFO [train_asr.py:1147] (2/4) Epoch 6, validation: loss=0.07011, simple_loss=0.05856, pruned_loss=0.008079, audio_tagging_loss=0.03275, over 4681554.00 frames. 2023-11-19 00:27:12,320 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 00:27:18,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=480780.0, ans=0.125 2023-11-19 00:27:23,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=480846.6666666667, ans=0.125 2023-11-19 00:28:10,669 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 0, loss[loss=0.1041, simple_loss=0.1124, pruned_loss=0.0241, audio_tagging_loss=0.02375, over 14366.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1124, pruned_loss=0.0241, audio_tagging_loss=0.02375, over 14366.00 frames. ], batch size: 55, lr: 1.03e-02, grad_scale: 32.0 2023-11-19 00:28:10,670 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 00:28:37,356 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2780, 4.9825, 4.7833, 5.1057], device='cuda:2') 2023-11-19 00:28:42,244 INFO [train_asr.py:1147] (2/4) Epoch 7, validation: loss=0.06897, simple_loss=0.05854, pruned_loss=0.008004, audio_tagging_loss=0.03169, over 4681554.00 frames. 2023-11-19 00:28:42,245 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 00:28:59,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=480993.3333333333, ans=0.0 2023-11-19 00:29:06,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481060.0, ans=0.125 2023-11-19 00:29:08,501 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.969e+01 9.678e+01 1.084e+02 1.742e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:29:26,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=481193.3333333333, ans=0.125 2023-11-19 00:29:36,960 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 50, loss[loss=0.1041, simple_loss=0.1122, pruned_loss=0.03088, audio_tagging_loss=0.01712, over 15725.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1073, pruned_loss=0.02767, audio_tagging_loss=0.02158, over 685280.43 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:29:59,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=481393.3333333333, ans=0.1 2023-11-19 00:30:03,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=481393.3333333333, ans=0.125 2023-11-19 00:30:33,427 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 100, loss[loss=0.09979, simple_loss=0.1139, pruned_loss=0.02775, audio_tagging_loss=0.01509, over 16715.00 frames. ], tot_loss[loss=0.1046, simple_loss=0.1109, pruned_loss=0.02858, audio_tagging_loss=0.02056, over 1211546.03 frames. 
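The zipformer.py:1873 line above dumps attn_weights_entropy for one self-attention module, one value per head. Entropy near log(key_length) means a head attends almost uniformly, while values near zero mean it has collapsed onto single positions, so this is a cheap health check on attention. A sketch of the diagnostic follows; the reduction over the batch and query dimensions is an assumption about what the logged per-head numbers summarize.

# Per-head attention-entropy diagnostic in the spirit of the
# "attn_weights_entropy = tensor([...])" line above.
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (batch, num_heads, query_len, key_len); rows sum to 1
    eps = 1.0e-20
    h = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per query row
    return h.mean(dim=(0, 2))                     # one value per head

weights = torch.softmax(torch.randn(2, 4, 150, 150), dim=-1)
print(attn_weights_entropy(weights))  # a bit under log(150) ~ 5.0 per head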
], batch size: 60, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:30:51,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=481660.0, ans=0.0 2023-11-19 00:30:56,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481726.6666666667, ans=0.1 2023-11-19 00:31:01,053 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.882e+01 9.750e+01 1.051e+02 1.477e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 00:31:09,038 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.41 vs. limit=15.0 2023-11-19 00:31:12,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=481793.3333333333, ans=0.125 2023-11-19 00:31:21,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=481860.0, ans=0.2 2023-11-19 00:31:27,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=481926.6666666667, ans=0.035 2023-11-19 00:31:28,789 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 150, loss[loss=0.09965, simple_loss=0.1326, pruned_loss=0.02258, audio_tagging_loss=0.01076, over 14712.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1107, pruned_loss=0.02832, audio_tagging_loss=0.01865, over 1618070.96 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:31:34,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=481926.6666666667, ans=0.0 2023-11-19 00:31:50,779 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.76 vs. limit=15.0 2023-11-19 00:32:06,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=482126.6666666667, ans=0.0 2023-11-19 00:32:12,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=482193.3333333333, ans=0.125 2023-11-19 00:32:16,787 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=482193.3333333333, ans=0.025 2023-11-19 00:32:16,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=482193.3333333333, ans=0.125 2023-11-19 00:32:25,132 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 200, loss[loss=0.09086, simple_loss=0.09571, pruned_loss=0.03031, audio_tagging_loss=0.01269, over 16368.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1122, pruned_loss=0.02877, audio_tagging_loss=0.01632, over 1936028.36 frames. 
], batch size: 62, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:32:48,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=482393.3333333333, ans=0.125 2023-11-19 00:32:52,545 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.072e+01 1.001e+02 1.087e+02 1.831e+02, threshold=2.002e+02, percent-clipped=0.0 2023-11-19 00:32:54,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=482393.3333333333, ans=0.5 2023-11-19 00:32:57,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=482460.0, ans=0.125 2023-11-19 00:33:21,287 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 250, loss[loss=0.08996, simple_loss=0.1069, pruned_loss=0.0244, audio_tagging_loss=0.01213, over 14404.00 frames. ], tot_loss[loss=0.09994, simple_loss=0.1126, pruned_loss=0.02892, audio_tagging_loss=0.0147, over 2182629.20 frames. ], batch size: 54, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:33:32,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.87 vs. limit=12.0 2023-11-19 00:33:39,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.13 vs. limit=15.0 2023-11-19 00:33:44,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=482726.6666666667, ans=0.0 2023-11-19 00:33:46,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=482726.6666666667, ans=0.125 2023-11-19 00:33:49,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=482726.6666666667, ans=0.125 2023-11-19 00:34:16,418 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 300, loss[loss=0.08874, simple_loss=0.1046, pruned_loss=0.02489, audio_tagging_loss=0.01154, over 14827.00 frames. ], tot_loss[loss=0.09885, simple_loss=0.1127, pruned_loss=0.02888, audio_tagging_loss=0.01362, over 2371469.14 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:34:16,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=482926.6666666667, ans=0.0 2023-11-19 00:34:36,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.89 vs. limit=15.0 2023-11-19 00:34:41,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483060.0, ans=0.125 2023-11-19 00:34:42,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.77 vs. limit=6.0 2023-11-19 00:34:44,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.903e+01 9.554e+01 1.061e+02 1.704e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-19 00:34:49,586 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.90 vs. 
limit=15.0 2023-11-19 00:35:07,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=483193.3333333333, ans=0.0 2023-11-19 00:35:12,358 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 350, loss[loss=0.1242, simple_loss=0.1448, pruned_loss=0.04215, audio_tagging_loss=0.009643, over 14687.00 frames. ], tot_loss[loss=0.09767, simple_loss=0.1126, pruned_loss=0.02844, audio_tagging_loss=0.01291, over 2522965.46 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:35:53,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=483460.0, ans=0.125 2023-11-19 00:35:56,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483526.6666666667, ans=0.1 2023-11-19 00:36:07,561 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 400, loss[loss=0.0611, simple_loss=0.07363, pruned_loss=0.01331, audio_tagging_loss=0.01097, over 14762.00 frames. ], tot_loss[loss=0.09683, simple_loss=0.1124, pruned_loss=0.02825, audio_tagging_loss=0.0124, over 2635529.39 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:36:08,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=483593.3333333333, ans=0.125 2023-11-19 00:36:13,194 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=22.5 2023-11-19 00:36:18,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=483660.0, ans=0.125 2023-11-19 00:36:21,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=483660.0, ans=0.2 2023-11-19 00:36:21,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=483660.0, ans=0.0 2023-11-19 00:36:31,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=483726.6666666667, ans=0.125 2023-11-19 00:36:34,359 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.614e+01 9.359e+01 1.038e+02 1.564e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 00:36:54,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=483860.0, ans=0.0 2023-11-19 00:36:56,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=483860.0, ans=0.125 2023-11-19 00:37:01,741 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 450, loss[loss=0.0842, simple_loss=0.1037, pruned_loss=0.02429, audio_tagging_loss=0.008076, over 15818.00 frames. ], tot_loss[loss=0.09627, simple_loss=0.1123, pruned_loss=0.02812, audio_tagging_loss=0.012, over 2726445.13 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:37:03,340 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.07 vs. 
limit=10.0 2023-11-19 00:37:08,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=483926.6666666667, ans=0.0 2023-11-19 00:37:16,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=483993.3333333333, ans=0.05 2023-11-19 00:37:18,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=483993.3333333333, ans=0.125 2023-11-19 00:37:18,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=483993.3333333333, ans=0.125 2023-11-19 00:37:31,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=484060.0, ans=0.125 2023-11-19 00:37:33,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=484060.0, ans=0.0 2023-11-19 00:37:50,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=484193.3333333333, ans=0.2 2023-11-19 00:37:57,269 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 500, loss[loss=0.1052, simple_loss=0.1243, pruned_loss=0.03234, audio_tagging_loss=0.01073, over 16187.00 frames. ], tot_loss[loss=0.09578, simple_loss=0.1119, pruned_loss=0.02811, audio_tagging_loss=0.01171, over 2796106.54 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:38:02,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=484260.0, ans=22.5 2023-11-19 00:38:06,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=484260.0, ans=0.015 2023-11-19 00:38:22,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484393.3333333333, ans=0.1 2023-11-19 00:38:24,795 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.485e+01 9.298e+01 1.059e+02 1.299e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 00:38:32,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=484460.0, ans=0.125 2023-11-19 00:38:37,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484460.0, ans=0.1 2023-11-19 00:38:41,868 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.34 vs. limit=10.0 2023-11-19 00:38:52,398 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 550, loss[loss=0.09349, simple_loss=0.1078, pruned_loss=0.02734, audio_tagging_loss=0.01223, over 14423.00 frames. ], tot_loss[loss=0.09467, simple_loss=0.1107, pruned_loss=0.02779, audio_tagging_loss=0.01154, over 2843221.30 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:38:57,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.62 vs. 
limit=22.5 2023-11-19 00:39:01,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=484593.3333333333, ans=0.2 2023-11-19 00:39:08,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=484660.0, ans=0.1 2023-11-19 00:39:32,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=484793.3333333333, ans=0.0 2023-11-19 00:39:39,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=484860.0, ans=0.5 2023-11-19 00:39:45,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484860.0, ans=0.1 2023-11-19 00:39:47,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.81 vs. limit=6.0 2023-11-19 00:39:48,096 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 600, loss[loss=0.1212, simple_loss=0.1487, pruned_loss=0.03969, audio_tagging_loss=0.007102, over 15948.00 frames. ], tot_loss[loss=0.09574, simple_loss=0.1123, pruned_loss=0.02821, audio_tagging_loss=0.0114, over 2894970.66 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:39:55,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.60 vs. limit=15.0 2023-11-19 00:39:56,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=484926.6666666667, ans=0.2 2023-11-19 00:40:15,507 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.943e+01 9.833e+01 1.134e+02 1.508e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 00:40:18,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485060.0, ans=0.125 2023-11-19 00:40:19,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=485060.0, ans=0.0 2023-11-19 00:40:22,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2023-11-19 00:40:42,651 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 650, loss[loss=0.08694, simple_loss=0.1043, pruned_loss=0.02394, audio_tagging_loss=0.01087, over 15202.00 frames. ], tot_loss[loss=0.09482, simple_loss=0.1113, pruned_loss=0.02782, audio_tagging_loss=0.01137, over 2930479.03 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:40:42,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=485260.0, ans=0.0 2023-11-19 00:41:05,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=485393.3333333333, ans=0.0 2023-11-19 00:41:21,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.31 vs. limit=10.0 2023-11-19 00:41:38,262 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 700, loss[loss=0.08351, simple_loss=0.1017, pruned_loss=0.023, audio_tagging_loss=0.009657, over 15296.00 frames. 
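The recurring scaling.py:1022 Whitening lines compare a per-module whiteness metric against a scheduled limit (the "metric=... vs. limit=..." pairs), and they appear to be emitted when a module's activations drift toward the limit: the metric is 1.0 when the grouped feature covariance is isotropic and grows when a few directions dominate. One scale-invariant formulation with that property is sketched below; the exact formula in scaling.py may differ.

# Sketch of a covariance-whiteness metric consistent with the
# "Whitening: ... metric=X vs. limit=Y" lines: it equals 1.0 for
# isotropic (white) features and grows with anisotropy. The precise
# formula is an illustrative assumption.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    # x: (num_frames, num_channels), channels split into equal groups
    n, c = x.shape
    d = c // num_groups
    x = x.reshape(n, num_groups, d).transpose(0, 1)     # (groups, n, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = x.transpose(1, 2) @ x / n                     # (groups, d, d)
    num = (cov * cov).sum(dim=(1, 2)) * d               # d * trace(cov @ cov)
    den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2  # trace(cov) ** 2
    return (num / den).mean()

x = torch.randn(10000, 384)    # white features
print(whitening_metric(x, 1))  # close to the ideal 1.0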
], tot_loss[loss=0.09373, simple_loss=0.11, pruned_loss=0.02742, audio_tagging_loss=0.01131, over 2948484.83 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:41:42,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=485593.3333333333, ans=0.125 2023-11-19 00:41:51,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=485660.0, ans=0.125 2023-11-19 00:42:06,503 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.517e+01 9.340e+01 1.042e+02 1.556e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 00:42:12,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=485793.3333333333, ans=0.125 2023-11-19 00:42:25,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=485860.0, ans=0.2 2023-11-19 00:42:33,685 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 750, loss[loss=0.1061, simple_loss=0.1285, pruned_loss=0.03166, audio_tagging_loss=0.01015, over 15527.00 frames. ], tot_loss[loss=0.09402, simple_loss=0.1104, pruned_loss=0.02752, audio_tagging_loss=0.01127, over 2972063.84 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:42:40,840 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.whiten.whitening_limit, batch_count=485926.6666666667, ans=12.0 2023-11-19 00:43:12,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=486126.6666666667, ans=0.0 2023-11-19 00:43:18,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=486193.3333333333, ans=0.0 2023-11-19 00:43:22,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=486193.3333333333, ans=0.0 2023-11-19 00:43:28,903 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 800, loss[loss=0.1034, simple_loss=0.1219, pruned_loss=0.03128, audio_tagging_loss=0.01114, over 15521.00 frames. ], tot_loss[loss=0.09457, simple_loss=0.1108, pruned_loss=0.02785, audio_tagging_loss=0.01131, over 2988040.30 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:43:30,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-19 00:43:33,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=486260.0, ans=0.2 2023-11-19 00:43:54,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486393.3333333333, ans=0.1 2023-11-19 00:43:58,560 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.961e+01 9.604e+01 1.088e+02 1.734e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 00:44:06,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=486460.0, ans=0.125 2023-11-19 00:44:06,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.75 vs. 
limit=10.0 2023-11-19 00:44:07,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=486460.0, ans=0.0 2023-11-19 00:44:09,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=486460.0, ans=0.2 2023-11-19 00:44:09,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=486460.0, ans=0.0 2023-11-19 00:44:18,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486526.6666666667, ans=0.125 2023-11-19 00:44:18,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=486526.6666666667, ans=0.0 2023-11-19 00:44:24,838 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 850, loss[loss=0.08504, simple_loss=0.1076, pruned_loss=0.0216, audio_tagging_loss=0.009618, over 15262.00 frames. ], tot_loss[loss=0.09435, simple_loss=0.1108, pruned_loss=0.02757, audio_tagging_loss=0.01137, over 2999443.42 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:44:28,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=486593.3333333333, ans=0.125 2023-11-19 00:44:44,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=486660.0, ans=0.025 2023-11-19 00:44:47,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486726.6666666667, ans=0.125 2023-11-19 00:44:53,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=486726.6666666667, ans=0.125 2023-11-19 00:45:04,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=486793.3333333333, ans=10.0 2023-11-19 00:45:05,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486793.3333333333, ans=0.125 2023-11-19 00:45:14,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=486860.0, ans=0.125 2023-11-19 00:45:19,208 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.50 vs. limit=10.0 2023-11-19 00:45:21,333 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 900, loss[loss=0.115, simple_loss=0.1331, pruned_loss=0.03658, audio_tagging_loss=0.01191, over 15494.00 frames. ], tot_loss[loss=0.09494, simple_loss=0.1114, pruned_loss=0.02785, audio_tagging_loss=0.01141, over 3008958.58 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:45:29,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-19 00:45:32,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=486993.3333333333, ans=0.125 2023-11-19 00:45:41,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.88 vs. 
limit=15.0 2023-11-19 00:45:49,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.581e+01 9.444e+01 1.025e+02 1.382e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 00:46:01,998 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:46:16,167 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 950, loss[loss=0.09312, simple_loss=0.112, pruned_loss=0.02887, audio_tagging_loss=0.008273, over 15300.00 frames. ], tot_loss[loss=0.09477, simple_loss=0.1114, pruned_loss=0.02783, audio_tagging_loss=0.01126, over 3015181.40 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:46:49,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=487460.0, ans=0.125 2023-11-19 00:47:09,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=487526.6666666667, ans=0.125 2023-11-19 00:47:11,643 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1000, loss[loss=0.1299, simple_loss=0.1689, pruned_loss=0.04088, audio_tagging_loss=0.004613, over 15462.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.1117, pruned_loss=0.02787, audio_tagging_loss=0.01096, over 3020964.92 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:47:26,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=487660.0, ans=0.125 2023-11-19 00:47:34,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=487726.6666666667, ans=0.125 2023-11-19 00:47:35,314 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:47:41,630 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.644e+01 9.195e+01 1.009e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 00:48:01,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=487860.0, ans=0.125 2023-11-19 00:48:02,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=487860.0, ans=0.2 2023-11-19 00:48:07,471 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1050, loss[loss=0.09967, simple_loss=0.1189, pruned_loss=0.03144, audio_tagging_loss=0.008785, over 14966.00 frames. ], tot_loss[loss=0.09603, simple_loss=0.1135, pruned_loss=0.02849, audio_tagging_loss=0.01079, over 3031868.11 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:48:11,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=487926.6666666667, ans=0.125 2023-11-19 00:48:12,572 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2023-11-19 00:48:18,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.94 vs. limit=15.0 2023-11-19 00:48:34,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=488060.0, ans=0.125 2023-11-19 00:48:39,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.69 vs. limit=15.0 2023-11-19 00:49:03,268 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1100, loss[loss=0.1016, simple_loss=0.1222, pruned_loss=0.02978, audio_tagging_loss=0.01072, over 14750.00 frames. ], tot_loss[loss=0.09561, simple_loss=0.1126, pruned_loss=0.02847, audio_tagging_loss=0.01083, over 3031786.61 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:49:06,400 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:49:08,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=488260.0, ans=0.125 2023-11-19 00:49:13,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:14,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=488326.6666666667, ans=0.09899494936611666 2023-11-19 00:49:16,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5 2023-11-19 00:49:17,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:33,461 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.821e+01 9.518e+01 1.052e+02 1.526e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 00:49:39,970 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:49:58,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.79 vs. limit=12.0 2023-11-19 00:49:58,982 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1150, loss[loss=0.1047, simple_loss=0.1326, pruned_loss=0.0306, audio_tagging_loss=0.007804, over 14651.00 frames. ], tot_loss[loss=0.09577, simple_loss=0.1129, pruned_loss=0.02857, audio_tagging_loss=0.01075, over 3032237.54 frames. 
], batch size: 55, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:50:09,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=488660.0, ans=0.0 2023-11-19 00:50:18,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=488660.0, ans=0.1 2023-11-19 00:50:30,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488726.6666666667, ans=0.1 2023-11-19 00:50:33,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488793.3333333333, ans=0.125 2023-11-19 00:50:33,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=488793.3333333333, ans=0.0 2023-11-19 00:50:43,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488860.0, ans=0.1 2023-11-19 00:50:55,535 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1200, loss[loss=0.1079, simple_loss=0.1267, pruned_loss=0.03432, audio_tagging_loss=0.01018, over 14856.00 frames. ], tot_loss[loss=0.0954, simple_loss=0.1125, pruned_loss=0.02836, audio_tagging_loss=0.01076, over 3032799.71 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:50:58,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.54 vs. limit=15.0 2023-11-19 00:51:00,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=488926.6666666667, ans=0.0 2023-11-19 00:51:08,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=488993.3333333333, ans=0.125 2023-11-19 00:51:16,477 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.945e-01 2023-11-19 00:51:25,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.775e+01 9.458e+01 1.050e+02 1.338e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-19 00:51:39,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=489193.3333333333, ans=0.2 2023-11-19 00:51:50,851 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1250, loss[loss=0.1058, simple_loss=0.1278, pruned_loss=0.0304, audio_tagging_loss=0.01144, over 15260.00 frames. ], tot_loss[loss=0.09531, simple_loss=0.1125, pruned_loss=0.02829, audio_tagging_loss=0.01076, over 3039251.06 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:52:05,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=489326.6666666667, ans=0.0 2023-11-19 00:52:22,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=489393.3333333333, ans=0.125 2023-11-19 00:52:44,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=489526.6666666667, ans=0.09899494936611666 2023-11-19 00:52:47,251 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1300, loss[loss=0.08134, simple_loss=0.09815, pruned_loss=0.02317, audio_tagging_loss=0.009093, over 15270.00 frames. 
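The scaling.py:1118 WithLoss lines (loss-sum=0.000e+00 most of the time, but 1.945e-01 for encoders.3.encoder.layers.0.self_attn_weights just above) report auxiliary penalties that are attached to attention weights and flow into backward without changing the forward activations. One way to build that kind of pass-through is a custom autograd function, as sketched below; this is an assumed reconstruction of the mechanism, not scaling.py's actual class.

# Sketch: attach an auxiliary loss to a tensor so it contributes
# gradients while leaving the forward value untouched, in the spirit
# of the "WithLoss: name=..., loss-sum=..." lines. Assumed reconstruction.
import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, loss: torch.Tensor):
        ctx.loss_shape = loss.shape
        return x  # identity on x; loss is only carried for backward

    @staticmethod
    def backward(ctx, grad_x: torch.Tensor):
        # Gradient of 1.0 w.r.t. the auxiliary loss: it is effectively
        # added to whatever objective the graph already minimizes.
        ones = torch.ones(ctx.loss_shape, dtype=grad_x.dtype,
                          device=grad_x.device)
        return grad_x, ones

attn = torch.softmax(torch.randn(4, 10, 10, requires_grad=True), dim=-1)
penalty = (attn - 0.2).relu().sum()   # made-up penalty on peaky weights
attn = WithLoss.apply(attn, penalty)
print(f"loss-sum={float(penalty):.3e}")  # the number such a line reports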
], tot_loss[loss=0.09372, simple_loss=0.1105, pruned_loss=0.02766, audio_tagging_loss=0.0108, over 3037163.09 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:52:53,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=489593.3333333333, ans=0.1 2023-11-19 00:53:04,740 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.04 vs. limit=22.5 2023-11-19 00:53:17,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.399e+01 8.502e+01 9.491e+01 1.040e+02 1.421e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 00:53:43,620 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1350, loss[loss=0.1084, simple_loss=0.1182, pruned_loss=0.03818, audio_tagging_loss=0.01108, over 15470.00 frames. ], tot_loss[loss=0.09361, simple_loss=0.1103, pruned_loss=0.02765, audio_tagging_loss=0.01082, over 3034524.41 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:53:47,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=489926.6666666667, ans=0.125 2023-11-19 00:53:48,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=489926.6666666667, ans=0.125 2023-11-19 00:53:50,233 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=489926.6666666667, ans=0.125 2023-11-19 00:54:04,387 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:54:17,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=490126.6666666667, ans=0.1 2023-11-19 00:54:23,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-19 00:54:24,936 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:54:25,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=490126.6666666667, ans=0.125 2023-11-19 00:54:30,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=490193.3333333333, ans=0.125 2023-11-19 00:54:31,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=490193.3333333333, ans=0.125 2023-11-19 00:54:38,697 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1400, loss[loss=0.07224, simple_loss=0.07588, pruned_loss=0.0203, audio_tagging_loss=0.01401, over 14922.00 frames. ], tot_loss[loss=0.09307, simple_loss=0.1093, pruned_loss=0.02745, audio_tagging_loss=0.01095, over 3035291.10 frames. 
], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:54:41,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490260.0, ans=0.125 2023-11-19 00:54:56,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.43 vs. limit=15.0 2023-11-19 00:55:09,550 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.650e+01 9.368e+01 1.053e+02 1.666e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 00:55:23,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=490526.6666666667, ans=0.2 2023-11-19 00:55:24,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2023-11-19 00:55:27,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=490526.6666666667, ans=0.2 2023-11-19 00:55:35,057 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1450, loss[loss=0.1174, simple_loss=0.1402, pruned_loss=0.03789, audio_tagging_loss=0.009366, over 16027.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.1098, pruned_loss=0.02743, audio_tagging_loss=0.01094, over 3036464.89 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:55:37,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.95 vs. limit=15.0 2023-11-19 00:55:55,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=490660.0, ans=0.125 2023-11-19 00:56:08,642 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.20 vs. limit=15.0 2023-11-19 00:56:30,791 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1500, loss[loss=0.1145, simple_loss=0.1357, pruned_loss=0.03544, audio_tagging_loss=0.01125, over 15183.00 frames. ], tot_loss[loss=0.0939, simple_loss=0.1105, pruned_loss=0.0276, audio_tagging_loss=0.01104, over 3033050.68 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:56:36,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=490926.6666666667, ans=10.0 2023-11-19 00:56:53,610 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.47 vs. limit=10.0 2023-11-19 00:56:55,734 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.45 vs. limit=12.0 2023-11-19 00:57:00,322 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.708e+01 9.682e+01 1.052e+02 1.356e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:57:12,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.88 vs. 
limit=15.0 2023-11-19 00:57:15,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491193.3333333333, ans=0.1 2023-11-19 00:57:16,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=491193.3333333333, ans=0.0 2023-11-19 00:57:25,683 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1550, loss[loss=0.08056, simple_loss=0.08645, pruned_loss=0.02283, audio_tagging_loss=0.0145, over 14101.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1103, pruned_loss=0.02768, audio_tagging_loss=0.01121, over 3037472.72 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:57:32,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=491260.0, ans=0.125 2023-11-19 00:57:36,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=491326.6666666667, ans=0.07 2023-11-19 00:57:41,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=491326.6666666667, ans=0.125 2023-11-19 00:57:51,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.48 vs. limit=10.0 2023-11-19 00:58:17,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=491526.6666666667, ans=0.0 2023-11-19 00:58:20,575 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1600, loss[loss=0.07434, simple_loss=0.08524, pruned_loss=0.01712, audio_tagging_loss=0.0146, over 14686.00 frames. ], tot_loss[loss=0.09332, simple_loss=0.1096, pruned_loss=0.02734, audio_tagging_loss=0.01119, over 3042181.84 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:58:39,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0 2023-11-19 00:58:43,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491726.6666666667, ans=0.1 2023-11-19 00:58:44,885 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:58:50,951 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.782e+01 9.645e+01 1.086e+02 1.733e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 00:58:53,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=491793.3333333333, ans=0.125 2023-11-19 00:59:06,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.07 vs. limit=15.0 2023-11-19 00:59:17,056 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1650, loss[loss=0.06347, simple_loss=0.07497, pruned_loss=0.013, audio_tagging_loss=0.01299, over 14636.00 frames. ], tot_loss[loss=0.09207, simple_loss=0.1079, pruned_loss=0.02678, audio_tagging_loss=0.01136, over 3043694.08 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:59:20,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.33 vs. 
limit=15.0 2023-11-19 00:59:26,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=491926.6666666667, ans=0.07 2023-11-19 00:59:31,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=491993.3333333333, ans=0.125 2023-11-19 00:59:38,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492060.0, ans=0.125 2023-11-19 00:59:53,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=492126.6666666667, ans=0.125 2023-11-19 00:59:55,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=492126.6666666667, ans=0.2 2023-11-19 00:59:57,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492126.6666666667, ans=0.125 2023-11-19 01:00:12,744 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1700, loss[loss=0.113, simple_loss=0.134, pruned_loss=0.03614, audio_tagging_loss=0.009838, over 15241.00 frames. ], tot_loss[loss=0.09335, simple_loss=0.1095, pruned_loss=0.02739, audio_tagging_loss=0.01121, over 3045839.99 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:00:16,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=492260.0, ans=0.05 2023-11-19 01:00:43,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.546e+01 9.504e+01 1.048e+02 1.501e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-19 01:01:08,024 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1750, loss[loss=0.09173, simple_loss=0.1116, pruned_loss=0.03004, audio_tagging_loss=0.005863, over 14624.00 frames. ], tot_loss[loss=0.09431, simple_loss=0.1108, pruned_loss=0.02791, audio_tagging_loss=0.01098, over 3043694.28 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:01:08,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=492593.3333333333, ans=0.0 2023-11-19 01:01:24,822 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=492660.0, ans=0.0 2023-11-19 01:01:38,354 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.83 vs. limit=22.5 2023-11-19 01:02:04,399 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1800, loss[loss=0.09105, simple_loss=0.1044, pruned_loss=0.02851, audio_tagging_loss=0.01033, over 15410.00 frames. ], tot_loss[loss=0.09376, simple_loss=0.1103, pruned_loss=0.02775, audio_tagging_loss=0.01085, over 3042719.19 frames. 
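[Annotation] Most records in this log are `ScheduledFloat` printouts pairing a hyperparameter name with the `batch_count` at which it was evaluated and the value (`ans`) then in effect. That pattern matches a scalar hyperparameter defined as a piecewise-linear function of batch count. A minimal sketch; the breakpoints below are placeholders, not values from this run:

```python
# Minimal sketch of a ScheduledFloat: a scalar hyperparameter defined as a
# piecewise-linear function of batch_count, matching the
# "name=..., batch_count=..., ans=..." records above.
import bisect

class ScheduledFloat:
    def __init__(self, *points: tuple):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a skip rate annealed from 0.3 to 0.0 over the first 20k batches
# (placeholder breakpoints): far past the last breakpoint, the value is
# pinned at 0.0, as in many skip_rate records above.
skip_rate = ScheduledFloat((0.0, 0.3), (20000.0, 0.0))
print(skip_rate.value(488660.0))  # 0.0
```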
], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:02:08,931 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=492926.6666666667, ans=0.125 2023-11-19 01:02:18,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=492993.3333333333, ans=0.125 2023-11-19 01:02:20,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=492993.3333333333, ans=0.1 2023-11-19 01:02:21,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=492993.3333333333, ans=0.125 2023-11-19 01:02:26,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=493060.0, ans=0.125 2023-11-19 01:02:34,106 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.598e+01 9.390e+01 1.040e+02 1.619e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 01:02:37,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=493126.6666666667, ans=0.0 2023-11-19 01:02:44,796 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.71 vs. limit=22.5 2023-11-19 01:02:46,859 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=12.0 2023-11-19 01:02:48,450 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-11-19 01:02:51,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=493193.3333333333, ans=0.125 2023-11-19 01:03:00,477 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1850, loss[loss=0.1109, simple_loss=0.1307, pruned_loss=0.03657, audio_tagging_loss=0.008975, over 15964.00 frames. ], tot_loss[loss=0.09464, simple_loss=0.1114, pruned_loss=0.02809, audio_tagging_loss=0.01083, over 3043535.49 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:03:10,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=493326.6666666667, ans=0.2 2023-11-19 01:03:15,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=493326.6666666667, ans=0.125 2023-11-19 01:03:23,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=493393.3333333333, ans=0.2 2023-11-19 01:03:37,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=493460.0, ans=0.2 2023-11-19 01:03:41,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=493460.0, ans=0.125 2023-11-19 01:03:42,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.whiten.whitening_limit, batch_count=493460.0, ans=12.0 2023-11-19 01:03:55,762 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1900, loss[loss=0.1097, simple_loss=0.1327, pruned_loss=0.03371, audio_tagging_loss=0.009632, over 15916.00 frames. 
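[Annotation] The `Whitening:` records compare a per-module `metric` against a `limit`. A plausible reading is that the metric measures how far the channel covariance of a module's output is from a multiple of the identity: it equals 1.0 for perfectly white features, grows with the spread of covariance eigenvalues, and the hook logs only when the metric exceeds its limit. A hedged reconstruction of such a metric:

```python
# Hedged reconstruction of the whitening metric logged above: 1.0 for
# perfectly "white" features, larger as covariance eigenvalues spread out.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (..., num_channels); returns a scalar >= 1.0."""
    c = x.shape[-1] // num_groups
    x = x.reshape(-1, num_groups, c).transpose(0, 1)       # (groups, N, c)
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, c, c)
    # c * mean(cov**2) / mean(diag(cov))**2 equals 1 iff cov is a multiple
    # of the identity (by Cauchy-Schwarz it is >= 1 otherwise).
    num = (cov ** 2).mean()
    den = torch.diagonal(cov, dim1=1, dim2=2).mean() ** 2
    return (c * num / den).item()

x = torch.randn(1000, 384)                                  # roughly white
print(whitening_metric(x))                                  # close to 1.0
print(whitening_metric(x * torch.linspace(0.1, 3.0, 384)))  # much larger
```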
], tot_loss[loss=0.09472, simple_loss=0.1117, pruned_loss=0.02809, audio_tagging_loss=0.01077, over 3043998.30 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:04:21,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=493726.6666666667, ans=0.0 2023-11-19 01:04:25,810 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.519e+01 9.193e+01 1.005e+02 1.310e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 01:04:35,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=493793.3333333333, ans=0.0 2023-11-19 01:04:40,398 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.65 vs. limit=15.0 2023-11-19 01:04:50,849 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 1950, loss[loss=0.1297, simple_loss=0.1421, pruned_loss=0.04851, audio_tagging_loss=0.01014, over 15720.00 frames. ], tot_loss[loss=0.09473, simple_loss=0.1118, pruned_loss=0.02809, audio_tagging_loss=0.01072, over 3050964.19 frames. ], batch size: 61, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:04:59,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=493926.6666666667, ans=0.125 2023-11-19 01:05:18,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.95 vs. limit=15.0 2023-11-19 01:05:22,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494060.0, ans=0.1 2023-11-19 01:05:32,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=494126.6666666667, ans=0.125 2023-11-19 01:05:47,466 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2000, loss[loss=0.1108, simple_loss=0.1297, pruned_loss=0.0349, audio_tagging_loss=0.01106, over 14156.00 frames. ], tot_loss[loss=0.09444, simple_loss=0.1115, pruned_loss=0.02793, audio_tagging_loss=0.01075, over 3041378.97 frames. ], batch size: 53, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:05:54,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=494260.0, ans=0.0 2023-11-19 01:06:05,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=494326.6666666667, ans=0.025 2023-11-19 01:06:06,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5 2023-11-19 01:06:13,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.47 vs. 
limit=12.0 2023-11-19 01:06:16,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.706e+01 9.238e+01 1.036e+02 1.404e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 01:06:22,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=494460.0, ans=0.2 2023-11-19 01:06:37,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=494526.6666666667, ans=0.0 2023-11-19 01:06:38,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=494526.6666666667, ans=0.0 2023-11-19 01:06:42,716 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2050, loss[loss=0.08944, simple_loss=0.09508, pruned_loss=0.02882, audio_tagging_loss=0.01308, over 14258.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.1118, pruned_loss=0.02796, audio_tagging_loss=0.01082, over 3046943.07 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:06:48,356 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:07:04,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=494726.6666666667, ans=0.0 2023-11-19 01:07:07,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=494726.6666666667, ans=0.125 2023-11-19 01:07:10,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=494726.6666666667, ans=0.125 2023-11-19 01:07:16,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=494793.3333333333, ans=10.0 2023-11-19 01:07:19,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=494793.3333333333, ans=0.0 2023-11-19 01:07:37,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=494860.0, ans=0.0 2023-11-19 01:07:39,054 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2100, loss[loss=0.06397, simple_loss=0.07704, pruned_loss=0.0131, audio_tagging_loss=0.01235, over 15728.00 frames. ], tot_loss[loss=0.09385, simple_loss=0.111, pruned_loss=0.02758, audio_tagging_loss=0.01075, over 3042729.61 frames. ], batch size: 62, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:07:42,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.32 vs. 
limit=15.0 2023-11-19 01:07:53,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=494993.3333333333, ans=0.0 2023-11-19 01:08:04,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=495060.0, ans=0.04949747468305833 2023-11-19 01:08:09,436 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.837e+01 9.503e+01 1.029e+02 1.417e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-19 01:08:18,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=495126.6666666667, ans=15.0 2023-11-19 01:08:19,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=495126.6666666667, ans=0.0 2023-11-19 01:08:33,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=495193.3333333333, ans=0.125 2023-11-19 01:08:33,904 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.75 vs. limit=15.0 2023-11-19 01:08:34,043 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-19 01:08:35,531 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2150, loss[loss=0.1082, simple_loss=0.1369, pruned_loss=0.03007, audio_tagging_loss=0.009705, over 14976.00 frames. ], tot_loss[loss=0.09393, simple_loss=0.1109, pruned_loss=0.02765, audio_tagging_loss=0.0108, over 3035883.87 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:08:48,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=495326.6666666667, ans=0.125 2023-11-19 01:08:48,211 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.50 vs. limit=15.0 2023-11-19 01:08:50,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=495326.6666666667, ans=0.0 2023-11-19 01:08:52,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=495326.6666666667, ans=0.2 2023-11-19 01:08:55,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=495326.6666666667, ans=0.0 2023-11-19 01:08:55,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=495326.6666666667, ans=0.0 2023-11-19 01:08:56,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495393.3333333333, ans=0.1 2023-11-19 01:08:56,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=495393.3333333333, ans=0.1 2023-11-19 01:09:10,121 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:09:11,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=495460.0, ans=0.125 2023-11-19 01:09:30,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=495593.3333333333, ans=0.05 2023-11-19 01:09:31,316 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2200, loss[loss=0.09325, simple_loss=0.1089, pruned_loss=0.02802, audio_tagging_loss=0.01079, over 13890.00 frames. ], tot_loss[loss=0.09385, simple_loss=0.1108, pruned_loss=0.0276, audio_tagging_loss=0.01087, over 3027306.16 frames. ], batch size: 52, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:09:48,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=495660.0, ans=0.2 2023-11-19 01:10:02,324 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.544e+01 9.673e+01 1.053e+02 1.527e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-19 01:10:09,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=495793.3333333333, ans=0.125 2023-11-19 01:10:09,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=495793.3333333333, ans=0.07 2023-11-19 01:10:11,593 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:10:26,496 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2250, loss[loss=0.09379, simple_loss=0.1114, pruned_loss=0.02605, audio_tagging_loss=0.01202, over 15060.00 frames. ], tot_loss[loss=0.09397, simple_loss=0.1111, pruned_loss=0.02754, audio_tagging_loss=0.01088, over 3030811.08 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:10:59,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2023-11-19 01:11:11,385 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=496193.3333333333, ans=0.125 2023-11-19 01:11:14,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=496193.3333333333, ans=0.125 2023-11-19 01:11:23,070 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2300, loss[loss=0.09695, simple_loss=0.1087, pruned_loss=0.03166, audio_tagging_loss=0.01095, over 14265.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1115, pruned_loss=0.02775, audio_tagging_loss=0.01093, over 3030250.77 frames. 
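[Annotation] The WARNING just above shows the cut filter firing: a one-second AudioSet clip has 100 feature frames, only 23 after 4x subsampling, yet its placeholder transcript encodes to 24 BPE tokens, and a transducer needs at least one encoder frame per output token. A sketch of that check; the exact subsampling arithmetic is an assumption chosen to match the logged numbers:

```python
# Sketch of the validity check behind the WARNING above. The subsampling
# arithmetic is an assumption chosen to match the logged 100 -> 23.
def frames_after_subsampling(num_frames: int) -> int:
    # assumed: two stride-2 stages with small border losses
    return ((num_frames - 7) // 2 + 1) // 2   # 100 -> 23, as logged

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """A transducer alignment needs at least one encoder frame per output
    token, so drop cuts where that cannot hold."""
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> excluded from training
```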
], batch size: 56, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:11:47,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=496393.3333333333, ans=0.125 2023-11-19 01:11:53,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=496393.3333333333, ans=0.125 2023-11-19 01:11:53,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.064e+01 9.790e+01 1.107e+02 1.454e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-19 01:11:55,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=496460.0, ans=0.2 2023-11-19 01:12:02,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=496460.0, ans=0.125 2023-11-19 01:12:05,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496460.0, ans=0.1 2023-11-19 01:12:05,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=496460.0, ans=0.1 2023-11-19 01:12:12,935 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:12:18,212 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2350, loss[loss=0.07664, simple_loss=0.08973, pruned_loss=0.02065, audio_tagging_loss=0.01113, over 14633.00 frames. ], tot_loss[loss=0.09376, simple_loss=0.1105, pruned_loss=0.0275, audio_tagging_loss=0.01102, over 3039019.98 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 16.0 2023-11-19 01:12:32,576 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.42 vs. limit=10.0 2023-11-19 01:12:36,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=496660.0, ans=0.0 2023-11-19 01:12:52,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=496793.3333333333, ans=0.0 2023-11-19 01:13:14,603 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2400, loss[loss=0.07905, simple_loss=0.08785, pruned_loss=0.02358, audio_tagging_loss=0.01155, over 14376.00 frames. ], tot_loss[loss=0.09437, simple_loss=0.1114, pruned_loss=0.02762, audio_tagging_loss=0.01107, over 3040846.82 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:13:27,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=496993.3333333333, ans=0.95 2023-11-19 01:13:28,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=496993.3333333333, ans=0.1 2023-11-19 01:13:38,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.06 vs. 
limit=22.5 2023-11-19 01:13:45,238 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.739e+01 9.521e+01 1.008e+02 1.350e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 01:14:07,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=497193.3333333333, ans=0.1 2023-11-19 01:14:10,592 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2450, loss[loss=0.08386, simple_loss=0.1038, pruned_loss=0.02223, audio_tagging_loss=0.009726, over 16485.00 frames. ], tot_loss[loss=0.09412, simple_loss=0.1107, pruned_loss=0.02751, audio_tagging_loss=0.01125, over 3035339.23 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:14:18,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=497260.0, ans=0.0 2023-11-19 01:14:40,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=497393.3333333333, ans=0.0 2023-11-19 01:14:53,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=497460.0, ans=0.95 2023-11-19 01:15:06,184 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2500, loss[loss=0.07736, simple_loss=0.09581, pruned_loss=0.02016, audio_tagging_loss=0.009298, over 15596.00 frames. ], tot_loss[loss=0.09421, simple_loss=0.1108, pruned_loss=0.02756, audio_tagging_loss=0.01125, over 3042201.94 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:15:25,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=15.0 2023-11-19 01:15:31,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=497726.6666666667, ans=0.2 2023-11-19 01:15:37,853 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.612e+01 9.372e+01 1.003e+02 1.252e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 01:16:01,808 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2550, loss[loss=0.1162, simple_loss=0.1291, pruned_loss=0.03916, audio_tagging_loss=0.01253, over 14595.00 frames. ], tot_loss[loss=0.0943, simple_loss=0.111, pruned_loss=0.0277, audio_tagging_loss=0.01111, over 3046123.50 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:16:02,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. 
limit=15.0 2023-11-19 01:16:04,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=497926.6666666667, ans=0.035 2023-11-19 01:16:10,125 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=497926.6666666667, ans=0.125 2023-11-19 01:16:18,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=497993.3333333333, ans=0.125 2023-11-19 01:16:32,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=498060.0, ans=0.125 2023-11-19 01:16:37,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=498126.6666666667, ans=0.125 2023-11-19 01:16:57,738 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2600, loss[loss=0.118, simple_loss=0.142, pruned_loss=0.03528, audio_tagging_loss=0.01173, over 15782.00 frames. ], tot_loss[loss=0.09409, simple_loss=0.111, pruned_loss=0.0276, audio_tagging_loss=0.01097, over 3043740.20 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:16:59,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=498260.0, ans=0.0 2023-11-19 01:17:03,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=498260.0, ans=0.0 2023-11-19 01:17:09,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-19 01:17:12,755 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.73 vs. limit=15.0 2023-11-19 01:17:16,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.01 vs. limit=15.0 2023-11-19 01:17:21,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=498393.3333333333, ans=0.125 2023-11-19 01:17:28,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.310e+01 9.022e+01 9.947e+01 2.048e+02, threshold=1.804e+02, percent-clipped=1.0 2023-11-19 01:17:32,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=498460.0, ans=0.04949747468305833 2023-11-19 01:17:46,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=498526.6666666667, ans=0.125 2023-11-19 01:17:50,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=498526.6666666667, ans=0.0 2023-11-19 01:17:53,112 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2650, loss[loss=0.1152, simple_loss=0.132, pruned_loss=0.0414, audio_tagging_loss=0.00782, over 15437.00 frames. ], tot_loss[loss=0.09348, simple_loss=0.1107, pruned_loss=0.02726, audio_tagging_loss=0.01087, over 3045272.98 frames. 
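[Annotation] The fractional frame totals in the `tot_loss[... over 3045272.98 frames. ]` records hint that the running statistics are kept as exponentially decayed sums rather than plain totals, so the counts drift to non-integer values while recent batches dominate the average. A speculative sketch; the decay constant is a placeholder:

```python
# Speculative sketch of the tot_loss bookkeeping: exponentially decayed
# running sums, which would explain the non-integer frame totals above.
class RunningLoss:
    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.sums = {}      # decayed sum of loss * frames, per loss name
        self.frames = 0.0   # decayed frame count

    def update(self, losses: dict, num_frames: int) -> None:
        self.frames = self.frames * self.decay + num_frames
        for k, v in losses.items():
            self.sums[k] = self.sums.get(k, 0.0) * self.decay + v * num_frames

    def averages(self) -> dict:
        return {k: s / self.frames for k, s in self.sums.items()}

tot = RunningLoss()
tot.update({"loss": 0.0943, "simple_loss": 0.1115}, 15000)
print(tot.averages(), f"over {tot.frames:.2f} frames")
```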
], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:18:18,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=498726.6666666667, ans=0.125 2023-11-19 01:18:43,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=498860.0, ans=0.0 2023-11-19 01:18:48,516 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2700, loss[loss=0.08122, simple_loss=0.09484, pruned_loss=0.02334, audio_tagging_loss=0.01045, over 14867.00 frames. ], tot_loss[loss=0.09311, simple_loss=0.1104, pruned_loss=0.02716, audio_tagging_loss=0.01076, over 3050689.42 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:19:20,348 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.454e+01 8.998e+01 9.749e+01 1.436e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 01:19:21,740 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=499126.6666666667, ans=0.125 2023-11-19 01:19:26,040 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-19 01:19:44,789 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2750, loss[loss=0.07657, simple_loss=0.09417, pruned_loss=0.01853, audio_tagging_loss=0.01095, over 15555.00 frames. ], tot_loss[loss=0.09335, simple_loss=0.1104, pruned_loss=0.02735, audio_tagging_loss=0.01078, over 3045371.18 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:20:06,379 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=499393.3333333333, ans=0.125 2023-11-19 01:20:09,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=499393.3333333333, ans=0.125 2023-11-19 01:20:21,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=499460.0, ans=0.1 2023-11-19 01:20:33,914 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:20:40,205 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2800, loss[loss=0.0992, simple_loss=0.1193, pruned_loss=0.0309, audio_tagging_loss=0.008633, over 15609.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.1097, pruned_loss=0.02722, audio_tagging_loss=0.01082, over 3047221.61 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:20:42,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=499593.3333333333, ans=0.0 2023-11-19 01:20:56,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.49 vs. 
limit=15.0 2023-11-19 01:21:11,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.889e+01 9.395e+01 1.013e+02 1.273e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 01:21:20,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=499793.3333333333, ans=0.1 2023-11-19 01:21:35,074 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2850, loss[loss=0.1063, simple_loss=0.1238, pruned_loss=0.03415, audio_tagging_loss=0.01028, over 15728.00 frames. ], tot_loss[loss=0.09341, simple_loss=0.1103, pruned_loss=0.02738, audio_tagging_loss=0.01087, over 3051582.41 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:22:05,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-19 01:22:32,060 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2900, loss[loss=0.09103, simple_loss=0.1003, pruned_loss=0.02845, audio_tagging_loss=0.01245, over 15280.00 frames. ], tot_loss[loss=0.09337, simple_loss=0.1102, pruned_loss=0.02746, audio_tagging_loss=0.0108, over 3046566.85 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:22:54,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=500393.3333333333, ans=0.1 2023-11-19 01:22:55,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=500393.3333333333, ans=0.0 2023-11-19 01:23:02,356 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.483e+01 9.561e+01 1.049e+02 1.503e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-19 01:23:10,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=500460.0, ans=0.125 2023-11-19 01:23:16,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=500526.6666666667, ans=0.2 2023-11-19 01:23:16,402 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.67 vs. limit=22.5 2023-11-19 01:23:26,459 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.35 vs. limit=6.0 2023-11-19 01:23:27,926 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 2950, loss[loss=0.113, simple_loss=0.1376, pruned_loss=0.03287, audio_tagging_loss=0.01136, over 15604.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1115, pruned_loss=0.02787, audio_tagging_loss=0.01081, over 3043408.14 frames. ], batch size: 55, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:23:35,531 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:23:44,346 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.91 vs. 
limit=22.5 2023-11-19 01:23:45,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=500660.0, ans=0.0 2023-11-19 01:23:46,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=500660.0, ans=0.1 2023-11-19 01:24:13,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=500860.0, ans=0.125 2023-11-19 01:24:20,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=500860.0, ans=0.125 2023-11-19 01:24:22,440 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3000, loss[loss=0.1004, simple_loss=0.1146, pruned_loss=0.03085, audio_tagging_loss=0.01222, over 15648.00 frames. ], tot_loss[loss=0.09464, simple_loss=0.1114, pruned_loss=0.02801, audio_tagging_loss=0.01096, over 3044467.91 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:24:22,441 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 01:24:54,910 INFO [train_asr.py:1147] (2/4) Epoch 7, validation: loss=0.06857, simple_loss=0.05795, pruned_loss=0.007692, audio_tagging_loss=0.0319, over 4681554.00 frames. 2023-11-19 01:24:54,911 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 01:25:13,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=500993.3333333333, ans=0.125 2023-11-19 01:25:22,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=501060.0, ans=0.125 2023-11-19 01:25:24,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.798e+01 9.751e+01 1.102e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 01:25:28,857 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=501126.6666666667, ans=0.0 2023-11-19 01:25:33,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501126.6666666667, ans=0.1 2023-11-19 01:25:35,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.59 vs. limit=15.0 2023-11-19 01:25:45,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=501193.3333333333, ans=0.125 2023-11-19 01:25:49,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501260.0, ans=0.125 2023-11-19 01:25:50,600 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3050, loss[loss=0.1051, simple_loss=0.1186, pruned_loss=0.03357, audio_tagging_loss=0.01223, over 14862.00 frames. ], tot_loss[loss=0.09489, simple_loss=0.1114, pruned_loss=0.02809, audio_tagging_loss=0.01109, over 3042556.66 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:26:25,156 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:26:45,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=501593.3333333333, ans=0.09899494936611666 2023-11-19 01:26:46,167 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3100, loss[loss=0.0871, simple_loss=0.09395, pruned_loss=0.02619, audio_tagging_loss=0.01394, over 16542.00 frames. ], tot_loss[loss=0.09478, simple_loss=0.1115, pruned_loss=0.02801, audio_tagging_loss=0.01104, over 3045724.66 frames. ], batch size: 63, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:17,744 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.685e+01 9.204e+01 1.020e+02 1.427e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 01:27:37,018 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-11-19 01:27:37,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=501860.0, ans=0.1 2023-11-19 01:27:38,105 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5 2023-11-19 01:27:42,164 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3150, loss[loss=0.09914, simple_loss=0.123, pruned_loss=0.0303, audio_tagging_loss=0.007328, over 14414.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1129, pruned_loss=0.02831, audio_tagging_loss=0.01104, over 3046817.34 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:27:43,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=501926.6666666667, ans=0.125 2023-11-19 01:27:48,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=501926.6666666667, ans=0.09899494936611666 2023-11-19 01:27:50,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=501926.6666666667, ans=0.125 2023-11-19 01:27:50,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=501926.6666666667, ans=0.2 2023-11-19 01:28:04,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=502060.0, ans=0.125 2023-11-19 01:28:10,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=502060.0, ans=0.125 2023-11-19 01:28:14,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=502126.6666666667, ans=0.125 2023-11-19 01:28:29,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=502193.3333333333, ans=0.0 2023-11-19 01:28:37,935 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3200, loss[loss=0.09771, simple_loss=0.1165, pruned_loss=0.02846, audio_tagging_loss=0.01102, over 14652.00 frames. ], tot_loss[loss=0.09546, simple_loss=0.1124, pruned_loss=0.02806, audio_tagging_loss=0.01117, over 3043871.30 frames. 
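[Annotation] The validation records in this stretch (loss=0.06857 over 4681554.00 frames, followed by the peak-memory line) come from a full pass over the fixed dev set with gradients disabled. A hedged sketch of that loop; `compute_loss` here is a caller-supplied stand-in returning a dict of losses and a frame count, not the script's actual signature:

```python
# Hedged sketch of the validation pass logged above: sweep the dev
# dataloader without gradients, average per-frame, report peak CUDA memory.
import torch

@torch.no_grad()
def compute_validation_loss(model, valid_dl, device, compute_loss) -> dict:
    """compute_loss(model, batch, device) -> ({name: value}, num_frames);
    a stand-in for the real per-batch loss function."""
    model.eval()
    sums, frames = {}, 0.0
    for batch in valid_dl:
        losses, num_frames = compute_loss(model, batch, device)
        frames += num_frames
        for k, v in losses.items():
            sums[k] = sums.get(k, 0.0) + v * num_frames
    model.train()
    info = {k: s / frames for k, s in sums.items()}
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"validation: {info}, over {frames:.2f} frames; "
          f"maximum memory allocated so far is {mb}MB")
    return info
```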
], batch size: 56, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:28:42,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502260.0, ans=0.125 2023-11-19 01:29:07,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=502393.3333333333, ans=0.125 2023-11-19 01:29:08,657 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.572e+01 9.353e+01 1.015e+02 1.372e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-19 01:29:11,578 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=502460.0, ans=0.2 2023-11-19 01:29:21,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.08 vs. limit=6.0 2023-11-19 01:29:33,136 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3250, loss[loss=0.09914, simple_loss=0.1083, pruned_loss=0.03265, audio_tagging_loss=0.01235, over 14301.00 frames. ], tot_loss[loss=0.0954, simple_loss=0.1125, pruned_loss=0.028, audio_tagging_loss=0.01113, over 3046932.82 frames. ], batch size: 52, lr: 1.01e-02, grad_scale: 32.0 2023-11-19 01:29:35,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=502593.3333333333, ans=0.0 2023-11-19 01:29:38,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502593.3333333333, ans=0.125 2023-11-19 01:29:39,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.86 vs. limit=15.0 2023-11-19 01:29:40,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=502593.3333333333, ans=0.125 2023-11-19 01:29:40,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=502593.3333333333, ans=0.125 2023-11-19 01:29:46,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.07 vs. limit=15.0 2023-11-19 01:29:48,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=502660.0, ans=0.125 2023-11-19 01:30:08,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=502793.3333333333, ans=0.09899494936611666 2023-11-19 01:30:10,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=502793.3333333333, ans=0.125 2023-11-19 01:30:29,460 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3300, loss[loss=0.08972, simple_loss=0.1109, pruned_loss=0.02495, audio_tagging_loss=0.009344, over 15288.00 frames. ], tot_loss[loss=0.09378, simple_loss=0.1103, pruned_loss=0.02734, audio_tagging_loss=0.01129, over 3048343.86 frames. 
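[Annotation] The `grad_scale` field flipping between 16.0 and 32.0 across these records is the signature of dynamic fp16 loss scaling: the scale doubles after a run of overflow-free steps and halves whenever inf/nan gradients appear (the optimizer step is skipped on such a batch, which is why the scale can drop between consecutive records). PyTorch's built-in scaler implements exactly this policy; a minimal loop for illustration, with the starting scale assumed:

```python
# Minimal fp16 training step with PyTorch's dynamic loss scaler; the
# growth/backoff settings shown are the library defaults.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,        # assumed starting point for illustration
    growth_factor=2.0,      # double after a run of finite-gradient steps
    backoff_factor=0.5,     # halve immediately on inf/nan gradients
    growth_interval=2000,
)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)       # skips the step if gradients overflowed
    scaler.update()              # adjusts the scale for the next step
    return scaler.get_scale()    # the value logged as grad_scale
```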
], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:30:47,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=502993.3333333333, ans=0.125 2023-11-19 01:31:00,698 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.481e+01 9.247e+01 1.047e+02 1.658e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 01:31:03,083 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:31:21,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=503193.3333333333, ans=0.125 2023-11-19 01:31:26,465 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3350, loss[loss=0.1075, simple_loss=0.1359, pruned_loss=0.03107, audio_tagging_loss=0.008529, over 15154.00 frames. ], tot_loss[loss=0.09421, simple_loss=0.1109, pruned_loss=0.02759, audio_tagging_loss=0.01117, over 3046965.77 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:31:36,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=503326.6666666667, ans=0.0 2023-11-19 01:31:39,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2023-11-19 01:32:05,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=503460.0, ans=0.2 2023-11-19 01:32:21,467 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3400, loss[loss=0.1106, simple_loss=0.1403, pruned_loss=0.03275, audio_tagging_loss=0.007695, over 15618.00 frames. ], tot_loss[loss=0.09459, simple_loss=0.1118, pruned_loss=0.02772, audio_tagging_loss=0.01097, over 3048325.27 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:32:22,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=503593.3333333333, ans=0.0 2023-11-19 01:32:25,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=503593.3333333333, ans=0.1 2023-11-19 01:32:26,263 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-19 01:32:29,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503593.3333333333, ans=0.125 2023-11-19 01:32:46,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=503726.6666666667, ans=0.0 2023-11-19 01:32:53,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.287e+01 9.053e+01 9.903e+01 1.231e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 01:33:10,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=503860.0, ans=0.0 2023-11-19 01:33:11,678 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. 
limit=15.0 2023-11-19 01:33:17,099 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3450, loss[loss=0.089, simple_loss=0.1019, pruned_loss=0.02676, audio_tagging_loss=0.01127, over 14630.00 frames. ], tot_loss[loss=0.09456, simple_loss=0.1121, pruned_loss=0.0277, audio_tagging_loss=0.01084, over 3057009.29 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:33:19,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=503926.6666666667, ans=0.09899494936611666 2023-11-19 01:33:32,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503993.3333333333, ans=0.1 2023-11-19 01:33:48,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=504060.0, ans=0.025 2023-11-19 01:33:50,465 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.31 vs. limit=15.0 2023-11-19 01:34:01,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=504193.3333333333, ans=0.0 2023-11-19 01:34:05,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=504193.3333333333, ans=0.1 2023-11-19 01:34:11,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=504193.3333333333, ans=0.5 2023-11-19 01:34:11,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=504193.3333333333, ans=0.0 2023-11-19 01:34:13,636 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3500, loss[loss=0.08813, simple_loss=0.1036, pruned_loss=0.02691, audio_tagging_loss=0.009426, over 14191.00 frames. ], tot_loss[loss=0.09434, simple_loss=0.1118, pruned_loss=0.02771, audio_tagging_loss=0.0107, over 3054720.49 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 16.0 2023-11-19 01:34:35,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=504393.3333333333, ans=0.0 2023-11-19 01:34:38,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504393.3333333333, ans=0.0 2023-11-19 01:34:43,109 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:34:45,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.567e+01 9.282e+01 1.040e+02 1.334e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-19 01:35:09,361 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3550, loss[loss=0.08157, simple_loss=0.09416, pruned_loss=0.02292, audio_tagging_loss=0.01157, over 15735.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.1109, pruned_loss=0.02744, audio_tagging_loss=0.01071, over 3045579.91 frames. 
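[Annotation] The learning rate decays very slowly through this stretch, from 1.02e-02 earlier in the epoch to 1.00e-02 by batch 3300, which is consistent with an Eden-style power-law schedule that damps the base rate by inverse quarter powers of both the batch and epoch counts. A sketch of that rule; all constants below are placeholders, not this run's configuration:

```python
# Sketch of an Eden-style schedule consistent with the slow lr decay above.
# base_lr, lr_batches and lr_epochs are placeholder constants.
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Early on the rate sits near base_lr; deep into epoch 7 the two factors
# have shrunk it several-fold, and it now changes only slowly per batch.
print(eden_lr(0.05, batch=100, epoch=1.0))
print(eden_lr(0.05, batch=50000, epoch=7.0))
```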
], batch size: 61, lr: 1.00e-02, grad_scale: 16.0 2023-11-19 01:35:18,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=504593.3333333333, ans=0.07 2023-11-19 01:35:22,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=504660.0, ans=0.1 2023-11-19 01:35:23,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=504660.0, ans=0.125 2023-11-19 01:35:26,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=504660.0, ans=0.125 2023-11-19 01:35:30,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=504726.6666666667, ans=0.125 2023-11-19 01:35:36,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=504726.6666666667, ans=0.2 2023-11-19 01:35:39,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=504726.6666666667, ans=0.125 2023-11-19 01:35:44,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.57 vs. limit=12.0 2023-11-19 01:35:46,207 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.23 vs. limit=10.0 2023-11-19 01:35:59,602 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. limit=6.0 2023-11-19 01:36:04,812 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3600, loss[loss=0.09529, simple_loss=0.1197, pruned_loss=0.0212, audio_tagging_loss=0.01423, over 14860.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1099, pruned_loss=0.02704, audio_tagging_loss=0.01079, over 3043543.68 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:36:22,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=504993.3333333333, ans=0.125 2023-11-19 01:36:36,844 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.561e+01 9.359e+01 1.025e+02 1.551e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 01:36:40,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=505126.6666666667, ans=0.125 2023-11-19 01:37:00,647 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3650, loss[loss=0.0653, simple_loss=0.06996, pruned_loss=0.01413, audio_tagging_loss=0.01619, over 15054.00 frames. ], tot_loss[loss=0.09283, simple_loss=0.11, pruned_loss=0.02702, audio_tagging_loss=0.0108, over 3046035.59 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:37:07,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.70 vs. 
limit=15.0 2023-11-19 01:37:14,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=505326.6666666667, ans=0.125 2023-11-19 01:37:25,319 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.67 vs. limit=15.0 2023-11-19 01:37:27,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.99 vs. limit=12.0 2023-11-19 01:37:31,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=505393.3333333333, ans=0.125 2023-11-19 01:37:41,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.94 vs. limit=15.0 2023-11-19 01:37:43,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=505460.0, ans=0.125 2023-11-19 01:37:49,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=505526.6666666667, ans=0.125 2023-11-19 01:37:55,875 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3700, loss[loss=0.0832, simple_loss=0.09202, pruned_loss=0.02663, audio_tagging_loss=0.01055, over 15173.00 frames. ], tot_loss[loss=0.09361, simple_loss=0.1107, pruned_loss=0.02747, audio_tagging_loss=0.01079, over 3051455.35 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:38:00,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=505593.3333333333, ans=0.125 2023-11-19 01:38:07,581 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.90 vs. limit=15.0 2023-11-19 01:38:12,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=505660.0, ans=0.125 2023-11-19 01:38:28,860 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.900e+01 9.822e+01 1.122e+02 1.774e+02, threshold=1.964e+02, percent-clipped=0.0 2023-11-19 01:38:51,784 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3750, loss[loss=0.1038, simple_loss=0.119, pruned_loss=0.03222, audio_tagging_loss=0.01213, over 15220.00 frames. ], tot_loss[loss=0.09397, simple_loss=0.1106, pruned_loss=0.02774, audio_tagging_loss=0.01093, over 3056600.54 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:38:53,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.33 vs. limit=22.5 2023-11-19 01:39:30,907 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 01:39:44,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=506193.3333333333, ans=0.125 2023-11-19 01:39:48,458 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3800, loss[loss=0.1135, simple_loss=0.1428, pruned_loss=0.03457, audio_tagging_loss=0.00756, over 15251.00 frames. ], tot_loss[loss=0.09484, simple_loss=0.1115, pruned_loss=0.02817, audio_tagging_loss=0.01091, over 3051066.58 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:40:04,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=506326.6666666667, ans=0.125 2023-11-19 01:40:20,055 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.899e+01 9.421e+01 1.052e+02 1.490e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 01:40:22,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=506460.0, ans=0.125 2023-11-19 01:40:22,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=506460.0, ans=0.2 2023-11-19 01:40:43,419 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3850, loss[loss=0.09684, simple_loss=0.1016, pruned_loss=0.03394, audio_tagging_loss=0.01208, over 14582.00 frames. ], tot_loss[loss=0.09512, simple_loss=0.1119, pruned_loss=0.02824, audio_tagging_loss=0.01094, over 3048653.41 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:40:51,373 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-19 01:40:58,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=506660.0, ans=0.0 2023-11-19 01:41:03,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=506660.0, ans=0.0 2023-11-19 01:41:06,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=506660.0, ans=0.125 2023-11-19 01:41:07,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=506726.6666666667, ans=0.1 2023-11-19 01:41:41,578 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3900, loss[loss=0.1235, simple_loss=0.1523, pruned_loss=0.03687, audio_tagging_loss=0.01051, over 16167.00 frames. ], tot_loss[loss=0.09504, simple_loss=0.1119, pruned_loss=0.02808, audio_tagging_loss=0.01102, over 3050008.55 frames. 
], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:41:46,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=506926.6666666667, ans=0.125 2023-11-19 01:41:50,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=506926.6666666667, ans=0.125 2023-11-19 01:42:03,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=507060.0, ans=0.0 2023-11-19 01:42:05,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=507060.0, ans=0.2 2023-11-19 01:42:05,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=507060.0, ans=0.0 2023-11-19 01:42:13,963 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.583e+01 9.293e+01 1.003e+02 1.876e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 01:42:14,863 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.44 vs. limit=5.0 2023-11-19 01:42:38,334 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 3950, loss[loss=0.08009, simple_loss=0.101, pruned_loss=0.01817, audio_tagging_loss=0.01143, over 16220.00 frames. ], tot_loss[loss=0.09506, simple_loss=0.1119, pruned_loss=0.02806, audio_tagging_loss=0.01105, over 3053627.78 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:52,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=507326.6666666667, ans=10.0 2023-11-19 01:42:58,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.13 vs. limit=10.0 2023-11-19 01:43:13,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=507460.0, ans=0.125 2023-11-19 01:43:33,329 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4000, loss[loss=0.08892, simple_loss=0.09751, pruned_loss=0.0276, audio_tagging_loss=0.01256, over 15616.00 frames. ], tot_loss[loss=0.09475, simple_loss=0.1114, pruned_loss=0.02787, audio_tagging_loss=0.01121, over 3049788.15 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:43:49,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507660.0, ans=0.1 2023-11-19 01:44:03,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=507726.6666666667, ans=0.1 2023-11-19 01:44:06,446 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 9.104e+01 9.882e+01 1.124e+02 1.409e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-19 01:44:10,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=507793.3333333333, ans=0.125 2023-11-19 01:44:23,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=507860.0, ans=0.0 2023-11-19 01:44:28,700 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4050, loss[loss=0.09466, simple_loss=0.1186, pruned_loss=0.02287, audio_tagging_loss=0.0125, over 14046.00 frames. 
], tot_loss[loss=0.09482, simple_loss=0.1113, pruned_loss=0.02788, audio_tagging_loss=0.01127, over 3046710.26 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:44:29,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=507926.6666666667, ans=0.015 2023-11-19 01:44:32,460 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:44:38,787 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-19 01:45:07,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=508126.6666666667, ans=0.1 2023-11-19 01:45:24,984 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4100, loss[loss=0.1017, simple_loss=0.1123, pruned_loss=0.03298, audio_tagging_loss=0.01255, over 15104.00 frames. ], tot_loss[loss=0.09474, simple_loss=0.1111, pruned_loss=0.0279, audio_tagging_loss=0.01128, over 3044409.27 frames. ], batch size: 55, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:45:36,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=508326.6666666667, ans=0.2 2023-11-19 01:45:48,170 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.34 vs. limit=22.5 2023-11-19 01:45:53,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=508393.3333333333, ans=0.0 2023-11-19 01:45:56,109 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.454e+01 9.161e+01 9.715e+01 1.284e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 01:45:58,433 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.21 vs. limit=22.5 2023-11-19 01:46:14,364 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=508526.6666666667, ans=0.95 2023-11-19 01:46:20,585 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4150, loss[loss=0.0868, simple_loss=0.08974, pruned_loss=0.03064, audio_tagging_loss=0.0113, over 14922.00 frames. ], tot_loss[loss=0.0941, simple_loss=0.1105, pruned_loss=0.02766, audio_tagging_loss=0.01122, over 3040520.99 frames. 
], batch size: 58, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:46:33,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=508660.0, ans=0.0 2023-11-19 01:46:33,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=508660.0, ans=0.125 2023-11-19 01:46:35,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=508660.0, ans=0.125 2023-11-19 01:46:49,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=508726.6666666667, ans=0.125 2023-11-19 01:47:01,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=508793.3333333333, ans=0.125 2023-11-19 01:47:01,988 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:47:08,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=508860.0, ans=0.2 2023-11-19 01:47:15,857 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4200, loss[loss=0.1005, simple_loss=0.1147, pruned_loss=0.03189, audio_tagging_loss=0.01127, over 14631.00 frames. ], tot_loss[loss=0.09351, simple_loss=0.1102, pruned_loss=0.02737, audio_tagging_loss=0.01107, over 3039439.66 frames. ], batch size: 56, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:47:15,995 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=508926.6666666667, ans=0.0 2023-11-19 01:47:21,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=508926.6666666667, ans=0.035 2023-11-19 01:47:26,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-19 01:47:48,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.868e+01 9.396e+01 1.025e+02 1.839e+02, threshold=1.879e+02, percent-clipped=1.0 2023-11-19 01:47:50,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-11-19 01:47:50,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=509126.6666666667, ans=0.125 2023-11-19 01:47:58,121 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:48:11,860 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4250, loss[loss=0.1082, simple_loss=0.1264, pruned_loss=0.03411, audio_tagging_loss=0.01086, over 15185.00 frames. ], tot_loss[loss=0.09339, simple_loss=0.11, pruned_loss=0.02735, audio_tagging_loss=0.01106, over 3040748.55 frames. 
], batch size: 56, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:48:21,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=509260.0, ans=0.125 2023-11-19 01:48:30,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.95 vs. limit=15.0 2023-11-19 01:48:36,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=509393.3333333333, ans=10.0 2023-11-19 01:48:37,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=509393.3333333333, ans=0.2 2023-11-19 01:48:45,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=509460.0, ans=0.125 2023-11-19 01:48:47,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=509460.0, ans=0.0 2023-11-19 01:48:48,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=509460.0, ans=0.1 2023-11-19 01:48:49,130 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2023-11-19 01:48:53,338 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.53 vs. limit=12.0 2023-11-19 01:48:57,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=509526.6666666667, ans=0.2 2023-11-19 01:48:59,047 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.86 vs. limit=10.0 2023-11-19 01:49:07,981 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4300, loss[loss=0.09488, simple_loss=0.1113, pruned_loss=0.02715, audio_tagging_loss=0.01208, over 15874.00 frames. ], tot_loss[loss=0.09331, simple_loss=0.1099, pruned_loss=0.02728, audio_tagging_loss=0.01105, over 3040260.76 frames. ], batch size: 59, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:49:26,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=509660.0, ans=0.125 2023-11-19 01:49:29,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=509726.6666666667, ans=0.125 2023-11-19 01:49:33,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=509726.6666666667, ans=0.125 2023-11-19 01:49:38,031 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.35 vs. limit=15.0 2023-11-19 01:49:39,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.902e+01 9.994e+01 1.090e+02 2.369e+02, threshold=1.999e+02, percent-clipped=2.0 2023-11-19 01:49:51,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.86 vs. 
limit=15.0 2023-11-19 01:50:02,568 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4350, loss[loss=0.08203, simple_loss=0.09996, pruned_loss=0.01964, audio_tagging_loss=0.0124, over 14864.00 frames. ], tot_loss[loss=0.09324, simple_loss=0.1099, pruned_loss=0.02727, audio_tagging_loss=0.01102, over 3035449.30 frames. ], batch size: 56, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:50:05,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=509926.6666666667, ans=0.125 2023-11-19 01:50:12,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=509993.3333333333, ans=0.1 2023-11-19 01:50:37,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510126.6666666667, ans=0.1 2023-11-19 01:50:39,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=510126.6666666667, ans=0.0 2023-11-19 01:50:52,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=510193.3333333333, ans=0.125 2023-11-19 01:50:55,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=510193.3333333333, ans=0.125 2023-11-19 01:50:58,221 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4400, loss[loss=0.09863, simple_loss=0.1151, pruned_loss=0.02813, audio_tagging_loss=0.01294, over 15564.00 frames. ], tot_loss[loss=0.09237, simple_loss=0.109, pruned_loss=0.02693, audio_tagging_loss=0.01095, over 3039086.37 frames. ], batch size: 58, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:51:11,062 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.18 vs. limit=15.0 2023-11-19 01:51:30,391 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.237e+01 9.053e+01 9.942e+01 1.233e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 01:51:46,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=510526.6666666667, ans=0.125 2023-11-19 01:51:54,567 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4450, loss[loss=0.1006, simple_loss=0.1209, pruned_loss=0.03221, audio_tagging_loss=0.007985, over 14858.00 frames. ], tot_loss[loss=0.0932, simple_loss=0.11, pruned_loss=0.0273, audio_tagging_loss=0.0109, over 3043575.04 frames. 
], batch size: 55, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:51:59,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=510593.3333333333, ans=0.1 2023-11-19 01:52:00,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=510593.3333333333, ans=0.125 2023-11-19 01:52:15,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510726.6666666667, ans=0.1 2023-11-19 01:52:16,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=510726.6666666667, ans=0.1 2023-11-19 01:52:34,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.66 vs. limit=15.0 2023-11-19 01:52:37,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-19 01:52:42,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=510860.0, ans=0.125 2023-11-19 01:52:49,744 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4500, loss[loss=0.1073, simple_loss=0.1324, pruned_loss=0.03142, audio_tagging_loss=0.009718, over 16213.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1105, pruned_loss=0.02732, audio_tagging_loss=0.01086, over 3050243.36 frames. ], batch size: 58, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:53:11,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=511060.0, ans=0.0 2023-11-19 01:53:22,565 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 9.019e+01 9.835e+01 1.060e+02 1.349e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 01:53:27,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=511126.6666666667, ans=0.125 2023-11-19 01:53:31,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=511126.6666666667, ans=0.0 2023-11-19 01:53:34,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=511193.3333333333, ans=0.2 2023-11-19 01:53:40,154 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=12.0 2023-11-19 01:53:43,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=511193.3333333333, ans=0.04949747468305833 2023-11-19 01:53:45,532 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4550, loss[loss=0.07878, simple_loss=0.0895, pruned_loss=0.02133, audio_tagging_loss=0.01269, over 15989.00 frames. ], tot_loss[loss=0.09389, simple_loss=0.1113, pruned_loss=0.02747, audio_tagging_loss=0.01079, over 3053560.20 frames. 
], batch size: 61, lr: 9.97e-03, grad_scale: 32.0 2023-11-19 01:54:11,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=511393.3333333333, ans=0.125 2023-11-19 01:54:13,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=511393.3333333333, ans=0.2 2023-11-19 01:54:29,222 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:54:41,986 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4600, loss[loss=0.08975, simple_loss=0.09782, pruned_loss=0.02684, audio_tagging_loss=0.014, over 15794.00 frames. ], tot_loss[loss=0.0937, simple_loss=0.1107, pruned_loss=0.0274, audio_tagging_loss=0.01096, over 3049148.83 frames. ], batch size: 62, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:54:44,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=511593.3333333333, ans=0.125 2023-11-19 01:54:51,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=511593.3333333333, ans=0.04949747468305833 2023-11-19 01:55:01,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=511660.0, ans=0.125 2023-11-19 01:55:06,812 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.32 vs. limit=22.5 2023-11-19 01:55:14,086 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.730e+01 9.456e+01 1.065e+02 1.421e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 01:55:14,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=511793.3333333333, ans=0.2 2023-11-19 01:55:26,098 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:55:36,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=511860.0, ans=0.125 2023-11-19 01:55:37,982 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4650, loss[loss=0.103, simple_loss=0.1219, pruned_loss=0.03159, audio_tagging_loss=0.0104, over 15279.00 frames. ], tot_loss[loss=0.09446, simple_loss=0.1119, pruned_loss=0.02763, audio_tagging_loss=0.01089, over 3047099.50 frames. 
], batch size: 57, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:55:40,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=511926.6666666667, ans=0.125 2023-11-19 01:55:45,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=511926.6666666667, ans=0.5 2023-11-19 01:55:59,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=512060.0, ans=0.0 2023-11-19 01:56:11,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=512126.6666666667, ans=0.1 2023-11-19 01:56:17,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=512126.6666666667, ans=0.0 2023-11-19 01:56:25,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=512193.3333333333, ans=0.125 2023-11-19 01:56:29,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=512193.3333333333, ans=0.125 2023-11-19 01:56:29,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=512193.3333333333, ans=0.0 2023-11-19 01:56:32,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=512260.0, ans=0.125 2023-11-19 01:56:33,517 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4700, loss[loss=0.1162, simple_loss=0.1344, pruned_loss=0.03861, audio_tagging_loss=0.01043, over 15325.00 frames. ], tot_loss[loss=0.09491, simple_loss=0.1122, pruned_loss=0.02785, audio_tagging_loss=0.01094, over 3051002.33 frames. ], batch size: 57, lr: 9.96e-03, grad_scale: 32.0 2023-11-19 01:56:43,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512260.0, ans=0.1 2023-11-19 01:56:55,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=512393.3333333333, ans=0.0 2023-11-19 01:57:05,666 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.776e+01 8.634e+01 9.306e+01 1.008e+02 1.470e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 01:57:08,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=512460.0, ans=0.0 2023-11-19 01:57:29,459 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4750, loss[loss=0.1005, simple_loss=0.11, pruned_loss=0.03003, audio_tagging_loss=0.0155, over 17208.00 frames. ], tot_loss[loss=0.0946, simple_loss=0.1115, pruned_loss=0.02773, audio_tagging_loss=0.01109, over 3049506.72 frames. ], batch size: 63, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:58:11,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=512793.3333333333, ans=0.0 2023-11-19 01:58:18,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=512860.0, ans=0.0 2023-11-19 01:58:19,121 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. 
limit=15.0 2023-11-19 01:58:23,435 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.89 vs. limit=22.5 2023-11-19 01:58:25,649 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4800, loss[loss=0.09457, simple_loss=0.1149, pruned_loss=0.02831, audio_tagging_loss=0.008832, over 15810.00 frames. ], tot_loss[loss=0.09471, simple_loss=0.1117, pruned_loss=0.02773, audio_tagging_loss=0.01113, over 3052198.30 frames. ], batch size: 61, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:58:34,318 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.114e-02 2023-11-19 01:58:47,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=513060.0, ans=0.125 2023-11-19 01:58:57,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.526e+01 9.167e+01 1.022e+02 1.486e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 01:59:19,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513260.0, ans=0.1 2023-11-19 01:59:20,440 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4850, loss[loss=0.07266, simple_loss=0.08842, pruned_loss=0.01819, audio_tagging_loss=0.01025, over 16360.00 frames. ], tot_loss[loss=0.09455, simple_loss=0.1116, pruned_loss=0.02758, audio_tagging_loss=0.01118, over 3061160.96 frames. ], batch size: 62, lr: 9.95e-03, grad_scale: 32.0 2023-11-19 01:59:25,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=513260.0, ans=0.125 2023-11-19 01:59:34,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=513326.6666666667, ans=0.0 2023-11-19 01:59:42,152 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-11-19 02:00:17,672 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4900, loss[loss=0.09379, simple_loss=0.1254, pruned_loss=0.02528, audio_tagging_loss=0.005817, over 15340.00 frames. ], tot_loss[loss=0.09403, simple_loss=0.1112, pruned_loss=0.02742, audio_tagging_loss=0.01103, over 3049339.84 frames. ], batch size: 57, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:00:17,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=513593.3333333333, ans=0.0 2023-11-19 02:00:35,936 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=513660.0, ans=0.125 2023-11-19 02:00:37,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513660.0, ans=0.1 2023-11-19 02:00:39,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=513726.6666666667, ans=0.1 2023-11-19 02:00:45,173 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.82 vs. 
limit=15.0 2023-11-19 02:00:45,954 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:00:49,353 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.524e+01 9.139e+01 9.781e+01 1.634e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 02:00:50,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=513793.3333333333, ans=0.0 2023-11-19 02:01:12,689 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 4950, loss[loss=0.1062, simple_loss=0.1378, pruned_loss=0.02962, audio_tagging_loss=0.00767, over 17014.00 frames. ], tot_loss[loss=0.09352, simple_loss=0.1105, pruned_loss=0.02725, audio_tagging_loss=0.01102, over 3047118.10 frames. ], batch size: 60, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:01:17,477 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.41 vs. limit=10.0 2023-11-19 02:01:32,269 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.44 vs. limit=22.5 2023-11-19 02:01:36,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=514060.0, ans=0.125 2023-11-19 02:01:44,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=514060.0, ans=0.0 2023-11-19 02:01:50,052 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.34 vs. limit=15.0 2023-11-19 02:01:56,633 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-11-19 02:02:08,129 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5000, loss[loss=0.08706, simple_loss=0.1119, pruned_loss=0.02278, audio_tagging_loss=0.008319, over 15852.00 frames. ], tot_loss[loss=0.09334, simple_loss=0.1104, pruned_loss=0.02726, audio_tagging_loss=0.01086, over 3051051.10 frames. 
], batch size: 58, lr: 9.94e-03, grad_scale: 32.0 2023-11-19 02:02:15,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=514260.0, ans=0.0 2023-11-19 02:02:26,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=514326.6666666667, ans=0.1 2023-11-19 02:02:32,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=514393.3333333333, ans=0.2 2023-11-19 02:02:40,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.705e+01 9.523e+01 1.061e+02 1.468e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 02:02:41,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=514460.0, ans=0.1 2023-11-19 02:02:50,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=514460.0, ans=0.015 2023-11-19 02:02:59,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=514526.6666666667, ans=0.2 2023-11-19 02:03:04,445 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5050, loss[loss=0.08825, simple_loss=0.1111, pruned_loss=0.01968, audio_tagging_loss=0.01303, over 15442.00 frames. ], tot_loss[loss=0.09248, simple_loss=0.1093, pruned_loss=0.02702, audio_tagging_loss=0.01082, over 3042374.83 frames. ], batch size: 56, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:03:28,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=514726.6666666667, ans=0.2 2023-11-19 02:03:51,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=514860.0, ans=0.125 2023-11-19 02:03:58,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=514926.6666666667, ans=0.0 2023-11-19 02:03:59,686 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5100, loss[loss=0.0894, simple_loss=0.1172, pruned_loss=0.0227, audio_tagging_loss=0.00812, over 15502.00 frames. ], tot_loss[loss=0.09225, simple_loss=0.1091, pruned_loss=0.02689, audio_tagging_loss=0.01081, over 3045321.23 frames. ], batch size: 59, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:14,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=514993.3333333333, ans=0.2 2023-11-19 02:04:18,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=514993.3333333333, ans=0.125 2023-11-19 02:04:20,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=514993.3333333333, ans=0.125 2023-11-19 02:04:32,431 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.394e+01 9.066e+01 1.016e+02 2.426e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 02:04:53,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. limit=10.0 2023-11-19 02:04:54,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. 
limit=15.0 2023-11-19 02:04:54,521 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5150, loss[loss=0.09231, simple_loss=0.09995, pruned_loss=0.03177, audio_tagging_loss=0.01057, over 16723.00 frames. ], tot_loss[loss=0.09196, simple_loss=0.1088, pruned_loss=0.0267, audio_tagging_loss=0.01085, over 3042411.57 frames. ], batch size: 66, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:55,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=515260.0, ans=0.125 2023-11-19 02:05:06,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=515326.6666666667, ans=0.125 2023-11-19 02:05:06,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=515326.6666666667, ans=0.0 2023-11-19 02:05:17,207 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=7.793e-03 2023-11-19 02:05:31,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=515460.0, ans=0.125 2023-11-19 02:05:50,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=515593.3333333333, ans=0.1 2023-11-19 02:05:51,344 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5200, loss[loss=0.09298, simple_loss=0.1146, pruned_loss=0.02695, audio_tagging_loss=0.008734, over 15639.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1087, pruned_loss=0.02667, audio_tagging_loss=0.01082, over 3043365.10 frames. ], batch size: 56, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:06:02,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-11-19 02:06:22,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.557e+01 9.238e+01 1.015e+02 1.542e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:06:36,034 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=515860.0, ans=22.5 2023-11-19 02:06:43,357 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.28 vs. limit=15.0 2023-11-19 02:06:46,886 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5250, loss[loss=0.0782, simple_loss=0.09553, pruned_loss=0.01575, audio_tagging_loss=0.01468, over 15171.00 frames. ], tot_loss[loss=0.0929, simple_loss=0.1101, pruned_loss=0.02705, audio_tagging_loss=0.01081, over 3044604.34 frames. ], batch size: 57, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:07:04,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=515993.3333333333, ans=0.125 2023-11-19 02:07:15,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=516060.0, ans=0.2 2023-11-19 02:07:20,015 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.50 vs. 
limit=22.5 2023-11-19 02:07:20,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=516126.6666666667, ans=0.125 2023-11-19 02:07:35,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.78 vs. limit=15.0 2023-11-19 02:07:37,921 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516193.3333333333, ans=0.1 2023-11-19 02:07:41,871 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5300, loss[loss=0.09558, simple_loss=0.1125, pruned_loss=0.02798, audio_tagging_loss=0.01137, over 16363.00 frames. ], tot_loss[loss=0.0941, simple_loss=0.1116, pruned_loss=0.02764, audio_tagging_loss=0.01069, over 3047046.88 frames. ], batch size: 64, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:08:08,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516393.3333333333, ans=0.1 2023-11-19 02:08:09,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=516393.3333333333, ans=0.0 2023-11-19 02:08:14,519 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.731e+01 9.566e+01 1.032e+02 1.487e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-19 02:08:18,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. limit=6.0 2023-11-19 02:08:21,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=516460.0, ans=0.125 2023-11-19 02:08:37,652 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5350, loss[loss=0.1273, simple_loss=0.1556, pruned_loss=0.0407, audio_tagging_loss=0.00881, over 15548.00 frames. ], tot_loss[loss=0.09448, simple_loss=0.112, pruned_loss=0.02779, audio_tagging_loss=0.01067, over 3049347.11 frames. 
], batch size: 54, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:08:38,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=516593.3333333333, ans=0.0 2023-11-19 02:08:45,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=516593.3333333333, ans=0.125 2023-11-19 02:08:53,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=516660.0, ans=0.125 2023-11-19 02:08:54,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516660.0, ans=0.1 2023-11-19 02:09:00,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516726.6666666667, ans=0.1 2023-11-19 02:09:01,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=516726.6666666667, ans=0.0 2023-11-19 02:09:05,240 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=516726.6666666667, ans=0.125 2023-11-19 02:09:26,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516860.0, ans=0.1 2023-11-19 02:09:29,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=516860.0, ans=0.125 2023-11-19 02:09:33,609 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5400, loss[loss=0.1075, simple_loss=0.1259, pruned_loss=0.03322, audio_tagging_loss=0.01134, over 15289.00 frames. ], tot_loss[loss=0.09375, simple_loss=0.1111, pruned_loss=0.02746, audio_tagging_loss=0.01075, over 3045036.50 frames. ], batch size: 57, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:09:51,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516993.3333333333, ans=0.125 2023-11-19 02:10:05,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.660e+01 9.823e+01 1.115e+02 1.582e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-19 02:10:12,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=517126.6666666667, ans=0.125 2023-11-19 02:10:24,647 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.28 vs. limit=22.5 2023-11-19 02:10:27,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=517260.0, ans=0.1 2023-11-19 02:10:28,358 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5450, loss[loss=0.1007, simple_loss=0.1164, pruned_loss=0.03273, audio_tagging_loss=0.009779, over 15336.00 frames. ], tot_loss[loss=0.09369, simple_loss=0.1108, pruned_loss=0.02743, audio_tagging_loss=0.01085, over 3048662.49 frames. 
], batch size: 60, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:10:29,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=517260.0, ans=0.1 2023-11-19 02:10:38,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=517326.6666666667, ans=0.0 2023-11-19 02:10:50,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=517326.6666666667, ans=0.2 2023-11-19 02:10:58,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=517393.3333333333, ans=0.1 2023-11-19 02:11:10,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=517460.0, ans=0.09899494936611666 2023-11-19 02:11:23,475 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2023-11-19 02:11:23,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=12.0 2023-11-19 02:11:24,126 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5500, loss[loss=0.1039, simple_loss=0.1265, pruned_loss=0.0308, audio_tagging_loss=0.009829, over 15928.00 frames. ], tot_loss[loss=0.09354, simple_loss=0.1108, pruned_loss=0.02723, audio_tagging_loss=0.01091, over 3049779.19 frames. ], batch size: 59, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:11:31,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=517593.3333333333, ans=0.2 2023-11-19 02:11:33,960 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=517593.3333333333, ans=0.0 2023-11-19 02:11:39,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.41 vs. limit=15.0 2023-11-19 02:11:41,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=517660.0, ans=0.0 2023-11-19 02:11:46,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=517726.6666666667, ans=0.125 2023-11-19 02:11:55,915 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.498e+01 9.493e+01 1.061e+02 1.375e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-19 02:12:05,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=517793.3333333333, ans=0.125 2023-11-19 02:12:14,311 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.30 vs. limit=22.5 2023-11-19 02:12:20,006 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5550, loss[loss=0.1073, simple_loss=0.1341, pruned_loss=0.02992, audio_tagging_loss=0.01034, over 15176.00 frames. ], tot_loss[loss=0.09331, simple_loss=0.1102, pruned_loss=0.02706, audio_tagging_loss=0.01116, over 3047623.73 frames. 
], batch size: 55, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:12:28,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=517926.6666666667, ans=0.025 2023-11-19 02:12:34,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. limit=15.0 2023-11-19 02:13:01,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.50 vs. limit=22.5 2023-11-19 02:13:11,018 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:13:11,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=518193.3333333333, ans=0.125 2023-11-19 02:13:15,067 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5600, loss[loss=0.07217, simple_loss=0.08263, pruned_loss=0.01837, audio_tagging_loss=0.01249, over 15758.00 frames. ], tot_loss[loss=0.09352, simple_loss=0.1103, pruned_loss=0.02717, audio_tagging_loss=0.01119, over 3049796.63 frames. ], batch size: 62, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:13:16,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=518260.0, ans=0.125 2023-11-19 02:13:45,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.59 vs. limit=15.0 2023-11-19 02:13:46,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.345e+01 9.240e+01 1.027e+02 1.400e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:13:55,454 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:13:57,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=518526.6666666667, ans=0.0 2023-11-19 02:14:08,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=518593.3333333333, ans=0.09899494936611666 2023-11-19 02:14:09,779 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5650, loss[loss=0.06304, simple_loss=0.0668, pruned_loss=0.01707, audio_tagging_loss=0.01257, over 15003.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1102, pruned_loss=0.02729, audio_tagging_loss=0.01126, over 3047489.47 frames. ], batch size: 58, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:14:09,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=518593.3333333333, ans=0.0 2023-11-19 02:14:26,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=518660.0, ans=0.125 2023-11-19 02:14:30,589 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.38 vs. 
limit=15.0 2023-11-19 02:14:37,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=518726.6666666667, ans=0.0 2023-11-19 02:14:49,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=518793.3333333333, ans=0.125 2023-11-19 02:14:57,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=518860.0, ans=0.2 2023-11-19 02:14:59,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=518860.0, ans=0.0 2023-11-19 02:15:06,279 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5700, loss[loss=0.08464, simple_loss=0.09767, pruned_loss=0.02469, audio_tagging_loss=0.01112, over 14244.00 frames. ], tot_loss[loss=0.09327, simple_loss=0.1094, pruned_loss=0.02726, audio_tagging_loss=0.01129, over 3049615.72 frames. ], batch size: 55, lr: 9.89e-03, grad_scale: 64.0 2023-11-19 02:15:07,627 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:15:09,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=518926.6666666667, ans=0.125 2023-11-19 02:15:19,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=518993.3333333333, ans=0.0 2023-11-19 02:15:38,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.416e+01 9.360e+01 1.085e+02 2.101e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-19 02:16:01,537 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5750, loss[loss=0.08425, simple_loss=0.09672, pruned_loss=0.02535, audio_tagging_loss=0.01054, over 15275.00 frames. ], tot_loss[loss=0.09315, simple_loss=0.1098, pruned_loss=0.02719, audio_tagging_loss=0.01107, over 3057600.78 frames. ], batch size: 59, lr: 9.89e-03, grad_scale: 32.0 2023-11-19 02:16:13,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=519326.6666666667, ans=0.2 2023-11-19 02:16:14,869 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. limit=10.0 2023-11-19 02:16:27,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.33 vs. limit=22.5 2023-11-19 02:16:30,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-19 02:16:56,630 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5800, loss[loss=0.1138, simple_loss=0.1429, pruned_loss=0.03295, audio_tagging_loss=0.009374, over 16334.00 frames. ], tot_loss[loss=0.09334, simple_loss=0.1099, pruned_loss=0.02734, audio_tagging_loss=0.01103, over 3052861.40 frames. 
], batch size: 60, lr: 9.89e-03, grad_scale: 16.0 2023-11-19 02:17:07,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=519660.0, ans=0.0 2023-11-19 02:17:10,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=519660.0, ans=0.125 2023-11-19 02:17:31,533 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.540e+01 9.536e+01 1.116e+02 2.278e+02, threshold=1.907e+02, percent-clipped=1.0 2023-11-19 02:17:33,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=519793.3333333333, ans=0.1 2023-11-19 02:17:53,444 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5850, loss[loss=0.1049, simple_loss=0.142, pruned_loss=0.02558, audio_tagging_loss=0.008353, over 15112.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.11, pruned_loss=0.02709, audio_tagging_loss=0.01091, over 3053757.08 frames. ], batch size: 54, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:18:02,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.80 vs. limit=15.0 2023-11-19 02:18:02,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.28 vs. limit=8.0 2023-11-19 02:18:14,743 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=520060.0, ans=0.125 2023-11-19 02:18:42,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=520193.3333333333, ans=0.2 2023-11-19 02:18:43,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=520193.3333333333, ans=0.0 2023-11-19 02:18:49,462 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5900, loss[loss=0.1061, simple_loss=0.1274, pruned_loss=0.03107, audio_tagging_loss=0.01136, over 14972.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1102, pruned_loss=0.02725, audio_tagging_loss=0.01076, over 3048074.54 frames. ], batch size: 55, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:19:23,929 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.828e+01 9.582e+01 1.059e+02 2.362e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-19 02:19:28,683 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.75 vs. limit=15.0 2023-11-19 02:19:30,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0 2023-11-19 02:19:44,674 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 5950, loss[loss=0.1089, simple_loss=0.1314, pruned_loss=0.03267, audio_tagging_loss=0.01048, over 14153.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1107, pruned_loss=0.02708, audio_tagging_loss=0.01068, over 3050933.24 frames. 
], batch size: 54, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:19:44,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=520593.3333333333, ans=0.025 2023-11-19 02:20:03,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=520660.0, ans=0.125 2023-11-19 02:20:14,771 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. limit=15.0 2023-11-19 02:20:29,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-19 02:20:40,635 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6000, loss[loss=0.0857, simple_loss=0.09324, pruned_loss=0.0272, audio_tagging_loss=0.01188, over 14768.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1121, pruned_loss=0.02746, audio_tagging_loss=0.01064, over 3047606.79 frames. ], batch size: 56, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:20:40,636 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 02:21:13,035 INFO [train_asr.py:1147] (2/4) Epoch 7, validation: loss=0.06924, simple_loss=0.05776, pruned_loss=0.007549, audio_tagging_loss=0.0328, over 4681554.00 frames. 2023-11-19 02:21:13,036 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 02:21:16,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=520926.6666666667, ans=0.0 2023-11-19 02:21:33,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=520993.3333333333, ans=0.1 2023-11-19 02:21:37,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=521060.0, ans=0.125 2023-11-19 02:21:41,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=521060.0, ans=0.0 2023-11-19 02:21:47,398 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.768e+01 9.511e+01 1.039e+02 1.786e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 02:21:49,942 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2023-11-19 02:21:55,309 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:21:59,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=521193.3333333333, ans=0.125 2023-11-19 02:22:06,564 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.91 vs. 
limit=15.0 2023-11-19 02:22:07,961 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6050, loss[loss=0.08176, simple_loss=0.0876, pruned_loss=0.02593, audio_tagging_loss=0.01203, over 14929.00 frames. ], tot_loss[loss=0.09508, simple_loss=0.1133, pruned_loss=0.02784, audio_tagging_loss=0.0106, over 3055845.09 frames. ], batch size: 56, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:22:13,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=521260.0, ans=0.0 2023-11-19 02:22:18,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=521260.0, ans=0.1 2023-11-19 02:22:26,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.75 vs. limit=15.0 2023-11-19 02:22:43,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=521460.0, ans=0.0 2023-11-19 02:22:55,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=521526.6666666667, ans=0.0 2023-11-19 02:23:02,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=521526.6666666667, ans=0.125 2023-11-19 02:23:05,054 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6100, loss[loss=0.1245, simple_loss=0.1436, pruned_loss=0.0418, audio_tagging_loss=0.01087, over 14610.00 frames. ], tot_loss[loss=0.09498, simple_loss=0.1131, pruned_loss=0.02782, audio_tagging_loss=0.01061, over 3054106.02 frames. ], batch size: 53, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:23:11,587 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=521593.3333333333, ans=0.04949747468305833 2023-11-19 02:23:21,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=521660.0, ans=0.125 2023-11-19 02:23:38,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.713e+01 9.332e+01 1.071e+02 1.394e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 02:23:49,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521860.0, ans=0.1 2023-11-19 02:23:59,653 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6150, loss[loss=0.093, simple_loss=0.1108, pruned_loss=0.02659, audio_tagging_loss=0.01101, over 15516.00 frames. ], tot_loss[loss=0.09488, simple_loss=0.1128, pruned_loss=0.02779, audio_tagging_loss=0.01067, over 3051085.86 frames. 
], batch size: 57, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:08,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=521926.6666666667, ans=0.125 2023-11-19 02:24:21,121 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:24:29,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522060.0, ans=0.1 2023-11-19 02:24:41,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=522126.6666666667, ans=0.0 2023-11-19 02:24:48,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=522193.3333333333, ans=0.125 2023-11-19 02:24:51,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=522193.3333333333, ans=0.125 2023-11-19 02:24:54,745 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6200, loss[loss=0.116, simple_loss=0.1359, pruned_loss=0.03622, audio_tagging_loss=0.01183, over 15174.00 frames. ], tot_loss[loss=0.09433, simple_loss=0.1119, pruned_loss=0.02754, audio_tagging_loss=0.01083, over 3051014.90 frames. ], batch size: 56, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:58,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=522260.0, ans=0.0 2023-11-19 02:25:00,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.12 vs. limit=15.0 2023-11-19 02:25:01,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=522260.0, ans=0.025 2023-11-19 02:25:25,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=522393.3333333333, ans=0.1 2023-11-19 02:25:28,330 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.853e+01 9.668e+01 1.071e+02 1.653e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 02:25:43,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=522526.6666666667, ans=0.0 2023-11-19 02:25:48,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=522526.6666666667, ans=0.0 2023-11-19 02:25:50,572 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6250, loss[loss=0.1039, simple_loss=0.1332, pruned_loss=0.02889, audio_tagging_loss=0.00836, over 16874.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.1115, pruned_loss=0.02731, audio_tagging_loss=0.01088, over 3053582.25 frames. ], batch size: 62, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:25:57,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=522593.3333333333, ans=0.125 2023-11-19 02:25:57,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.89 vs. 
limit=10.0 2023-11-19 02:26:19,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=522726.6666666667, ans=0.125 2023-11-19 02:26:20,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=522726.6666666667, ans=0.125 2023-11-19 02:26:23,169 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.16 vs. limit=22.5 2023-11-19 02:26:37,694 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.08 vs. limit=15.0 2023-11-19 02:26:39,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=522860.0, ans=0.0 2023-11-19 02:26:45,718 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6300, loss[loss=0.09084, simple_loss=0.1056, pruned_loss=0.02606, audio_tagging_loss=0.01197, over 14813.00 frames. ], tot_loss[loss=0.09412, simple_loss=0.1113, pruned_loss=0.02755, audio_tagging_loss=0.0109, over 3048994.13 frames. ], batch size: 56, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:26:49,428 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.66 vs. limit=22.5 2023-11-19 02:26:51,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=522926.6666666667, ans=0.125 2023-11-19 02:26:54,849 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.03 vs. limit=15.0 2023-11-19 02:26:57,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=522993.3333333333, ans=0.125 2023-11-19 02:27:22,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 8.911e+01 9.683e+01 1.073e+02 1.509e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 02:27:29,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.42 vs. limit=10.0 2023-11-19 02:27:41,850 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6350, loss[loss=0.09798, simple_loss=0.1212, pruned_loss=0.02726, audio_tagging_loss=0.01011, over 15246.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1111, pruned_loss=0.02746, audio_tagging_loss=0.01104, over 3053297.85 frames. ], batch size: 57, lr: 9.85e-03, grad_scale: 16.0 2023-11-19 02:27:47,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=523260.0, ans=0.125 2023-11-19 02:27:49,228 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.12 vs. 
limit=15.0 2023-11-19 02:27:52,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=523326.6666666667, ans=0.025 2023-11-19 02:28:01,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=523326.6666666667, ans=0.07 2023-11-19 02:28:01,808 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.29 vs. limit=15.0 2023-11-19 02:28:31,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=523526.6666666667, ans=0.09899494936611666 2023-11-19 02:28:37,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523593.3333333333, ans=0.125 2023-11-19 02:28:37,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=523593.3333333333, ans=0.125 2023-11-19 02:28:38,413 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6400, loss[loss=0.106, simple_loss=0.1257, pruned_loss=0.03098, audio_tagging_loss=0.01212, over 14903.00 frames. ], tot_loss[loss=0.09309, simple_loss=0.1098, pruned_loss=0.02699, audio_tagging_loss=0.01119, over 3049076.23 frames. ], batch size: 55, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:28:45,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=523593.3333333333, ans=0.025 2023-11-19 02:28:59,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.90 vs. limit=15.0 2023-11-19 02:29:13,023 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.426e+01 9.075e+01 1.046e+02 1.342e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 02:29:25,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=523860.0, ans=0.125 2023-11-19 02:29:33,090 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6450, loss[loss=0.07562, simple_loss=0.08598, pruned_loss=0.01954, audio_tagging_loss=0.01309, over 15436.00 frames. ], tot_loss[loss=0.09272, simple_loss=0.1093, pruned_loss=0.0268, audio_tagging_loss=0.01126, over 3047309.28 frames. ], batch size: 59, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:29:39,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=523926.6666666667, ans=0.0 2023-11-19 02:29:50,568 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=523993.3333333333, ans=0.125 2023-11-19 02:30:13,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=524126.6666666667, ans=0.0 2023-11-19 02:30:28,434 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6500, loss[loss=0.09091, simple_loss=0.1087, pruned_loss=0.02434, audio_tagging_loss=0.01222, over 14931.00 frames. ], tot_loss[loss=0.09213, simple_loss=0.1086, pruned_loss=0.0266, audio_tagging_loss=0.01124, over 3045637.93 frames. 
], batch size: 55, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:04,407 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.655e+01 9.206e+01 1.007e+02 1.269e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:31:05,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-19 02:31:24,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.61 vs. limit=22.5 2023-11-19 02:31:24,664 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6550, loss[loss=0.06837, simple_loss=0.07762, pruned_loss=0.016, audio_tagging_loss=0.01356, over 15843.00 frames. ], tot_loss[loss=0.09244, simple_loss=0.1094, pruned_loss=0.02677, audio_tagging_loss=0.01099, over 3050494.72 frames. ], batch size: 60, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:26,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=524593.3333333334, ans=0.0 2023-11-19 02:31:27,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=524593.3333333334, ans=0.125 2023-11-19 02:31:36,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.16 vs. limit=15.0 2023-11-19 02:31:43,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=524660.0, ans=0.0 2023-11-19 02:31:44,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=524660.0, ans=0.125 2023-11-19 02:31:51,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=524726.6666666666, ans=0.125 2023-11-19 02:31:53,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=524726.6666666666, ans=0.125 2023-11-19 02:31:57,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=524793.3333333334, ans=0.2 2023-11-19 02:31:57,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524793.3333333334, ans=0.1 2023-11-19 02:32:19,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-19 02:32:20,067 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6600, loss[loss=0.09983, simple_loss=0.1166, pruned_loss=0.03168, audio_tagging_loss=0.009864, over 14825.00 frames. ], tot_loss[loss=0.09213, simple_loss=0.1091, pruned_loss=0.02666, audio_tagging_loss=0.01093, over 3046451.79 frames. 
], batch size: 55, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:32:20,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=524926.6666666666, ans=0.0 2023-11-19 02:32:23,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=524926.6666666666, ans=0.125 2023-11-19 02:32:31,827 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=524993.3333333334, ans=0.1 2023-11-19 02:32:36,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=524993.3333333334, ans=0.125 2023-11-19 02:32:40,457 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=525060.0, ans=0.0 2023-11-19 02:32:45,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=525060.0, ans=0.1 2023-11-19 02:32:46,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=525060.0, ans=0.125 2023-11-19 02:32:52,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=525126.6666666666, ans=0.07 2023-11-19 02:32:53,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525126.6666666666, ans=0.07 2023-11-19 02:32:54,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=525126.6666666666, ans=0.0 2023-11-19 02:32:55,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.622e+01 9.360e+01 1.038e+02 1.373e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 02:33:14,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.77 vs. limit=22.5 2023-11-19 02:33:14,897 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6650, loss[loss=0.08108, simple_loss=0.09298, pruned_loss=0.02398, audio_tagging_loss=0.01061, over 14224.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1091, pruned_loss=0.02678, audio_tagging_loss=0.01091, over 3044528.17 frames. ], batch size: 53, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:33:34,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=525326.6666666666, ans=0.5 2023-11-19 02:33:45,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.80 vs. limit=10.0 2023-11-19 02:33:49,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525460.0, ans=0.1 2023-11-19 02:34:10,665 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6700, loss[loss=0.0926, simple_loss=0.1018, pruned_loss=0.02963, audio_tagging_loss=0.01208, over 15464.00 frames. ], tot_loss[loss=0.09231, simple_loss=0.1094, pruned_loss=0.02678, audio_tagging_loss=0.01085, over 3049998.16 frames. 
], batch size: 60, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:34:27,749 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:34:33,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.24 vs. limit=6.0 2023-11-19 02:34:35,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=525726.6666666666, ans=0.0 2023-11-19 02:34:45,466 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.532e+01 9.161e+01 1.021e+02 1.335e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 02:34:58,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=525860.0, ans=0.2 2023-11-19 02:35:04,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=525926.6666666666, ans=0.1 2023-11-19 02:35:05,523 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6750, loss[loss=0.1052, simple_loss=0.1184, pruned_loss=0.03074, audio_tagging_loss=0.01525, over 15362.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.111, pruned_loss=0.02735, audio_tagging_loss=0.01075, over 3049962.68 frames. ], batch size: 57, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:35:15,408 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=525993.3333333334, ans=0.05 2023-11-19 02:35:24,874 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=525993.3333333334, ans=0.0 2023-11-19 02:35:26,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=526060.0, ans=0.125 2023-11-19 02:35:29,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=526060.0, ans=0.0 2023-11-19 02:35:40,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=526126.6666666666, ans=0.0 2023-11-19 02:35:53,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=526193.3333333334, ans=0.125 2023-11-19 02:35:57,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.93 vs. limit=15.0 2023-11-19 02:36:00,010 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6800, loss[loss=0.08298, simple_loss=0.0968, pruned_loss=0.02554, audio_tagging_loss=0.009044, over 15076.00 frames. ], tot_loss[loss=0.09399, simple_loss=0.1116, pruned_loss=0.02749, audio_tagging_loss=0.01071, over 3043550.97 frames. ], batch size: 59, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:36:27,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=526393.3333333334, ans=0.2 2023-11-19 02:36:31,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.47 vs. 
limit=22.5 2023-11-19 02:36:35,399 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.914e+01 9.908e+01 1.073e+02 1.400e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-19 02:36:52,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=526526.6666666666, ans=0.0 2023-11-19 02:36:54,912 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6850, loss[loss=0.08299, simple_loss=0.09624, pruned_loss=0.02207, audio_tagging_loss=0.01279, over 14929.00 frames. ], tot_loss[loss=0.09336, simple_loss=0.1106, pruned_loss=0.02741, audio_tagging_loss=0.01065, over 3032788.39 frames. ], batch size: 55, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:01,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=526593.3333333334, ans=0.2 2023-11-19 02:37:04,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-19 02:37:04,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=526593.3333333334, ans=0.125 2023-11-19 02:37:27,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526793.3333333334, ans=0.1 2023-11-19 02:37:48,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=526860.0, ans=0.125 2023-11-19 02:37:48,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=526860.0, ans=0.125 2023-11-19 02:37:50,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=526926.6666666666, ans=0.125 2023-11-19 02:37:51,104 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6900, loss[loss=0.1042, simple_loss=0.1211, pruned_loss=0.03206, audio_tagging_loss=0.01159, over 14717.00 frames. ], tot_loss[loss=0.09355, simple_loss=0.111, pruned_loss=0.02738, audio_tagging_loss=0.01065, over 3037335.42 frames. ], batch size: 56, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:54,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=526926.6666666666, ans=0.2 2023-11-19 02:38:26,434 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.284e+01 8.942e+01 9.819e+01 1.283e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 02:38:34,400 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:38:45,937 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 6950, loss[loss=0.1067, simple_loss=0.1169, pruned_loss=0.03454, audio_tagging_loss=0.01373, over 14751.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.111, pruned_loss=0.02738, audio_tagging_loss=0.01061, over 3039764.23 frames. 
], batch size: 56, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:38:46,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-19 02:39:28,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=527460.0, ans=0.0 2023-11-19 02:39:40,942 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7000, loss[loss=0.108, simple_loss=0.1305, pruned_loss=0.03349, audio_tagging_loss=0.009247, over 15289.00 frames. ], tot_loss[loss=0.09329, simple_loss=0.1108, pruned_loss=0.02723, audio_tagging_loss=0.01067, over 3040300.07 frames. ], batch size: 57, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:39:48,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=527593.3333333334, ans=0.125 2023-11-19 02:39:52,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=527660.0, ans=0.1 2023-11-19 02:40:15,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=527793.3333333334, ans=0.0 2023-11-19 02:40:16,974 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.756e+01 9.618e+01 1.077e+02 1.519e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 02:40:32,999 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-11-19 02:40:36,020 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=527860.0, ans=0.2 2023-11-19 02:40:37,802 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7050, loss[loss=0.1014, simple_loss=0.1212, pruned_loss=0.0319, audio_tagging_loss=0.008884, over 16198.00 frames. ], tot_loss[loss=0.09323, simple_loss=0.1106, pruned_loss=0.02723, audio_tagging_loss=0.01071, over 3049851.11 frames. ], batch size: 61, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:40:51,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=527993.3333333334, ans=0.125 2023-11-19 02:41:29,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=528193.3333333334, ans=0.025 2023-11-19 02:41:33,642 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7100, loss[loss=0.1011, simple_loss=0.1174, pruned_loss=0.03201, audio_tagging_loss=0.01042, over 15586.00 frames. ], tot_loss[loss=0.09411, simple_loss=0.1118, pruned_loss=0.02749, audio_tagging_loss=0.0107, over 3050031.13 frames. ], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:41:41,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=528260.0, ans=0.09899494936611666 2023-11-19 02:41:41,436 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.31 vs. 
limit=15.0 2023-11-19 02:42:09,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.375e+01 9.204e+01 1.018e+02 1.304e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:42:28,583 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7150, loss[loss=0.07778, simple_loss=0.093, pruned_loss=0.02195, audio_tagging_loss=0.00933, over 15367.00 frames. ], tot_loss[loss=0.09391, simple_loss=0.1115, pruned_loss=0.02743, audio_tagging_loss=0.01072, over 3054568.87 frames. ], batch size: 58, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:42:28,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.85 vs. limit=22.5 2023-11-19 02:42:31,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=528593.3333333334, ans=0.125 2023-11-19 02:42:40,059 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.460e-02 2023-11-19 02:42:55,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=528726.6666666666, ans=0.2 2023-11-19 02:42:59,638 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=18.30 vs. limit=15.0 2023-11-19 02:43:03,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.91 vs. limit=15.0 2023-11-19 02:43:04,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=528793.3333333334, ans=0.0 2023-11-19 02:43:14,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff2.min_abs, batch_count=528860.0, ans=0.1 2023-11-19 02:43:25,054 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7200, loss[loss=0.09531, simple_loss=0.1104, pruned_loss=0.02929, audio_tagging_loss=0.0108, over 16124.00 frames. ], tot_loss[loss=0.09375, simple_loss=0.1113, pruned_loss=0.02732, audio_tagging_loss=0.01079, over 3053550.82 frames. ], batch size: 60, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:43:29,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=528926.6666666666, ans=0.125 2023-11-19 02:43:40,045 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. 
limit=6.0 2023-11-19 02:43:44,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=528993.3333333334, ans=0.125 2023-11-19 02:43:44,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=528993.3333333334, ans=0.2 2023-11-19 02:43:55,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529060.0, ans=0.125 2023-11-19 02:43:59,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.223e+01 8.683e+01 9.470e+01 1.039e+02 1.567e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 02:44:04,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529126.6666666666, ans=0.0 2023-11-19 02:44:08,525 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:44:20,483 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7250, loss[loss=0.07552, simple_loss=0.09013, pruned_loss=0.02323, audio_tagging_loss=0.007223, over 14114.00 frames. ], tot_loss[loss=0.09489, simple_loss=0.1127, pruned_loss=0.02771, audio_tagging_loss=0.01083, over 3055360.83 frames. ], batch size: 56, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:44:29,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=529260.0, ans=0.125 2023-11-19 02:44:38,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=529326.6666666666, ans=0.125 2023-11-19 02:44:51,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=529393.3333333334, ans=0.125 2023-11-19 02:44:54,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529460.0, ans=0.125 2023-11-19 02:44:57,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-19 02:45:15,827 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7300, loss[loss=0.06952, simple_loss=0.08335, pruned_loss=0.01689, audio_tagging_loss=0.01096, over 15198.00 frames. ], tot_loss[loss=0.09449, simple_loss=0.1124, pruned_loss=0.0276, audio_tagging_loss=0.01069, over 3052917.70 frames. ], batch size: 56, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:45:28,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529660.0, ans=0.1 2023-11-19 02:45:31,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=529660.0, ans=0.0 2023-11-19 02:45:40,772 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=529726.6666666666, ans=0.125 2023-11-19 02:45:48,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=529793.3333333334, ans=0.0 2023-11-19 02:45:51,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.31 vs. 
limit=15.0 2023-11-19 02:45:51,672 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.930e+01 9.656e+01 1.070e+02 1.553e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-19 02:45:55,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529793.3333333334, ans=0.125 2023-11-19 02:45:58,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=529793.3333333334, ans=0.5 2023-11-19 02:46:00,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=529860.0, ans=0.2 2023-11-19 02:46:12,160 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7350, loss[loss=0.1136, simple_loss=0.142, pruned_loss=0.03598, audio_tagging_loss=0.006628, over 15535.00 frames. ], tot_loss[loss=0.09393, simple_loss=0.1118, pruned_loss=0.02751, audio_tagging_loss=0.01053, over 3046490.25 frames. ], batch size: 56, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:46:12,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=529926.6666666666, ans=0.1 2023-11-19 02:46:15,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.55 vs. limit=10.0 2023-11-19 02:46:22,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=529993.3333333334, ans=0.2 2023-11-19 02:46:27,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=529993.3333333334, ans=0.0 2023-11-19 02:47:07,074 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7400, loss[loss=0.08657, simple_loss=0.1073, pruned_loss=0.02335, audio_tagging_loss=0.009574, over 14287.00 frames. ], tot_loss[loss=0.09281, simple_loss=0.1104, pruned_loss=0.02701, audio_tagging_loss=0.01059, over 3038663.21 frames. ], batch size: 53, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:47:16,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=530260.0, ans=0.07 2023-11-19 02:47:17,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=530326.6666666666, ans=0.07 2023-11-19 02:47:24,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=530326.6666666666, ans=0.0 2023-11-19 02:47:31,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=530393.3333333334, ans=0.125 2023-11-19 02:47:34,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=530393.3333333334, ans=0.0 2023-11-19 02:47:38,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.53 vs. 
limit=12.0 2023-11-19 02:47:41,159 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=530460.0, ans=0.125 2023-11-19 02:47:41,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=530460.0, ans=0.2 2023-11-19 02:47:43,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.685e+01 9.299e+01 1.013e+02 1.325e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:47:53,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=530526.6666666666, ans=0.09899494936611666 2023-11-19 02:47:54,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2023-11-19 02:47:57,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-19 02:48:00,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.83 vs. limit=15.0 2023-11-19 02:48:02,731 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7450, loss[loss=0.07104, simple_loss=0.08746, pruned_loss=0.01873, audio_tagging_loss=0.008583, over 15059.00 frames. ], tot_loss[loss=0.09305, simple_loss=0.1103, pruned_loss=0.02724, audio_tagging_loss=0.01065, over 3033130.75 frames. ], batch size: 57, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:48:11,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=530593.3333333334, ans=0.0 2023-11-19 02:48:19,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=530660.0, ans=0.0 2023-11-19 02:48:38,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=530793.3333333334, ans=0.0 2023-11-19 02:48:59,262 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7500, loss[loss=0.09687, simple_loss=0.1079, pruned_loss=0.03029, audio_tagging_loss=0.01264, over 15716.00 frames. ], tot_loss[loss=0.09278, simple_loss=0.1098, pruned_loss=0.02712, audio_tagging_loss=0.01078, over 3038091.41 frames. ], batch size: 58, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:49:08,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=530993.3333333334, ans=0.1 2023-11-19 02:49:21,788 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0 2023-11-19 02:49:26,421 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.48 vs. limit=15.0 2023-11-19 02:49:32,353 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.11 vs. 
limit=15.0 2023-11-19 02:49:33,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.748e+01 9.301e+01 1.047e+02 1.348e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:49:33,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=531126.6666666666, ans=0.125 2023-11-19 02:49:53,801 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7550, loss[loss=0.06745, simple_loss=0.07182, pruned_loss=0.01728, audio_tagging_loss=0.01426, over 16482.00 frames. ], tot_loss[loss=0.09289, simple_loss=0.1098, pruned_loss=0.02723, audio_tagging_loss=0.01075, over 3044703.68 frames. ], batch size: 66, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:49:53,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=531260.0, ans=0.1 2023-11-19 02:50:01,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=531260.0, ans=0.125 2023-11-19 02:50:46,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.59 vs. limit=15.0 2023-11-19 02:50:48,818 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7600, loss[loss=0.08178, simple_loss=0.09471, pruned_loss=0.0227, audio_tagging_loss=0.01173, over 14821.00 frames. ], tot_loss[loss=0.0922, simple_loss=0.109, pruned_loss=0.02694, audio_tagging_loss=0.01076, over 3050587.86 frames. ], batch size: 58, lr: 9.77e-03, grad_scale: 32.0 2023-11-19 02:50:50,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=531593.3333333334, ans=0.0 2023-11-19 02:51:10,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.19 vs. limit=12.0 2023-11-19 02:51:13,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=531726.6666666666, ans=0.0 2023-11-19 02:51:21,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=531793.3333333334, ans=0.125 2023-11-19 02:51:24,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=531793.3333333334, ans=0.125 2023-11-19 02:51:24,910 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.650e+01 9.572e+01 1.070e+02 1.390e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-19 02:51:32,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=22.5 2023-11-19 02:51:45,437 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7650, loss[loss=0.0874, simple_loss=0.09807, pruned_loss=0.02801, audio_tagging_loss=0.01036, over 15539.00 frames. ], tot_loss[loss=0.09228, simple_loss=0.1092, pruned_loss=0.02699, audio_tagging_loss=0.01071, over 3043826.99 frames. ], batch size: 56, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:51:50,601 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.30 vs. 
limit=12.0 2023-11-19 02:52:08,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=532060.0, ans=0.125 2023-11-19 02:52:09,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532060.0, ans=0.1 2023-11-19 02:52:10,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=532060.0, ans=0.125 2023-11-19 02:52:23,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=532126.6666666666, ans=0.0 2023-11-19 02:52:41,032 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7700, loss[loss=0.09915, simple_loss=0.1235, pruned_loss=0.02653, audio_tagging_loss=0.01086, over 15467.00 frames. ], tot_loss[loss=0.09298, simple_loss=0.1104, pruned_loss=0.02723, audio_tagging_loss=0.01057, over 3041727.03 frames. ], batch size: 56, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:52:55,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=532326.6666666666, ans=0.125 2023-11-19 02:53:08,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532393.3333333334, ans=0.1 2023-11-19 02:53:08,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=532393.3333333334, ans=0.0 2023-11-19 02:53:12,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=532393.3333333334, ans=0.1 2023-11-19 02:53:17,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=532460.0, ans=0.0 2023-11-19 02:53:17,950 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.485e+01 9.381e+01 1.068e+02 1.739e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 02:53:18,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=532460.0, ans=0.125 2023-11-19 02:53:21,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=532460.0, ans=0.0 2023-11-19 02:53:23,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=532460.0, ans=0.1 2023-11-19 02:53:28,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=532526.6666666666, ans=0.0 2023-11-19 02:53:35,823 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7750, loss[loss=0.09676, simple_loss=0.121, pruned_loss=0.02675, audio_tagging_loss=0.009536, over 15372.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1098, pruned_loss=0.02709, audio_tagging_loss=0.0107, over 3042325.85 frames. 
2023-11-19 02:53:55,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=532660.0, ans=0.0
2023-11-19 02:53:57,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=532660.0, ans=0.125
2023-11-19 02:53:58,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=532726.6666666666, ans=0.0
2023-11-19 02:54:08,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=532793.3333333334, ans=0.125
2023-11-19 02:54:31,614 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7800, loss[loss=0.08909, simple_loss=0.1033, pruned_loss=0.02879, audio_tagging_loss=0.008663, over 14966.00 frames. ], tot_loss[loss=0.09236, simple_loss=0.1092, pruned_loss=0.02692, audio_tagging_loss=0.01082, over 3033839.17 frames. ], batch size: 56, lr: 9.76e-03, grad_scale: 16.0
2023-11-19 02:54:36,063 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 02:55:02,746 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0
2023-11-19 02:55:07,936 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.565e+01 9.591e+01 1.072e+02 1.947e+02, threshold=1.918e+02, percent-clipped=1.0
2023-11-19 02:55:19,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=533193.3333333334, ans=0.0
2023-11-19 02:55:21,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=533193.3333333334, ans=0.125
2023-11-19 02:55:27,551 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7850, loss[loss=0.1048, simple_loss=0.1176, pruned_loss=0.03273, audio_tagging_loss=0.01328, over 14866.00 frames. ], tot_loss[loss=0.09289, simple_loss=0.1098, pruned_loss=0.02712, audio_tagging_loss=0.01089, over 3043589.57 frames. ], batch size: 55, lr: 9.76e-03, grad_scale: 16.0
2023-11-19 02:55:43,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=533326.6666666666, ans=22.5
2023-11-19 02:56:03,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0
2023-11-19 02:56:11,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=533460.0, ans=0.025
2023-11-19 02:56:13,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=533526.6666666666, ans=0.025
2023-11-19 02:56:17,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=533526.6666666666, ans=0.2
2023-11-19 02:56:23,251 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.81 vs. limit=6.0
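The [scaling.py:213] entries record the current value (ans=) of ScheduledFloat parameters at the given batch_count: dropout rates, skip-rate probabilities, balancer bounds, and even the whitening limits are scheduled rather than fixed. A plausible minimal reimplementation, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the actual class in scaling.py has more machinery, and the breakpoints below are hypothetical:

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count; a sketch, not icefall's class."""

        def __init__(self, *points):
            self.points = sorted(points)  # e.g. (0.0, 0.3), (20000.0, 0.1)
            self.batch_count = 0.0        # advanced by the training loop

        def __float__(self):
            pts = self.points
            if self.batch_count <= pts[0][0]:
                return float(pts[0][1])
            if self.batch_count >= pts[-1][0]:
                return float(pts[-1][1])
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= self.batch_count <= x1:
                    t = (self.batch_count - x0) / (x1 - x0)
                    return float(y0 + t * (y1 - y0))

    # Hypothetical dropout schedule: past the last breakpoint the value is
    # pinned, consistent with the many ans=0.1 dropout_p entries at
    # batch_count ~531k in the log.
    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    dropout.batch_count = 531260.0
    print(float(dropout))  # 0.1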
2023-11-19 02:56:24,748 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7900, loss[loss=0.0945, simple_loss=0.1295, pruned_loss=0.02353, audio_tagging_loss=0.006196, over 15244.00 frames. ], tot_loss[loss=0.09289, simple_loss=0.1096, pruned_loss=0.02708, audio_tagging_loss=0.01101, over 3050779.84 frames. ], batch size: 56, lr: 9.76e-03, grad_scale: 16.0
2023-11-19 02:56:33,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=533593.3333333334, ans=0.2
2023-11-19 02:57:01,470 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.512e+01 9.298e+01 1.008e+02 1.380e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 02:57:17,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=533860.0, ans=0.5
2023-11-19 02:57:19,989 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 7950, loss[loss=0.07298, simple_loss=0.08431, pruned_loss=0.01944, audio_tagging_loss=0.01138, over 14329.00 frames. ], tot_loss[loss=0.09216, simple_loss=0.1087, pruned_loss=0.02665, audio_tagging_loss=0.01117, over 3048523.70 frames. ], batch size: 53, lr: 9.75e-03, grad_scale: 16.0
2023-11-19 02:57:28,090 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.18 vs. limit=22.5
2023-11-19 02:57:34,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=533993.3333333334, ans=0.125
2023-11-19 02:57:34,949 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 02:57:43,039 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0
2023-11-19 02:58:04,521 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5
2023-11-19 02:58:16,022 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8000, loss[loss=0.09018, simple_loss=0.1063, pruned_loss=0.02528, audio_tagging_loss=0.01173, over 16407.00 frames. ], tot_loss[loss=0.09164, simple_loss=0.1082, pruned_loss=0.02629, audio_tagging_loss=0.01122, over 3058045.45 frames. ], batch size: 62, lr: 9.75e-03, grad_scale: 32.0
2023-11-19 02:58:16,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=534260.0, ans=0.0
2023-11-19 02:58:17,641 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0
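The WARNING above shows the cut filter at work: AudioSet cuts carry only a dummy transcript, and a one-second cut yields 100 feature frames, which the convolutional frontend reduces to 23, fewer than the 24 BPE tokens, so the transducer loss cannot align it and the cut is dropped. A sketch of such a filter, assuming the frontend maps T input frames to roughly ((T - 7) // 2) // 2, which matches the logged 100 -> 23; the helper name is made up:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed subsampling arithmetic; reproduces 100 -> 23 from the log.
        t_sub = ((num_frames - 7) // 2) // 2
        # The transducer needs more output frames than target tokens.
        return t_sub > num_tokens

    print(keep_cut(100, 24))  # False: excluded, as in the WARNING above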
2023-11-19 02:58:19,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=534260.0, ans=0.125
2023-11-19 02:58:22,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=534260.0, ans=0.2
2023-11-19 02:58:40,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534393.3333333334, ans=0.125
2023-11-19 02:58:47,192 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 02:58:52,225 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.488e+01 9.029e+01 9.898e+01 1.404e+02, threshold=1.806e+02, percent-clipped=0.0
2023-11-19 02:59:00,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=534526.6666666666, ans=0.025
2023-11-19 02:59:10,643 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8050, loss[loss=0.0822, simple_loss=0.09776, pruned_loss=0.02226, audio_tagging_loss=0.01106, over 14982.00 frames. ], tot_loss[loss=0.092, simple_loss=0.1082, pruned_loss=0.0266, audio_tagging_loss=0.0113, over 3052317.11 frames. ], batch size: 57, lr: 9.75e-03, grad_scale: 32.0
2023-11-19 02:59:19,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=534593.3333333334, ans=0.2
2023-11-19 02:59:24,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=534660.0, ans=0.0
2023-11-19 02:59:49,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=534793.3333333334, ans=0.125
2023-11-19 03:00:06,543 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8100, loss[loss=0.102, simple_loss=0.1232, pruned_loss=0.03128, audio_tagging_loss=0.009139, over 14499.00 frames. ], tot_loss[loss=0.09285, simple_loss=0.1098, pruned_loss=0.02693, audio_tagging_loss=0.01104, over 3047694.91 frames. ], batch size: 54, lr: 9.74e-03, grad_scale: 32.0
2023-11-19 03:00:26,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=534993.3333333334, ans=0.05
2023-11-19 03:00:42,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.821e+01 9.637e+01 1.043e+02 1.464e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-19 03:00:45,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=535126.6666666666, ans=0.0
2023-11-19 03:00:51,255 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=535193.3333333334, ans=0.125
2023-11-19 03:01:02,687 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8150, loss[loss=0.1094, simple_loss=0.1272, pruned_loss=0.03237, audio_tagging_loss=0.01336, over 14841.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1105, pruned_loss=0.02696, audio_tagging_loss=0.01091, over 3049101.66 frames. ], batch size: 56, lr: 9.74e-03, grad_scale: 32.0
2023-11-19 03:01:04,600 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=535260.0, ans=0.125
2023-11-19 03:01:14,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=535326.6666666666, ans=0.1
2023-11-19 03:01:25,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=535393.3333333334, ans=0.125
2023-11-19 03:01:38,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=535460.0, ans=0.1
2023-11-19 03:01:41,814 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:01:43,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=535460.0, ans=0.2
2023-11-19 03:01:57,933 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8200, loss[loss=0.08909, simple_loss=0.1001, pruned_loss=0.02552, audio_tagging_loss=0.01352, over 16382.00 frames. ], tot_loss[loss=0.09233, simple_loss=0.1096, pruned_loss=0.02667, audio_tagging_loss=0.01086, over 3054983.09 frames. ], batch size: 61, lr: 9.74e-03, grad_scale: 32.0
2023-11-19 03:01:58,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.66 vs. limit=12.0
2023-11-19 03:01:59,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=535593.3333333334, ans=0.125
2023-11-19 03:02:00,010 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:02:34,968 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.718e+01 8.630e+01 9.276e+01 1.032e+02 1.538e+02, threshold=1.855e+02, percent-clipped=0.0
2023-11-19 03:02:48,451 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.11 vs. limit=15.0
2023-11-19 03:02:53,479 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8250, loss[loss=0.09651, simple_loss=0.1174, pruned_loss=0.02753, audio_tagging_loss=0.01027, over 15311.00 frames. ], tot_loss[loss=0.0927, simple_loss=0.1099, pruned_loss=0.02691, audio_tagging_loss=0.01085, over 3048406.08 frames. ], batch size: 57, lr: 9.74e-03, grad_scale: 32.0
2023-11-19 03:02:53,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=535926.6666666666, ans=0.07
2023-11-19 03:03:05,391 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:03:17,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=536060.0, ans=0.125
2023-11-19 03:03:39,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=536193.3333333334, ans=0.0
2023-11-19 03:03:49,904 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8300, loss[loss=0.1435, simple_loss=0.175, pruned_loss=0.0476, audio_tagging_loss=0.008375, over 16306.00 frames. ], tot_loss[loss=0.09256, simple_loss=0.1097, pruned_loss=0.02692, audio_tagging_loss=0.01081, over 3052667.15 frames. ], batch size: 56, lr: 9.73e-03, grad_scale: 32.0
2023-11-19 03:03:50,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536260.0, ans=0.1
2023-11-19 03:04:07,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=536326.6666666666, ans=0.0
2023-11-19 03:04:14,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536393.3333333334, ans=0.1
2023-11-19 03:04:27,420 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.802e+01 9.688e+01 1.089e+02 1.659e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-19 03:04:34,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=536526.6666666666, ans=0.0
2023-11-19 03:04:45,392 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8350, loss[loss=0.08849, simple_loss=0.1059, pruned_loss=0.02258, audio_tagging_loss=0.01299, over 15760.00 frames. ], tot_loss[loss=0.09295, simple_loss=0.1103, pruned_loss=0.02698, audio_tagging_loss=0.01082, over 3055721.64 frames. ], batch size: 59, lr: 9.73e-03, grad_scale: 16.0
2023-11-19 03:04:54,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=536593.3333333334, ans=0.1
2023-11-19 03:05:01,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=536660.0, ans=0.07
2023-11-19 03:05:15,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=536726.6666666666, ans=0.125
2023-11-19 03:05:21,277 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
2023-11-19 03:05:31,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=536860.0, ans=0.125
2023-11-19 03:05:40,346 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8400, loss[loss=0.08428, simple_loss=0.1002, pruned_loss=0.02365, audio_tagging_loss=0.01055, over 14376.00 frames. ], tot_loss[loss=0.09203, simple_loss=0.1092, pruned_loss=0.0266, audio_tagging_loss=0.01083, over 3048157.40 frames. ], batch size: 56, lr: 9.73e-03, grad_scale: 32.0
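The grad_scale field in the surrounding batch entries keeps flipping between 16.0 and 32.0 (32.0 at batch 8300, 16.0 at batch 8350, 32.0 again at batch 8400). That pattern is characteristic of dynamic loss scaling in mixed-precision training: the scale doubles after a run of overflow-free steps and halves whenever a scaled gradient overflows. This run logs its own scale, so whether it uses stock PyTorch AMP is an assumption; with stock AMP the loop would look like:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

    for batch in loader:                   # 'loader', 'model', 'optimizer' assumed
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()      # backprop through the scaled loss
        scaler.step(optimizer)             # unscales grads; skips step on inf/nan
        scaler.update()                    # grows or shrinks the scale
        current_scale = scaler.get_scale() # the value reported as grad_scale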
2023-11-19 03:05:57,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=536993.3333333334, ans=0.125
2023-11-19 03:06:05,911 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.61 vs. limit=15.0
2023-11-19 03:06:14,383 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=537126.6666666666, ans=0.1
2023-11-19 03:06:18,484 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.679e+01 9.349e+01 1.017e+02 2.307e+02, threshold=1.870e+02, percent-clipped=1.0
2023-11-19 03:06:31,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=537193.3333333334, ans=0.125
2023-11-19 03:06:32,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537193.3333333334, ans=0.1
2023-11-19 03:06:36,880 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8450, loss[loss=0.1012, simple_loss=0.1221, pruned_loss=0.02978, audio_tagging_loss=0.01038, over 15454.00 frames. ], tot_loss[loss=0.09321, simple_loss=0.1103, pruned_loss=0.02726, audio_tagging_loss=0.01082, over 3049256.89 frames. ], batch size: 56, lr: 9.72e-03, grad_scale: 32.0
2023-11-19 03:06:45,681 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.66 vs. limit=10.0
2023-11-19 03:06:58,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=537393.3333333334, ans=0.125
2023-11-19 03:07:06,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=537393.3333333334, ans=0.125
2023-11-19 03:07:27,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537526.6666666666, ans=0.1
2023-11-19 03:07:31,466 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8500, loss[loss=0.08187, simple_loss=0.0943, pruned_loss=0.02265, audio_tagging_loss=0.01207, over 14846.00 frames. ], tot_loss[loss=0.09258, simple_loss=0.1097, pruned_loss=0.0269, audio_tagging_loss=0.01084, over 3043536.77 frames. ], batch size: 55, lr: 9.72e-03, grad_scale: 32.0
2023-11-19 03:07:37,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=537593.3333333334, ans=0.0
2023-11-19 03:07:39,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=537593.3333333334, ans=0.1
2023-11-19 03:08:02,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=537726.6666666666, ans=0.125
2023-11-19 03:08:09,032 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.768e+01 9.309e+01 1.039e+02 1.379e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-19 03:08:20,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=537860.0, ans=22.5
2023-11-19 03:08:26,575 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8550, loss[loss=0.08784, simple_loss=0.1093, pruned_loss=0.02414, audio_tagging_loss=0.009076, over 14634.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.111, pruned_loss=0.02711, audio_tagging_loss=0.01086, over 3048819.06 frames. ], batch size: 53, lr: 9.72e-03, grad_scale: 32.0
2023-11-19 03:08:35,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=537926.6666666666, ans=0.1
2023-11-19 03:08:44,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537993.3333333334, ans=0.1
2023-11-19 03:08:44,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=537993.3333333334, ans=0.0
2023-11-19 03:08:51,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=538060.0, ans=0.0
2023-11-19 03:09:05,845 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=538126.6666666666, ans=0.125
2023-11-19 03:09:22,934 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8600, loss[loss=0.09092, simple_loss=0.1156, pruned_loss=0.02307, audio_tagging_loss=0.01006, over 16685.00 frames. ], tot_loss[loss=0.09317, simple_loss=0.1105, pruned_loss=0.02697, audio_tagging_loss=0.01096, over 3049589.44 frames. ], batch size: 62, lr: 9.71e-03, grad_scale: 32.0
2023-11-19 03:09:27,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.76 vs. limit=15.0
2023-11-19 03:09:41,540 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.92 vs. limit=12.0
2023-11-19 03:09:53,166 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:09:53,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5
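Each [scaling.py:1022] line compares a whitening "metric" for some layer output against a limit (metric=21.38 vs. limit=22.5 just above). The metric measures how far the feature covariance is from a multiple of the identity, and a penalty applies only when it exceeds the limit. One standard whiteness statistic with the right properties, equal to 1.0 for perfectly white features and growing as channels become correlated or unequal in scale, is sketched below; it is not necessarily the exact formula in scaling.py:

    import torch

    def whiteness_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        c = cov.shape[0]
        # C * trace(cov @ cov) / trace(cov)^2 == 1 iff cov is a multiple of I
        # (Cauchy-Schwarz over the eigenvalues of cov).
        return (cov @ cov).diagonal().sum() * c / cov.diagonal().sum() ** 2

    white = torch.randn(1000, 384)
    print(whiteness_metric(white))                                  # near 1
    print(whiteness_metric(white * torch.linspace(0.1, 3.0, 384)))  # much larger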
2023-11-19 03:09:59,803 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.913e+01 9.582e+01 1.068e+02 1.371e+02, threshold=1.916e+02, percent-clipped=0.0
2023-11-19 03:10:10,688 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=538526.6666666666, ans=0.0
2023-11-19 03:10:17,868 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8650, loss[loss=0.1183, simple_loss=0.1351, pruned_loss=0.04144, audio_tagging_loss=0.009273, over 14707.00 frames. ], tot_loss[loss=0.09378, simple_loss=0.1115, pruned_loss=0.02717, audio_tagging_loss=0.01086, over 3045422.83 frames. ], batch size: 54, lr: 9.71e-03, grad_scale: 32.0
2023-11-19 03:10:31,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=538660.0, ans=0.1
2023-11-19 03:10:35,079 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.75 vs. limit=15.0
2023-11-19 03:10:45,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0
2023-11-19 03:10:49,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=538726.6666666666, ans=0.0
2023-11-19 03:10:56,908 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0
2023-11-19 03:11:02,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.76 vs. limit=12.0
2023-11-19 03:11:13,648 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8700, loss[loss=0.07465, simple_loss=0.08838, pruned_loss=0.01738, audio_tagging_loss=0.01308, over 14378.00 frames. ], tot_loss[loss=0.09364, simple_loss=0.1114, pruned_loss=0.02698, audio_tagging_loss=0.01096, over 3042713.02 frames. ], batch size: 54, lr: 9.71e-03, grad_scale: 32.0
2023-11-19 03:11:21,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=538926.6666666666, ans=0.125
2023-11-19 03:11:27,368 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=538993.3333333334, ans=0.125
2023-11-19 03:11:43,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=539060.0, ans=0.09899494936611666
2023-11-19 03:11:50,664 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.784e+01 9.683e+01 1.064e+02 1.511e+02, threshold=1.937e+02, percent-clipped=0.0
2023-11-19 03:11:57,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=539193.3333333334, ans=0.0
2023-11-19 03:12:00,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=539193.3333333334, ans=0.125
2023-11-19 03:12:09,096 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8750, loss[loss=0.07668, simple_loss=0.08912, pruned_loss=0.01801, audio_tagging_loss=0.01411, over 13941.00 frames. ], tot_loss[loss=0.09379, simple_loss=0.1118, pruned_loss=0.02687, audio_tagging_loss=0.01102, over 3050104.14 frames. ], batch size: 52, lr: 9.71e-03, grad_scale: 32.0
2023-11-19 03:12:20,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=539326.6666666666, ans=0.1
2023-11-19 03:12:29,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=539393.3333333334, ans=0.125
2023-11-19 03:12:31,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=539393.3333333334, ans=0.07
2023-11-19 03:13:00,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=539526.6666666666, ans=0.1
2023-11-19 03:13:04,385 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8800, loss[loss=0.1012, simple_loss=0.1211, pruned_loss=0.03065, audio_tagging_loss=0.01, over 16005.00 frames. ], tot_loss[loss=0.09413, simple_loss=0.1126, pruned_loss=0.02682, audio_tagging_loss=0.01102, over 3058980.72 frames. ], batch size: 58, lr: 9.70e-03, grad_scale: 32.0
2023-11-19 03:13:05,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=539593.3333333334, ans=0.125
2023-11-19 03:13:10,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=539593.3333333334, ans=0.125
2023-11-19 03:13:29,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=539726.6666666666, ans=0.125
2023-11-19 03:13:42,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.511e+01 8.574e+01 9.508e+01 1.041e+02 1.765e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 03:13:44,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=539793.3333333334, ans=0.05
2023-11-19 03:13:59,443 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8850, loss[loss=0.0807, simple_loss=0.08847, pruned_loss=0.02429, audio_tagging_loss=0.01217, over 14257.00 frames. ], tot_loss[loss=0.09355, simple_loss=0.1117, pruned_loss=0.02665, audio_tagging_loss=0.01107, over 3056722.66 frames. ], batch size: 56, lr: 9.70e-03, grad_scale: 32.0
2023-11-19 03:14:02,296 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=22.5
2023-11-19 03:14:12,450 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:14:29,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=540060.0, ans=6.0
2023-11-19 03:14:36,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=540126.6666666666, ans=0.125
2023-11-19 03:14:44,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=15.0
2023-11-19 03:14:45,252 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:14:48,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=540193.3333333334, ans=0.125
2023-11-19 03:14:49,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=540193.3333333334, ans=0.0
2023-11-19 03:14:55,121 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8900, loss[loss=0.1052, simple_loss=0.1336, pruned_loss=0.02971, audio_tagging_loss=0.008704, over 15068.00 frames. ], tot_loss[loss=0.09301, simple_loss=0.1109, pruned_loss=0.02658, audio_tagging_loss=0.011, over 3049686.88 frames. ], batch size: 56, lr: 9.70e-03, grad_scale: 32.0
2023-11-19 03:15:27,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=540460.0, ans=0.125
2023-11-19 03:15:32,227 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.732e+01 9.510e+01 1.041e+02 1.883e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 03:15:38,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.18 vs. limit=6.0
2023-11-19 03:15:40,466 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:15:40,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=540526.6666666666, ans=15.0
2023-11-19 03:15:42,805 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5
2023-11-19 03:15:49,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=540593.3333333334, ans=0.125
2023-11-19 03:15:50,759 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 8950, loss[loss=0.1214, simple_loss=0.1488, pruned_loss=0.0373, audio_tagging_loss=0.009699, over 15945.00 frames. ], tot_loss[loss=0.09364, simple_loss=0.112, pruned_loss=0.02686, audio_tagging_loss=0.01079, over 3049166.19 frames. ], batch size: 57, lr: 9.69e-03, grad_scale: 32.0
2023-11-19 03:16:45,796 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9000, loss[loss=0.1094, simple_loss=0.129, pruned_loss=0.03534, audio_tagging_loss=0.009523, over 15669.00 frames. ], tot_loss[loss=0.09324, simple_loss=0.1114, pruned_loss=0.02678, audio_tagging_loss=0.01073, over 3048047.26 frames. ], batch size: 57, lr: 9.69e-03, grad_scale: 32.0
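The [scaling.py:1118] WithLoss lines report an auxiliary loss attached to the attention-weight tensors (here always loss-sum=0.000e+00, i.e. the constraint is currently inactive). The general pattern is a module that passes its input through unchanged while stashing a penalty for the training loop to collect and log; a generic sketch of that pattern, not icefall's WithLoss class:

    import torch
    import torch.nn as nn

    class WithAuxLoss(nn.Module):
        """Identity in the forward pass; records a penalty as a side effect."""

        def __init__(self, penalty_fn, name: str):
            super().__init__()
            self.penalty_fn = penalty_fn  # maps the activation to a scalar
            self.name = name
            self.last_loss = torch.tensor(0.0)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training:
                self.last_loss = self.penalty_fn(x)
            return x

The trainer can then sweep the model's modules, sum each last_loss into the objective (or, as the zero sums here suggest, leave it as a pure diagnostic), and print the per-module value.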
2023-11-19 03:16:45,797 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-19 03:17:18,036 INFO [train_asr.py:1147] (2/4) Epoch 7, validation: loss=0.06875, simple_loss=0.05761, pruned_loss=0.007498, audio_tagging_loss=0.03244, over 4681554.00 frames.
2023-11-19 03:17:18,036 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-19 03:17:33,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=540993.3333333334, ans=0.1
2023-11-19 03:17:43,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=541060.0, ans=0.0
2023-11-19 03:17:44,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=541060.0, ans=0.5
2023-11-19 03:17:53,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.37 vs. limit=15.0
2023-11-19 03:17:54,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.732e+01 9.313e+01 1.034e+02 1.719e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 03:17:55,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=541126.6666666666, ans=0.0
2023-11-19 03:18:02,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=541193.3333333334, ans=0.5
2023-11-19 03:18:12,232 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9050, loss[loss=0.07352, simple_loss=0.0865, pruned_loss=0.01861, audio_tagging_loss=0.01166, over 14941.00 frames. ], tot_loss[loss=0.09342, simple_loss=0.1114, pruned_loss=0.02691, audio_tagging_loss=0.0108, over 3051284.98 frames. ], batch size: 61, lr: 9.69e-03, grad_scale: 32.0
2023-11-19 03:18:13,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=541260.0, ans=0.0
2023-11-19 03:18:19,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=541260.0, ans=0.0
2023-11-19 03:18:22,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=541326.6666666666, ans=0.125
2023-11-19 03:18:23,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=541326.6666666666, ans=0.09899494936611666
2023-11-19 03:18:34,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541393.3333333334, ans=0.1
2023-11-19 03:18:48,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541460.0, ans=0.1
2023-11-19 03:18:52,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541460.0, ans=0.1
2023-11-19 03:19:06,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541593.3333333334, ans=0.1
2023-11-19 03:19:07,364 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9100, loss[loss=0.07918, simple_loss=0.09803, pruned_loss=0.02007, audio_tagging_loss=0.0101, over 15384.00 frames. ], tot_loss[loss=0.09365, simple_loss=0.1118, pruned_loss=0.02707, audio_tagging_loss=0.0107, over 3058704.50 frames. ], batch size: 58, lr: 9.68e-03, grad_scale: 32.0
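At batch 9000 the trainer pauses for a validation pass (the "Computing validation loss" / "validation: ..." pair above) and then reports the peak GPU allocation. A skeleton of such a hook, with the loader and loss helper assumed; the memory query is the standard PyTorch API:

    import torch

    def validate(model, dev_loader, device):
        model.eval()
        tot, n = 0.0, 0
        with torch.no_grad():
            for batch in dev_loader:
                # compute_loss is an assumed helper returning (loss, num_frames)
                loss, num_frames = compute_loss(model, batch, device)
                tot += loss.item() * num_frames
                n += num_frames
        model.train()
        return tot / n

    # Peak allocation on this device since the start of the process,
    # the figure logged as "Maximum memory allocated so far":
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mb}MB")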
2023-11-19 03:19:25,571 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=541660.0, ans=15.0
2023-11-19 03:19:31,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=541726.6666666666, ans=0.125
2023-11-19 03:19:38,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541726.6666666666, ans=0.1
2023-11-19 03:19:44,939 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.588e+01 9.392e+01 1.039e+02 1.289e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-19 03:19:48,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=541793.3333333334, ans=0.035
2023-11-19 03:20:02,265 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9150, loss[loss=0.08296, simple_loss=0.09619, pruned_loss=0.02489, audio_tagging_loss=0.00998, over 14466.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1109, pruned_loss=0.02693, audio_tagging_loss=0.01074, over 3051748.58 frames. ], batch size: 54, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:20:05,102 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=541926.6666666666, ans=0.1
2023-11-19 03:20:29,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=542060.0, ans=0.125
2023-11-19 03:20:33,762 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.83 vs. limit=22.5
2023-11-19 03:20:35,541 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=542126.6666666666, ans=0.0
2023-11-19 03:20:38,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=542126.6666666666, ans=0.125
2023-11-19 03:20:57,894 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9200, loss[loss=0.09474, simple_loss=0.1086, pruned_loss=0.0287, audio_tagging_loss=0.01173, over 15077.00 frames. ], tot_loss[loss=0.09319, simple_loss=0.1109, pruned_loss=0.02708, audio_tagging_loss=0.01067, over 3051364.86 frames. ], batch size: 56, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:21:29,957 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0
2023-11-19 03:21:36,293 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.578e+01 9.333e+01 1.009e+02 1.862e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-19 03:21:52,038 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9250, loss[loss=0.1099, simple_loss=0.1333, pruned_loss=0.0318, audio_tagging_loss=0.01142, over 15544.00 frames. ], tot_loss[loss=0.09277, simple_loss=0.1106, pruned_loss=0.02684, audio_tagging_loss=0.01062, over 3055869.27 frames. ], batch size: 57, lr: 9.68e-03, grad_scale: 32.0
2023-11-19 03:21:52,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=542593.3333333334, ans=0.125
2023-11-19 03:22:10,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=542660.0, ans=0.125
2023-11-19 03:22:36,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=542860.0, ans=0.125
2023-11-19 03:22:47,236 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9300, loss[loss=0.1267, simple_loss=0.1619, pruned_loss=0.0385, audio_tagging_loss=0.007207, over 15264.00 frames. ], tot_loss[loss=0.09229, simple_loss=0.11, pruned_loss=0.02661, audio_tagging_loss=0.01069, over 3059183.00 frames. ], batch size: 55, lr: 9.67e-03, grad_scale: 32.0
2023-11-19 03:22:58,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=542993.3333333334, ans=0.0
2023-11-19 03:23:09,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=543060.0, ans=0.125
2023-11-19 03:23:17,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=543060.0, ans=0.0
2023-11-19 03:23:20,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.28 vs. limit=15.0
2023-11-19 03:23:26,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.462e+01 9.179e+01 9.907e+01 1.156e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-19 03:23:42,862 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9350, loss[loss=0.08054, simple_loss=0.09567, pruned_loss=0.02256, audio_tagging_loss=0.01014, over 15260.00 frames. ], tot_loss[loss=0.09189, simple_loss=0.1095, pruned_loss=0.02644, audio_tagging_loss=0.0107, over 3056475.25 frames. ], batch size: 57, lr: 9.67e-03, grad_scale: 16.0
2023-11-19 03:23:43,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=543260.0, ans=0.0
2023-11-19 03:23:44,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=543260.0, ans=0.125
2023-11-19 03:23:44,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.21 vs. limit=15.0
2023-11-19 03:23:48,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=543260.0, ans=0.1
2023-11-19 03:24:30,175 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=12.0
2023-11-19 03:24:37,157 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9400, loss[loss=0.07395, simple_loss=0.09548, pruned_loss=0.01581, audio_tagging_loss=0.0104, over 13846.00 frames. ], tot_loss[loss=0.09186, simple_loss=0.1098, pruned_loss=0.02612, audio_tagging_loss=0.01085, over 3057195.75 frames. ], batch size: 53, lr: 9.67e-03, grad_scale: 16.0
2023-11-19 03:24:57,825 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0
2023-11-19 03:24:59,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=543726.6666666666, ans=0.125
2023-11-19 03:25:06,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=543726.6666666666, ans=0.125
2023-11-19 03:25:13,723 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:25:16,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.406e+01 9.126e+01 1.047e+02 1.267e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 03:25:31,571 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9450, loss[loss=0.094, simple_loss=0.1098, pruned_loss=0.02843, audio_tagging_loss=0.01065, over 16386.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1103, pruned_loss=0.02634, audio_tagging_loss=0.01092, over 3059388.33 frames. ], batch size: 59, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:25:31,579 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:25:34,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=543926.6666666666, ans=15.0
2023-11-19 03:25:37,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=543926.6666666666, ans=0.125
2023-11-19 03:26:01,489 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.99 vs. limit=22.5
2023-11-19 03:26:04,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.70 vs. limit=15.0
2023-11-19 03:26:18,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=544193.3333333334, ans=0.0
2023-11-19 03:26:28,044 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9500, loss[loss=0.1078, simple_loss=0.1218, pruned_loss=0.0351, audio_tagging_loss=0.0118, over 14716.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.1107, pruned_loss=0.02672, audio_tagging_loss=0.01102, over 3058530.28 frames. ], batch size: 55, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:26:43,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=544326.6666666666, ans=0.125
2023-11-19 03:26:55,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=544393.3333333334, ans=0.125
2023-11-19 03:27:08,271 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.649e+01 9.463e+01 1.058e+02 1.966e+02, threshold=1.893e+02, percent-clipped=1.0
2023-11-19 03:27:20,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=544526.6666666666, ans=0.025
2023-11-19 03:27:21,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=544526.6666666666, ans=0.2
2023-11-19 03:27:23,681 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9550, loss[loss=0.09459, simple_loss=0.1214, pruned_loss=0.02329, audio_tagging_loss=0.01059, over 16579.00 frames. ], tot_loss[loss=0.09362, simple_loss=0.1109, pruned_loss=0.02713, audio_tagging_loss=0.01102, over 3050552.69 frames. ], batch size: 64, lr: 9.66e-03, grad_scale: 16.0
2023-11-19 03:27:27,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=544593.3333333334, ans=0.125
2023-11-19 03:27:31,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=544593.3333333334, ans=0.0
2023-11-19 03:27:40,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=544660.0, ans=0.125
2023-11-19 03:27:45,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=544726.6666666666, ans=0.0
2023-11-19 03:27:51,181 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.28 vs. limit=12.0
2023-11-19 03:28:08,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=544860.0, ans=0.125
2023-11-19 03:28:18,803 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9600, loss[loss=0.09137, simple_loss=0.1094, pruned_loss=0.02331, audio_tagging_loss=0.01334, over 14360.00 frames. ], tot_loss[loss=0.09328, simple_loss=0.1105, pruned_loss=0.02693, audio_tagging_loss=0.0111, over 3060178.86 frames. ], batch size: 56, lr: 9.66e-03, grad_scale: 32.0
2023-11-19 03:28:21,196 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:28:39,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=544993.3333333334, ans=0.0
2023-11-19 03:28:48,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.53 vs. limit=15.0
2023-11-19 03:28:59,135 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.431e+01 9.173e+01 1.006e+02 1.337e+02, threshold=1.835e+02, percent-clipped=0.0
2023-11-19 03:29:11,135 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:29:13,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=545193.3333333334, ans=0.07
2023-11-19 03:29:15,057 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9650, loss[loss=0.08447, simple_loss=0.1044, pruned_loss=0.01962, audio_tagging_loss=0.01267, over 14844.00 frames. ], tot_loss[loss=0.09284, simple_loss=0.1102, pruned_loss=0.0267, audio_tagging_loss=0.01107, over 3057117.87 frames. ], batch size: 54, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:29:16,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=545260.0, ans=0.025
2023-11-19 03:29:28,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545326.6666666666, ans=0.125
2023-11-19 03:29:29,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=545326.6666666666, ans=0.125
2023-11-19 03:30:10,038 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9700, loss[loss=0.0749, simple_loss=0.08806, pruned_loss=0.02006, audio_tagging_loss=0.01082, over 15259.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.1095, pruned_loss=0.02676, audio_tagging_loss=0.011, over 3051286.95 frames. ], batch size: 58, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:30:20,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=545660.0, ans=0.125
2023-11-19 03:30:50,515 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.564e+01 9.508e+01 1.033e+02 1.418e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 03:31:05,787 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9750, loss[loss=0.1005, simple_loss=0.1267, pruned_loss=0.02729, audio_tagging_loss=0.009871, over 13987.00 frames. ], tot_loss[loss=0.09196, simple_loss=0.1093, pruned_loss=0.02639, audio_tagging_loss=0.01093, over 3042607.49 frames. ], batch size: 53, lr: 9.65e-03, grad_scale: 32.0
2023-11-19 03:31:10,616 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.65 vs. limit=15.0
2023-11-19 03:31:16,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545926.6666666666, ans=0.125
2023-11-19 03:31:17,657 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0
2023-11-19 03:31:18,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=545993.3333333334, ans=0.125
2023-11-19 03:31:31,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=546060.0, ans=0.125
2023-11-19 03:32:02,946 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9800, loss[loss=0.08666, simple_loss=0.1019, pruned_loss=0.0273, audio_tagging_loss=0.008408, over 15161.00 frames. ], tot_loss[loss=0.09207, simple_loss=0.1092, pruned_loss=0.02663, audio_tagging_loss=0.01085, over 3046092.85 frames. ], batch size: 56, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:32:04,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=546260.0, ans=0.0
2023-11-19 03:32:13,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=546326.6666666666, ans=0.125
2023-11-19 03:32:26,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=546393.3333333334, ans=0.125
2023-11-19 03:32:43,158 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.602e+01 9.393e+01 1.096e+02 1.685e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 03:32:52,707 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:32:57,950 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9850, loss[loss=0.07936, simple_loss=0.0985, pruned_loss=0.02018, audio_tagging_loss=0.009928, over 15820.00 frames. ], tot_loss[loss=0.09261, simple_loss=0.1102, pruned_loss=0.02688, audio_tagging_loss=0.01061, over 3043290.89 frames. ], batch size: 61, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:33:11,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=546660.0, ans=0.125
2023-11-19 03:33:30,701 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.11 vs. limit=22.5
2023-11-19 03:33:33,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=546793.3333333334, ans=0.0
2023-11-19 03:33:33,598 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=546793.3333333334, ans=0.1
2023-11-19 03:33:34,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=546793.3333333334, ans=0.0
2023-11-19 03:33:47,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=546860.0, ans=0.125
2023-11-19 03:33:51,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.11 vs. limit=22.5
2023-11-19 03:33:53,970 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9900, loss[loss=0.07883, simple_loss=0.09054, pruned_loss=0.02123, audio_tagging_loss=0.01233, over 15206.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1099, pruned_loss=0.02659, audio_tagging_loss=0.0107, over 3040448.93 frames. ], batch size: 57, lr: 9.64e-03, grad_scale: 32.0
2023-11-19 03:34:02,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=546926.6666666666, ans=0.125
2023-11-19 03:34:14,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=546993.3333333334, ans=0.125
2023-11-19 03:34:18,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=547060.0, ans=0.5
2023-11-19 03:34:28,580 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=547126.6666666666, ans=0.025
2023-11-19 03:34:34,603 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.700e+01 9.311e+01 1.023e+02 1.421e+02, threshold=1.862e+02, percent-clipped=0.0
2023-11-19 03:34:40,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=547193.3333333334, ans=0.0
2023-11-19 03:34:50,609 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 9950, loss[loss=0.07647, simple_loss=0.0857, pruned_loss=0.02233, audio_tagging_loss=0.01129, over 13624.00 frames. ], tot_loss[loss=0.09219, simple_loss=0.1099, pruned_loss=0.02654, audio_tagging_loss=0.01069, over 3041498.62 frames. ], batch size: 53, lr: 9.64e-03, grad_scale: 16.0
2023-11-19 03:35:20,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=547393.3333333334, ans=0.0
2023-11-19 03:35:23,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=547460.0, ans=0.2
2023-11-19 03:35:37,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=547526.6666666666, ans=0.0
2023-11-19 03:35:40,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0
2023-11-19 03:35:45,518 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10000, loss[loss=0.0895, simple_loss=0.1044, pruned_loss=0.02727, audio_tagging_loss=0.01004, over 15407.00 frames. ], tot_loss[loss=0.09247, simple_loss=0.1101, pruned_loss=0.02667, audio_tagging_loss=0.01074, over 3039600.54 frames. ], batch size: 56, lr: 9.63e-03, grad_scale: 32.0
2023-11-19 03:35:49,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.33 vs. limit=15.0
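The learning rate in these entries decays slowly and smoothly (9.78e-03 at batch 7550 down to 9.63e-03 by batch 10000), i.e. per-step decay rather than per-epoch drops. Recent icefall recipes use the Eden scheduler, which multiplies the base LR by smooth inverse-quarter-power factors in both the batch and epoch counters; the sketch below shows that shape only (treat the exact form and constants as assumptions, and note it is not fitted to the logged values here, whose counters may be advanced differently):

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # Smooth decay in the step counter ...
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        # ... and in the (possibly fractional) epoch counter.
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor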
limit=15.0 2023-11-19 03:35:54,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=547593.3333333334, ans=0.125 2023-11-19 03:35:55,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=547660.0, ans=0.125 2023-11-19 03:36:26,888 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.651e+01 9.520e+01 1.034e+02 1.455e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 03:36:27,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=547793.3333333334, ans=0.1 2023-11-19 03:36:31,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=547860.0, ans=0.125 2023-11-19 03:36:34,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=547860.0, ans=0.0 2023-11-19 03:36:40,556 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10050, loss[loss=0.1214, simple_loss=0.1431, pruned_loss=0.04202, audio_tagging_loss=0.007869, over 15071.00 frames. ], tot_loss[loss=0.09286, simple_loss=0.1103, pruned_loss=0.02693, audio_tagging_loss=0.01077, over 3044634.21 frames. ], batch size: 54, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:36:43,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=547926.6666666666, ans=0.125 2023-11-19 03:36:52,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=547993.3333333334, ans=0.125 2023-11-19 03:37:00,132 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=547993.3333333334, ans=0.95 2023-11-19 03:37:19,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=548126.6666666666, ans=0.09899494936611666 2023-11-19 03:37:27,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=548193.3333333334, ans=0.0 2023-11-19 03:37:37,699 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10100, loss[loss=0.1163, simple_loss=0.1514, pruned_loss=0.03063, audio_tagging_loss=0.01001, over 15520.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.1099, pruned_loss=0.0267, audio_tagging_loss=0.01086, over 3048965.48 frames. ], batch size: 56, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:38:05,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=548393.3333333334, ans=0.0 2023-11-19 03:38:06,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=548393.3333333334, ans=0.0 2023-11-19 03:38:18,422 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.585e+01 9.588e+01 1.090e+02 1.708e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-19 03:38:23,260 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
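These recurring WARNINGs (the token dump for this one continues just below) come from a length filter: a cut is dropped when the encoder would emit fewer frames than its transcript has tokens. A minimal sketch of that check, assuming the convolutional frontend maps T input frames to ((T - 7) // 2 + 1) // 2 output frames, which reproduces the logged 100 -> 23; the function names here are illustrative, not icefall's API.

    # Illustrative sketch of the exclusion check behind these WARNINGs.
    # The subsampling formula is an assumption consistent with the logged
    # "100 frames (before) -> 23 frames (after)".
    def frames_after_subsampling(num_frames: int) -> int:
        # roughly 4x temporal reduction by the conv frontend
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # a transducer needs at least one encoder frame per output token
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the 1 s dummy AudioSet clips above
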
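The many ScheduledFloat lines throughout this log print a value (ans=...) that is a deterministic function of the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name mirrors the log, but the constructor and breakpoints shown are illustrative, not icefall's exact API.

    import bisect

    class ScheduledFloat:
        """Illustrative: a float that changes with the global batch count,
        linearly interpolated between (batch_count, value) breakpoints."""
        def __init__(self, *points):
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value_at(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout-like prob that decays early in training, then stays
    # flat, which late in training would be logged as ans=0.125:
    prob = ScheduledFloat((0.0, 0.3), (20000.0, 0.125))
    print(prob.value_at(546060.0))  # 0.125
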
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:38:24,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=548526.6666666666, ans=0.2 2023-11-19 03:38:28,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=548526.6666666666, ans=0.125 2023-11-19 03:38:32,758 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10150, loss[loss=0.09401, simple_loss=0.1167, pruned_loss=0.02586, audio_tagging_loss=0.009805, over 15437.00 frames. ], tot_loss[loss=0.09357, simple_loss=0.1111, pruned_loss=0.02709, audio_tagging_loss=0.01094, over 3058464.62 frames. ], batch size: 56, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:38:33,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=548593.3333333334, ans=0.04949747468305833 2023-11-19 03:38:35,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=548593.3333333334, ans=0.1 2023-11-19 03:38:36,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=548593.3333333334, ans=0.125 2023-11-19 03:38:37,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=548593.3333333334, ans=0.0 2023-11-19 03:38:47,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=548660.0, ans=0.125 2023-11-19 03:38:56,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=548726.6666666666, ans=0.125 2023-11-19 03:38:59,169 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:39:12,434 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.57 vs. limit=12.0 2023-11-19 03:39:13,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=548793.3333333334, ans=0.0 2023-11-19 03:39:21,522 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=548860.0, ans=0.125 2023-11-19 03:39:27,557 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10200, loss[loss=0.08245, simple_loss=0.09473, pruned_loss=0.02161, audio_tagging_loss=0.01348, over 14786.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1112, pruned_loss=0.02716, audio_tagging_loss=0.0109, over 3056472.03 frames. 
], batch size: 55, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:39:31,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=548926.6666666666, ans=0.0 2023-11-19 03:39:49,364 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:40:00,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=549126.6666666666, ans=0.0 2023-11-19 03:40:02,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=549126.6666666666, ans=0.125 2023-11-19 03:40:08,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.839e+01 9.897e+01 1.124e+02 1.590e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-19 03:40:13,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=549193.3333333334, ans=0.0 2023-11-19 03:40:23,220 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10250, loss[loss=0.1206, simple_loss=0.1388, pruned_loss=0.04134, audio_tagging_loss=0.009905, over 14673.00 frames. ], tot_loss[loss=0.09285, simple_loss=0.11, pruned_loss=0.02679, audio_tagging_loss=0.01104, over 3059754.54 frames. ], batch size: 53, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:40:28,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=549260.0, ans=0.0 2023-11-19 03:40:46,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=549393.3333333334, ans=0.2 2023-11-19 03:40:49,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=549393.3333333334, ans=0.0 2023-11-19 03:41:12,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=549526.6666666666, ans=0.125 2023-11-19 03:41:19,413 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10300, loss[loss=0.09778, simple_loss=0.1294, pruned_loss=0.02408, audio_tagging_loss=0.008992, over 15233.00 frames. ], tot_loss[loss=0.09355, simple_loss=0.1109, pruned_loss=0.02705, audio_tagging_loss=0.01104, over 3065400.29 frames. ], batch size: 56, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:41:20,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0 2023-11-19 03:41:23,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=549593.3333333334, ans=0.125 2023-11-19 03:41:35,784 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=14.29 vs. 
limit=15.0 2023-11-19 03:41:51,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=549793.3333333334, ans=0.0 2023-11-19 03:41:57,328 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:42:00,389 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.478e+01 9.203e+01 9.958e+01 1.173e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 03:42:10,262 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.38 vs. limit=15.0 2023-11-19 03:42:14,017 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10350, loss[loss=0.07166, simple_loss=0.08318, pruned_loss=0.01651, audio_tagging_loss=0.01356, over 16341.00 frames. ], tot_loss[loss=0.09294, simple_loss=0.1101, pruned_loss=0.02675, audio_tagging_loss=0.01111, over 3063587.90 frames. ], batch size: 63, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:42:22,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=549926.6666666666, ans=0.2 2023-11-19 03:42:44,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=550060.0, ans=0.125 2023-11-19 03:42:57,188 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-19 03:43:06,627 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.47 vs. limit=15.0 2023-11-19 03:43:08,766 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10400, loss[loss=0.07673, simple_loss=0.09333, pruned_loss=0.01903, audio_tagging_loss=0.01104, over 15920.00 frames. ], tot_loss[loss=0.09249, simple_loss=0.1092, pruned_loss=0.02665, audio_tagging_loss=0.01122, over 3050285.10 frames. ], batch size: 62, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:43:14,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=550260.0, ans=0.07 2023-11-19 03:43:16,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=550260.0, ans=0.07 2023-11-19 03:43:22,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=550326.6666666666, ans=0.0 2023-11-19 03:43:34,361 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.654e-01 2023-11-19 03:43:37,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=550393.3333333334, ans=0.0 2023-11-19 03:43:43,247 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.13 vs. 
limit=15.0 2023-11-19 03:43:51,065 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.573e+01 9.410e+01 1.023e+02 1.490e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 03:43:57,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=550526.6666666666, ans=0.2 2023-11-19 03:44:04,823 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10450, loss[loss=0.09749, simple_loss=0.1185, pruned_loss=0.02677, audio_tagging_loss=0.01147, over 15728.00 frames. ], tot_loss[loss=0.09156, simple_loss=0.108, pruned_loss=0.02634, audio_tagging_loss=0.01123, over 3051684.55 frames. ], batch size: 57, lr: 9.61e-03, grad_scale: 32.0 2023-11-19 03:44:19,109 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=550660.0, ans=0.0 2023-11-19 03:44:40,494 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:44:40,845 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-19 03:44:48,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=550860.0, ans=0.0 2023-11-19 03:44:59,702 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10500, loss[loss=0.09488, simple_loss=0.1164, pruned_loss=0.02644, audio_tagging_loss=0.01023, over 16480.00 frames. ], tot_loss[loss=0.09199, simple_loss=0.1084, pruned_loss=0.02668, audio_tagging_loss=0.01113, over 3052424.68 frames. ], batch size: 61, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:45:00,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=550926.6666666666, ans=0.0 2023-11-19 03:45:22,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=551060.0, ans=0.0 2023-11-19 03:45:26,531 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.22 vs. limit=5.0 2023-11-19 03:45:41,937 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.439e+01 9.051e+01 1.036e+02 1.339e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 03:45:44,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=551193.3333333334, ans=0.0 2023-11-19 03:45:47,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=551193.3333333334, ans=0.125 2023-11-19 03:45:55,184 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10550, loss[loss=0.08649, simple_loss=0.1122, pruned_loss=0.02047, audio_tagging_loss=0.009907, over 15879.00 frames. ], tot_loss[loss=0.0915, simple_loss=0.1082, pruned_loss=0.02648, audio_tagging_loss=0.01094, over 3063265.00 frames. 
], batch size: 57, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:46:22,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=551393.3333333334, ans=0.125 2023-11-19 03:46:28,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=551460.0, ans=0.125 2023-11-19 03:46:29,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0 2023-11-19 03:46:51,180 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10600, loss[loss=0.09661, simple_loss=0.1157, pruned_loss=0.02825, audio_tagging_loss=0.01049, over 14985.00 frames. ], tot_loss[loss=0.09143, simple_loss=0.1082, pruned_loss=0.02651, audio_tagging_loss=0.01083, over 3059369.51 frames. ], batch size: 58, lr: 9.60e-03, grad_scale: 32.0 2023-11-19 03:47:18,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=551726.6666666666, ans=0.125 2023-11-19 03:47:19,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=551726.6666666666, ans=0.0 2023-11-19 03:47:33,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.037e+01 8.515e+01 9.253e+01 1.023e+02 1.317e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 03:47:37,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=551860.0, ans=0.125 2023-11-19 03:47:40,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=551860.0, ans=0.05 2023-11-19 03:47:45,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=551860.0, ans=0.0 2023-11-19 03:47:45,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=551860.0, ans=0.125 2023-11-19 03:47:47,251 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10650, loss[loss=0.09385, simple_loss=0.1211, pruned_loss=0.02645, audio_tagging_loss=0.006844, over 14903.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.1089, pruned_loss=0.02673, audio_tagging_loss=0.01076, over 3050593.63 frames. ], batch size: 56, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:47:52,202 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-19 03:47:53,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=551926.6666666666, ans=0.125 2023-11-19 03:47:57,464 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=13.31 vs. limit=15.0 2023-11-19 03:48:20,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=552126.6666666666, ans=0.07 2023-11-19 03:48:38,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.94 vs. 
limit=15.0 2023-11-19 03:48:43,086 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10700, loss[loss=0.06344, simple_loss=0.0755, pruned_loss=0.01557, audio_tagging_loss=0.01011, over 14415.00 frames. ], tot_loss[loss=0.09174, simple_loss=0.1086, pruned_loss=0.02663, audio_tagging_loss=0.01079, over 3047239.71 frames. ], batch size: 57, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:00,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=552326.6666666666, ans=0.125 2023-11-19 03:49:07,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552393.3333333334, ans=0.1 2023-11-19 03:49:15,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=552460.0, ans=0.05 2023-11-19 03:49:22,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=552460.0, ans=0.125 2023-11-19 03:49:23,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=552460.0, ans=0.0 2023-11-19 03:49:24,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552460.0, ans=0.1 2023-11-19 03:49:25,173 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.579e+01 9.318e+01 1.032e+02 2.166e+02, threshold=1.864e+02, percent-clipped=1.0 2023-11-19 03:49:29,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=552526.6666666666, ans=0.125 2023-11-19 03:49:35,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-19 03:49:38,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=552593.3333333334, ans=0.125 2023-11-19 03:49:39,635 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10750, loss[loss=0.1127, simple_loss=0.1353, pruned_loss=0.03514, audio_tagging_loss=0.009877, over 15404.00 frames. ], tot_loss[loss=0.09198, simple_loss=0.109, pruned_loss=0.02676, audio_tagging_loss=0.01072, over 3054099.49 frames. ], batch size: 55, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:49:46,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552593.3333333334, ans=0.1 2023-11-19 03:49:50,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.93 vs. limit=22.5 2023-11-19 03:49:56,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=552660.0, ans=0.125 2023-11-19 03:50:14,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=552793.3333333334, ans=0.0 2023-11-19 03:50:27,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=552860.0, ans=0.125 2023-11-19 03:50:34,598 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10800, loss[loss=0.0856, simple_loss=0.09916, pruned_loss=0.02194, audio_tagging_loss=0.01408, over 14748.00 frames. 
], tot_loss[loss=0.09175, simple_loss=0.109, pruned_loss=0.02652, audio_tagging_loss=0.01074, over 3055499.23 frames. ], batch size: 55, lr: 9.59e-03, grad_scale: 32.0 2023-11-19 03:50:39,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.03 vs. limit=22.5 2023-11-19 03:51:02,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=553060.0, ans=0.125 2023-11-19 03:51:16,709 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.594e+01 9.337e+01 1.055e+02 1.336e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 03:51:22,793 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=553193.3333333334, ans=0.2 2023-11-19 03:51:30,087 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10850, loss[loss=0.08219, simple_loss=0.1018, pruned_loss=0.02083, audio_tagging_loss=0.01048, over 15018.00 frames. ], tot_loss[loss=0.09191, simple_loss=0.1094, pruned_loss=0.02649, audio_tagging_loss=0.0107, over 3060479.52 frames. ], batch size: 57, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:51:50,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.24 vs. limit=22.5 2023-11-19 03:51:59,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=553393.3333333334, ans=0.125 2023-11-19 03:52:00,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.26 vs. limit=15.0 2023-11-19 03:52:05,484 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0 2023-11-19 03:52:24,132 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:52:26,799 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-19 03:52:27,319 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10900, loss[loss=0.08808, simple_loss=0.1091, pruned_loss=0.02296, audio_tagging_loss=0.01058, over 16439.00 frames. ], tot_loss[loss=0.09203, simple_loss=0.1096, pruned_loss=0.02643, audio_tagging_loss=0.0108, over 3065066.25 frames. ], batch size: 61, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:52:30,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.50 vs. 
limit=15.0 2023-11-19 03:52:33,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=553593.3333333334, ans=0.125 2023-11-19 03:53:05,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=553793.3333333334, ans=0.07 2023-11-19 03:53:09,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.290e+01 8.895e+01 9.757e+01 1.197e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 03:53:09,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=553793.3333333334, ans=0.125 2023-11-19 03:53:15,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-19 03:53:22,210 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 10950, loss[loss=0.09359, simple_loss=0.1087, pruned_loss=0.02923, audio_tagging_loss=0.01001, over 15686.00 frames. ], tot_loss[loss=0.09176, simple_loss=0.1094, pruned_loss=0.0262, audio_tagging_loss=0.01089, over 3062628.79 frames. ], batch size: 58, lr: 9.58e-03, grad_scale: 32.0 2023-11-19 03:53:22,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553926.6666666666, ans=0.1 2023-11-19 03:53:32,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=553993.3333333334, ans=12.0 2023-11-19 03:53:34,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.26 vs. limit=22.5 2023-11-19 03:53:44,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=554060.0, ans=0.035 2023-11-19 03:53:56,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=554126.6666666666, ans=0.125 2023-11-19 03:53:57,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-19 03:53:59,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=554126.6666666666, ans=0.1 2023-11-19 03:54:02,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=554126.6666666666, ans=0.125 2023-11-19 03:54:08,165 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2023-11-19 03:54:17,503 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11000, loss[loss=0.09174, simple_loss=0.1154, pruned_loss=0.02684, audio_tagging_loss=0.007214, over 15149.00 frames. ], tot_loss[loss=0.09273, simple_loss=0.1104, pruned_loss=0.02663, audio_tagging_loss=0.01092, over 3054451.14 frames. ], batch size: 57, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:54:26,558 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
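Two of the recurring diagnostics in this stretch of the log can be reproduced from the numbers they print. First, the tot_loss[...] fields combine as a weighted sum; assuming a 0.5 weight on simple_loss and 1.0 on audio_tagging_loss (inferred from the arithmetic, not quoted from a config), the epoch 7, batch 11000 entry above checks out to rounding. Second, the optim.py lines suggest the clipping threshold is Clipping_scale times the running median grad norm: 2.0 x 8.895e+01 = 1.779e+02, matching the threshold printed at 03:53:09. A sketch under those assumptions:

    # Hedged reconstruction of two logged diagnostics; the loss weights and
    # the median rule are inferred from the printed numbers, not quoted APIs.

    # 1) tot_loss at epoch 7, batch 11000:
    simple_loss, pruned_loss, audio_tagging_loss = 0.1104, 0.02663, 0.01092
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
    print(f"{loss:.5f}")  # 0.09275, vs. the logged loss=0.09273 (rounding)

    # 2) grad-norm clipping threshold from the quartile line at 03:53:09
    #    (min, 25%, median, 75%, max of recent grad norms):
    quartiles = [6.824e+01, 8.290e+01, 8.895e+01, 9.757e+01, 1.197e+02]
    clipping_scale = 2.0
    threshold = clipping_scale * quartiles[2]  # 2.0 * median
    print(f"{threshold:.3e}")  # 1.779e+02, matching the logged threshold
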
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:54:43,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=554393.3333333334, ans=0.0 2023-11-19 03:54:48,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=554393.3333333334, ans=0.125 2023-11-19 03:54:57,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=554460.0, ans=0.5 2023-11-19 03:54:58,748 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.672e+01 9.432e+01 1.068e+02 1.333e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 03:55:06,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=554526.6666666666, ans=0.1 2023-11-19 03:55:08,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=554526.6666666666, ans=0.125 2023-11-19 03:55:09,131 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=554526.6666666666, ans=0.125 2023-11-19 03:55:13,618 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11050, loss[loss=0.1102, simple_loss=0.1328, pruned_loss=0.03362, audio_tagging_loss=0.01016, over 15774.00 frames. ], tot_loss[loss=0.09226, simple_loss=0.1099, pruned_loss=0.02633, audio_tagging_loss=0.01096, over 3051170.10 frames. ], batch size: 57, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:55:34,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=554726.6666666666, ans=0.0 2023-11-19 03:55:38,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=554726.6666666666, ans=0.125 2023-11-19 03:55:48,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=554793.3333333334, ans=0.1 2023-11-19 03:55:59,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=554860.0, ans=0.125 2023-11-19 03:56:01,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554860.0, ans=0.1 2023-11-19 03:56:08,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=554926.6666666666, ans=0.1 2023-11-19 03:56:08,847 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11100, loss[loss=0.08779, simple_loss=0.1015, pruned_loss=0.02471, audio_tagging_loss=0.01233, over 15784.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1098, pruned_loss=0.02643, audio_tagging_loss=0.01105, over 3052011.76 frames. ], batch size: 60, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:56:13,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.57 vs. 
limit=15.0 2023-11-19 03:56:23,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=554993.3333333334, ans=0.0 2023-11-19 03:56:25,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=554993.3333333334, ans=0.125 2023-11-19 03:56:31,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=555060.0, ans=0.1 2023-11-19 03:56:39,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=555060.0, ans=0.2 2023-11-19 03:56:41,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=555126.6666666666, ans=0.125 2023-11-19 03:56:46,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=555126.6666666666, ans=0.125 2023-11-19 03:56:47,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555126.6666666666, ans=0.125 2023-11-19 03:56:51,124 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.615e+01 9.620e+01 1.023e+02 1.432e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 03:56:53,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=555193.3333333334, ans=0.09899494936611666 2023-11-19 03:57:03,796 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11150, loss[loss=0.09457, simple_loss=0.1046, pruned_loss=0.02834, audio_tagging_loss=0.01395, over 14596.00 frames. ], tot_loss[loss=0.09281, simple_loss=0.1102, pruned_loss=0.02654, audio_tagging_loss=0.01116, over 3050186.32 frames. ], batch size: 55, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:57:19,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=555326.6666666666, ans=0.125 2023-11-19 03:57:23,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=555326.6666666666, ans=0.125 2023-11-19 03:57:29,858 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:57:56,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.23 vs. limit=15.0 2023-11-19 03:57:59,487 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11200, loss[loss=0.09107, simple_loss=0.1139, pruned_loss=0.02283, audio_tagging_loss=0.0113, over 14877.00 frames. ], tot_loss[loss=0.09267, simple_loss=0.1097, pruned_loss=0.02655, audio_tagging_loss=0.01124, over 3052354.78 frames. 
], batch size: 53, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:12,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=555660.0, ans=0.125 2023-11-19 03:58:31,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=555793.3333333334, ans=0.07 2023-11-19 03:58:37,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=555793.3333333334, ans=0.125 2023-11-19 03:58:39,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=555793.3333333334, ans=0.5 2023-11-19 03:58:41,677 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.552e+01 9.021e+01 1.004e+02 1.285e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 03:58:55,028 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11250, loss[loss=0.07899, simple_loss=0.08983, pruned_loss=0.02326, audio_tagging_loss=0.01082, over 13990.00 frames. ], tot_loss[loss=0.09182, simple_loss=0.1088, pruned_loss=0.02622, audio_tagging_loss=0.01121, over 3054843.81 frames. ], batch size: 55, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 03:58:58,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0 2023-11-19 03:59:00,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=555926.6666666666, ans=0.0 2023-11-19 03:59:04,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=555993.3333333334, ans=0.0 2023-11-19 03:59:21,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0 2023-11-19 03:59:35,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=556126.6666666666, ans=0.0 2023-11-19 03:59:35,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=556126.6666666666, ans=0.125 2023-11-19 03:59:50,343 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11300, loss[loss=0.08722, simple_loss=0.111, pruned_loss=0.0226, audio_tagging_loss=0.00911, over 15479.00 frames. ], tot_loss[loss=0.09123, simple_loss=0.1083, pruned_loss=0.02606, audio_tagging_loss=0.01101, over 3054682.97 frames. ], batch size: 56, lr: 9.56e-03, grad_scale: 32.0 2023-11-19 04:00:10,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.10 vs. 
limit=12.0 2023-11-19 04:00:18,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=556393.3333333334, ans=0.125 2023-11-19 04:00:32,261 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.772e+01 9.510e+01 1.073e+02 1.316e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 04:00:35,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=556526.6666666666, ans=0.125 2023-11-19 04:00:39,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=556526.6666666666, ans=0.125 2023-11-19 04:00:46,038 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11350, loss[loss=0.1016, simple_loss=0.1224, pruned_loss=0.02908, audio_tagging_loss=0.01131, over 15478.00 frames. ], tot_loss[loss=0.09236, simple_loss=0.1099, pruned_loss=0.02648, audio_tagging_loss=0.01092, over 3046239.45 frames. ], batch size: 54, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:00:50,122 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0 2023-11-19 04:00:58,732 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2023-11-19 04:01:09,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=556726.6666666666, ans=0.0 2023-11-19 04:01:20,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=556793.3333333334, ans=0.125 2023-11-19 04:01:30,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.57 vs. limit=15.0 2023-11-19 04:01:32,586 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=556860.0, ans=0.125 2023-11-19 04:01:37,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=556860.0, ans=0.125 2023-11-19 04:01:41,531 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11400, loss[loss=0.05988, simple_loss=0.07779, pruned_loss=0.01354, audio_tagging_loss=0.007441, over 15449.00 frames. ], tot_loss[loss=0.09201, simple_loss=0.1094, pruned_loss=0.02647, audio_tagging_loss=0.01083, over 3044060.48 frames. ], batch size: 59, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:02:02,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=12.0 2023-11-19 04:02:11,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=557060.0, ans=0.125 2023-11-19 04:02:14,846 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=557126.6666666666, ans=0.07 2023-11-19 04:02:21,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557126.6666666666, ans=0.0 2023-11-19 04:02:23,531 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.302e+01 8.621e+01 9.378e+01 1.036e+02 2.217e+02, threshold=1.876e+02, percent-clipped=1.0 2023-11-19 04:02:36,334 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11450, loss[loss=0.08529, simple_loss=0.1045, pruned_loss=0.02519, audio_tagging_loss=0.007829, over 15745.00 frames. ], tot_loss[loss=0.09245, simple_loss=0.1101, pruned_loss=0.02663, audio_tagging_loss=0.01076, over 3043291.87 frames. ], batch size: 58, lr: 9.55e-03, grad_scale: 32.0 2023-11-19 04:02:36,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=557260.0, ans=0.2 2023-11-19 04:02:37,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=557260.0, ans=0.125 2023-11-19 04:02:37,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=557260.0, ans=0.0 2023-11-19 04:03:07,322 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:03:09,613 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.59 vs. limit=15.0 2023-11-19 04:03:19,894 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=557526.6666666666, ans=0.0 2023-11-19 04:03:32,413 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11500, loss[loss=0.08831, simple_loss=0.109, pruned_loss=0.02249, audio_tagging_loss=0.01132, over 15557.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.1101, pruned_loss=0.02666, audio_tagging_loss=0.01073, over 3046210.95 frames. ], batch size: 59, lr: 9.55e-03, grad_scale: 16.0 2023-11-19 04:03:39,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=557593.3333333334, ans=0.125 2023-11-19 04:03:59,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.21 vs. limit=6.0 2023-11-19 04:04:04,506 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=557793.3333333334, ans=0.125 2023-11-19 04:04:15,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.787e+01 9.889e+01 1.125e+02 1.791e+02, threshold=1.978e+02, percent-clipped=0.0 2023-11-19 04:04:28,845 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11550, loss[loss=0.126, simple_loss=0.1533, pruned_loss=0.03817, audio_tagging_loss=0.01113, over 15468.00 frames. ], tot_loss[loss=0.09244, simple_loss=0.11, pruned_loss=0.02664, audio_tagging_loss=0.01077, over 3048170.98 frames. 
], batch size: 57, lr: 9.54e-03, grad_scale: 16.0 2023-11-19 04:04:49,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=558060.0, ans=0.1 2023-11-19 04:05:02,031 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:05:18,559 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558193.3333333334, ans=0.125 2023-11-19 04:05:23,630 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11600, loss[loss=0.1124, simple_loss=0.1382, pruned_loss=0.03144, audio_tagging_loss=0.01192, over 15249.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1105, pruned_loss=0.02671, audio_tagging_loss=0.01073, over 3056967.57 frames. ], batch size: 55, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:05:42,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=558326.6666666666, ans=0.125 2023-11-19 04:06:00,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558460.0, ans=0.1 2023-11-19 04:06:07,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.676e+01 9.320e+01 1.048e+02 1.345e+02, threshold=1.864e+02, percent-clipped=0.0 2023-11-19 04:06:09,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=558526.6666666666, ans=0.125 2023-11-19 04:06:18,624 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11650, loss[loss=0.06791, simple_loss=0.08836, pruned_loss=0.0144, audio_tagging_loss=0.009335, over 15416.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1104, pruned_loss=0.0267, audio_tagging_loss=0.01075, over 3058289.93 frames. ], batch size: 57, lr: 9.54e-03, grad_scale: 32.0 2023-11-19 04:06:23,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=558593.3333333334, ans=0.125 2023-11-19 04:06:31,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=558660.0, ans=0.125 2023-11-19 04:06:52,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=558793.3333333334, ans=0.2 2023-11-19 04:07:01,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=558793.3333333334, ans=0.125 2023-11-19 04:07:13,058 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.53 vs. limit=15.0 2023-11-19 04:07:14,574 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11700, loss[loss=0.1245, simple_loss=0.1472, pruned_loss=0.04185, audio_tagging_loss=0.009083, over 15412.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.1106, pruned_loss=0.02692, audio_tagging_loss=0.01085, over 3054627.24 frames. 
], batch size: 58, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:07:27,964 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:07:31,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2023-11-19 04:07:33,242 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.245e-02 2023-11-19 04:07:50,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-19 04:07:57,442 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.689e+01 9.449e+01 1.084e+02 2.126e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 04:08:02,298 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. limit=15.0 2023-11-19 04:08:09,617 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11750, loss[loss=0.1139, simple_loss=0.1359, pruned_loss=0.0356, audio_tagging_loss=0.01035, over 16072.00 frames. ], tot_loss[loss=0.0935, simple_loss=0.1108, pruned_loss=0.02721, audio_tagging_loss=0.01089, over 3055252.11 frames. ], batch size: 58, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:08:14,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=559260.0, ans=0.0 2023-11-19 04:08:19,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=559326.6666666666, ans=0.0 2023-11-19 04:08:19,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=559326.6666666666, ans=0.125 2023-11-19 04:08:20,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=559326.6666666666, ans=10.0 2023-11-19 04:08:28,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=559326.6666666666, ans=0.0 2023-11-19 04:08:44,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2023-11-19 04:08:55,693 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=559526.6666666666, ans=0.0 2023-11-19 04:08:55,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=559526.6666666666, ans=0.2 2023-11-19 04:08:56,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=559526.6666666666, ans=0.125 2023-11-19 04:09:03,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=559593.3333333334, ans=0.125 2023-11-19 04:09:03,946 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11800, loss[loss=0.1142, simple_loss=0.1379, pruned_loss=0.03504, audio_tagging_loss=0.0102, over 15443.00 frames. ], tot_loss[loss=0.09284, simple_loss=0.11, pruned_loss=0.02692, audio_tagging_loss=0.01091, over 3054391.87 frames. 
], batch size: 55, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:09:04,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=559593.3333333334, ans=0.0 2023-11-19 04:09:35,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.77 vs. limit=15.0 2023-11-19 04:09:44,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.55 vs. limit=22.5 2023-11-19 04:09:46,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.768e+01 9.397e+01 1.015e+02 1.463e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 04:09:59,776 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11850, loss[loss=0.0841, simple_loss=0.1017, pruned_loss=0.02331, audio_tagging_loss=0.009922, over 14774.00 frames. ], tot_loss[loss=0.09298, simple_loss=0.1103, pruned_loss=0.02694, audio_tagging_loss=0.01091, over 3062625.00 frames. ], batch size: 54, lr: 9.53e-03, grad_scale: 32.0 2023-11-19 04:10:15,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=559993.3333333334, ans=0.125 2023-11-19 04:10:24,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=560060.0, ans=0.125 2023-11-19 04:10:35,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=560126.6666666666, ans=0.125 2023-11-19 04:10:35,480 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560126.6666666666, ans=0.1 2023-11-19 04:10:42,448 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=560126.6666666666, ans=0.2 2023-11-19 04:10:45,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=560193.3333333334, ans=0.125 2023-11-19 04:10:56,859 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11900, loss[loss=0.0844, simple_loss=0.1018, pruned_loss=0.02205, audio_tagging_loss=0.01146, over 14359.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1104, pruned_loss=0.02691, audio_tagging_loss=0.01096, over 3060480.00 frames. 
], batch size: 54, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:11:09,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=560326.6666666666, ans=15.0 2023-11-19 04:11:13,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=560326.6666666666, ans=0.015 2023-11-19 04:11:16,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=560326.6666666666, ans=0.0 2023-11-19 04:11:18,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=560393.3333333334, ans=0.1 2023-11-19 04:11:23,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=560393.3333333334, ans=0.125 2023-11-19 04:11:24,459 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.085e-02 2023-11-19 04:11:26,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=560393.3333333334, ans=0.2 2023-11-19 04:11:28,995 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2023-11-19 04:11:40,154 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.784e+01 9.525e+01 1.032e+02 1.390e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 04:11:47,532 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.68 vs. limit=12.0 2023-11-19 04:11:52,358 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 11950, loss[loss=0.1189, simple_loss=0.1413, pruned_loss=0.03927, audio_tagging_loss=0.008946, over 15931.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1099, pruned_loss=0.02678, audio_tagging_loss=0.01105, over 3051732.99 frames. ], batch size: 59, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:41,754 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=12.0 2023-11-19 04:12:46,239 INFO [train_asr.py:1115] (2/4) Epoch 7, batch 12000, loss[loss=0.08265, simple_loss=0.09325, pruned_loss=0.02363, audio_tagging_loss=0.01239, over 14432.00 frames. ], tot_loss[loss=0.09321, simple_loss=0.1103, pruned_loss=0.02691, audio_tagging_loss=0.01114, over 3052225.51 frames. ], batch size: 55, lr: 9.52e-03, grad_scale: 32.0 2023-11-19 04:12:46,240 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 04:13:19,251 INFO [train_asr.py:1147] (2/4) Epoch 7, validation: loss=0.0682, simple_loss=0.05751, pruned_loss=0.007422, audio_tagging_loss=0.03202, over 4681554.00 frames. 2023-11-19 04:13:19,251 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 04:13:36,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=560993.3333333334, ans=0.0 2023-11-19 04:14:20,120 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 0, loss[loss=0.1015, simple_loss=0.106, pruned_loss=0.02147, audio_tagging_loss=0.02706, over 14602.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.106, pruned_loss=0.02147, audio_tagging_loss=0.02706, over 14602.00 frames. 
], batch size: 53, lr: 8.97e-03, grad_scale: 32.0 2023-11-19 04:14:20,120 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 04:14:44,859 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8915, 4.9987, 4.8780, 4.9835], device='cuda:2') 2023-11-19 04:14:46,086 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5704, 2.6380, 3.9012, 2.9942], device='cuda:2') 2023-11-19 04:14:51,784 INFO [train_asr.py:1147] (2/4) Epoch 8, validation: loss=0.06722, simple_loss=0.05736, pruned_loss=0.007334, audio_tagging_loss=0.0312, over 4681554.00 frames. 2023-11-19 04:14:51,785 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 04:15:10,723 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.365e+01 1.076e+02 1.160e+02 2.715e+02, threshold=2.151e+02, percent-clipped=1.0 2023-11-19 04:15:13,624 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.95 vs. limit=15.0 2023-11-19 04:15:23,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=561213.3333333334, ans=0.125 2023-11-19 04:15:32,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=561280.0, ans=0.0 2023-11-19 04:15:40,935 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=561346.6666666666, ans=0.0 2023-11-19 04:15:47,580 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 50, loss[loss=0.08458, simple_loss=0.08952, pruned_loss=0.01928, audio_tagging_loss=0.02054, over 16330.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1089, pruned_loss=0.02624, audio_tagging_loss=0.02115, over 683589.40 frames. ], batch size: 62, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:16:11,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=561546.6666666666, ans=0.125 2023-11-19 04:16:41,072 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. limit=22.5 2023-11-19 04:16:43,670 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 100, loss[loss=0.06251, simple_loss=0.06444, pruned_loss=0.01164, audio_tagging_loss=0.01865, over 14975.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1083, pruned_loss=0.02578, audio_tagging_loss=0.02047, over 1204472.49 frames. 
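The zipformer.py entries above dump attn_weights_entropy tensors with one value per attention head (four here). A sketch of how such a diagnostic could be computed from softmaxed attention weights; the exact reduction used at zipformer.py:1873 is not shown in the log, so treat this as an illustration:

```python
import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, query_len, key_len); each row is a softmax distribution.
    # Returns the mean entropy (nats) per head, one value per head as in the log.
    entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (heads, queries)
    return entropy.mean(dim=-1)

# Uniform attention over 512 keys would give log(512) ~= 6.24 nats per head;
# the logged values (~2.6 to ~5.0) sit below that, i.e. moderately peaked heads.
attn = torch.softmax(torch.randn(4, 100, 512), dim=-1)
print(attn_weights_entropy(attn))
```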
], batch size: 56, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:16:54,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=561813.3333333334, ans=0.0 2023-11-19 04:17:02,331 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 9.025e+01 9.629e+01 1.101e+02 1.552e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 04:17:17,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=561946.6666666666, ans=0.2 2023-11-19 04:17:22,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=561946.6666666666, ans=0.125 2023-11-19 04:17:33,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=562013.3333333334, ans=0.125 2023-11-19 04:17:39,011 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 150, loss[loss=0.08677, simple_loss=0.1004, pruned_loss=0.02159, audio_tagging_loss=0.015, over 15633.00 frames. ], tot_loss[loss=0.09953, simple_loss=0.11, pruned_loss=0.02637, audio_tagging_loss=0.01814, over 1619323.86 frames. ], batch size: 60, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:17:41,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=562080.0, ans=0.125 2023-11-19 04:18:03,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=562213.3333333334, ans=0.125 2023-11-19 04:18:04,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=562213.3333333334, ans=0.0 2023-11-19 04:18:15,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=562280.0, ans=0.125 2023-11-19 04:18:31,751 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=562346.6666666666, ans=0.07 2023-11-19 04:18:35,265 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 200, loss[loss=0.07777, simple_loss=0.09661, pruned_loss=0.02155, audio_tagging_loss=0.007918, over 15098.00 frames. ], tot_loss[loss=0.0991, simple_loss=0.1128, pruned_loss=0.027, audio_tagging_loss=0.01569, over 1943986.55 frames. ], batch size: 54, lr: 8.96e-03, grad_scale: 32.0 2023-11-19 04:18:37,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=562413.3333333334, ans=0.125 2023-11-19 04:18:37,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.75 vs. limit=15.0 2023-11-19 04:18:54,363 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.627e+01 9.281e+01 9.933e+01 1.355e+02, threshold=1.856e+02, percent-clipped=0.0 2023-11-19 04:18:56,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=562546.6666666666, ans=0.95 2023-11-19 04:19:19,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. 
limit=15.0 2023-11-19 04:19:31,617 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 250, loss[loss=0.1153, simple_loss=0.1412, pruned_loss=0.03208, audio_tagging_loss=0.01259, over 14604.00 frames. ], tot_loss[loss=0.09773, simple_loss=0.1129, pruned_loss=0.02724, audio_tagging_loss=0.01406, over 2189544.11 frames. ], batch size: 53, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:19:50,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=562813.3333333334, ans=0.2 2023-11-19 04:20:06,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=562946.6666666666, ans=0.0 2023-11-19 04:20:26,696 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 300, loss[loss=0.08225, simple_loss=0.08893, pruned_loss=0.02666, audio_tagging_loss=0.01113, over 14668.00 frames. ], tot_loss[loss=0.0965, simple_loss=0.1125, pruned_loss=0.0271, audio_tagging_loss=0.01314, over 2375101.76 frames. ], batch size: 56, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:20:33,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=563080.0, ans=0.0 2023-11-19 04:20:45,216 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-19 04:20:45,697 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.672e+01 9.179e+01 1.018e+02 1.268e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 04:20:50,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=563213.3333333334, ans=0.1 2023-11-19 04:21:00,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.85 vs. limit=12.0 2023-11-19 04:21:10,201 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:21:22,021 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 350, loss[loss=0.1076, simple_loss=0.1382, pruned_loss=0.03139, audio_tagging_loss=0.007121, over 16426.00 frames. ], tot_loss[loss=0.09575, simple_loss=0.1127, pruned_loss=0.02706, audio_tagging_loss=0.01236, over 2517310.23 frames. ], batch size: 61, lr: 8.95e-03, grad_scale: 32.0 2023-11-19 04:22:01,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563613.3333333334, ans=0.1 2023-11-19 04:22:17,839 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=563746.6666666666, ans=0.125 2023-11-19 04:22:18,674 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 400, loss[loss=0.09397, simple_loss=0.1048, pruned_loss=0.02879, audio_tagging_loss=0.01278, over 15170.00 frames. ], tot_loss[loss=0.09478, simple_loss=0.1118, pruned_loss=0.02688, audio_tagging_loss=0.01202, over 2633543.92 frames. 
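Across the epoch 7 to epoch 8 boundary the logged lr steps from 9.52e-03 down to 8.97e-03 and then keeps creeping down within the epoch (8.96e-03, 8.95e-03, 8.94e-03, ...), consistent with a schedule that decays in both batch count and epoch count. A sketch of such an Eden-style rule; the functional form and the constants in the demo are assumptions, not values read from this log:

```python
def eden_style_lr(base_lr: float, batch: int, epoch: float,
                  lr_batches: float, lr_epochs: float) -> float:
    # Decays smoothly in both batch and epoch; equals base_lr at batch=0, epoch=0.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Hypothetical constants, just to show the step down at an epoch boundary:
for epoch in (7, 8):
    print(epoch, eden_style_lr(base_lr=0.05, batch=500_000, epoch=epoch,
                               lr_batches=5000, lr_epochs=4.0))
```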
], batch size: 56, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:22:20,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563746.6666666666, ans=0.1 2023-11-19 04:22:20,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=563746.6666666666, ans=0.0 2023-11-19 04:22:30,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=563813.3333333334, ans=0.125 2023-11-19 04:22:32,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=563813.3333333334, ans=0.125 2023-11-19 04:22:34,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=563813.3333333334, ans=0.04949747468305833 2023-11-19 04:22:36,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.502e+01 9.440e+01 1.057e+02 1.683e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 04:22:41,571 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.47 vs. limit=22.5 2023-11-19 04:22:42,631 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0 2023-11-19 04:23:09,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=564013.3333333334, ans=0.0 2023-11-19 04:23:12,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=564080.0, ans=0.0 2023-11-19 04:23:13,480 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 450, loss[loss=0.08642, simple_loss=0.1035, pruned_loss=0.02332, audio_tagging_loss=0.01137, over 15034.00 frames. ], tot_loss[loss=0.09412, simple_loss=0.1114, pruned_loss=0.02678, audio_tagging_loss=0.01165, over 2726985.54 frames. ], batch size: 55, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:23:27,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564146.6666666666, ans=0.0 2023-11-19 04:23:48,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=564280.0, ans=0.2 2023-11-19 04:23:48,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=564280.0, ans=0.125 2023-11-19 04:23:55,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564280.0, ans=0.1 2023-11-19 04:24:08,562 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 500, loss[loss=0.09378, simple_loss=0.1026, pruned_loss=0.02944, audio_tagging_loss=0.01306, over 13952.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1103, pruned_loss=0.02645, audio_tagging_loss=0.0115, over 2796726.42 frames. 
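The Whitening lines compare a per-module statistic against a scheduled limit (e.g. metric=20.47 vs. limit=22.5 above). One whiteness statistic with the right behavior is the mean diagonal of the squared feature covariance divided by the squared mean diagonal of the covariance: it is exactly 1.0 when the centered covariance is a multiple of the identity and grows as channels become correlated. A sketch under that assumed form:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 exactly when
    # the centered covariance is proportional to the identity ("white" features).
    x = x - x.mean(dim=0, keepdim=True)
    cov = x.T @ x                                   # unnormalized covariance
    mean_diag = cov.diagonal().mean()               # average per-channel variance
    mean_diag_of_sq = (cov @ cov).diagonal().mean()
    return float(mean_diag_of_sq / (mean_diag ** 2 + 1e-20))

x = torch.randn(2000, 64)
print(whitening_metric(x))                        # near 1.0 for white features
print(whitening_metric(x @ torch.randn(64, 64)))  # grows once channels correlate
```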
], batch size: 54, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:24:13,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=564413.3333333334, ans=0.07 2023-11-19 04:24:28,809 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.601e+01 9.237e+01 1.002e+02 1.241e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 04:24:44,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.71 vs. limit=15.0 2023-11-19 04:24:45,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0 2023-11-19 04:25:04,836 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 550, loss[loss=0.07807, simple_loss=0.09971, pruned_loss=0.02012, audio_tagging_loss=0.008097, over 14342.00 frames. ], tot_loss[loss=0.09208, simple_loss=0.1089, pruned_loss=0.02619, audio_tagging_loss=0.01142, over 2844125.02 frames. ], batch size: 55, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:25:13,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=564746.6666666666, ans=0.0 2023-11-19 04:25:21,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.45 vs. limit=5.0 2023-11-19 04:25:26,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=564880.0, ans=0.125 2023-11-19 04:26:00,656 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 600, loss[loss=0.0952, simple_loss=0.1227, pruned_loss=0.02372, audio_tagging_loss=0.01014, over 15786.00 frames. ], tot_loss[loss=0.09294, simple_loss=0.1103, pruned_loss=0.0266, audio_tagging_loss=0.0112, over 2890968.33 frames. ], batch size: 57, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:26:18,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.437e+01 9.383e+01 9.998e+01 1.583e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 04:26:46,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=565346.6666666666, ans=0.0 2023-11-19 04:26:56,082 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 650, loss[loss=0.065, simple_loss=0.07376, pruned_loss=0.01846, audio_tagging_loss=0.009661, over 14039.00 frames. ], tot_loss[loss=0.09262, simple_loss=0.1098, pruned_loss=0.02654, audio_tagging_loss=0.01116, over 2923007.12 frames. ], batch size: 53, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:33,453 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:27:50,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=565680.0, ans=0.125 2023-11-19 04:27:52,235 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 700, loss[loss=0.07875, simple_loss=0.104, pruned_loss=0.01674, audio_tagging_loss=0.01003, over 15491.00 frames. ], tot_loss[loss=0.09204, simple_loss=0.1095, pruned_loss=0.02617, audio_tagging_loss=0.01113, over 2956441.28 frames. 
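grad_scale sits at 32.0 through most entries, drops to 16.0 at batch 750 above, and is back at 32.0 by batch 800: the signature of dynamic loss scaling in fp16 training, where the scale is halved when an overflow is detected and regrown after a run of clean steps. A minimal sketch using PyTorch's stock AMP machinery (requires a CUDA device; the tiny model is a stand-in, and the training script's own wrapper may differ):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()           # stand-in model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
scaler = torch.cuda.amp.GradScaler()              # dynamic fp16 loss scaling

for _ in range(3):
    x = torch.randn(8, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()                 # backward on the scaled loss
    scaler.step(optimizer)                        # skipped if grads overflowed
    scaler.update()                               # halve on overflow, else grow
    print("grad_scale:", scaler.get_scale())      # the value in the log lines
```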
], batch size: 58, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:28:10,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.560e+01 9.295e+01 1.024e+02 1.604e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 04:28:22,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=565880.0, ans=0.0 2023-11-19 04:28:47,738 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 750, loss[loss=0.1077, simple_loss=0.13, pruned_loss=0.0356, audio_tagging_loss=0.007132, over 15232.00 frames. ], tot_loss[loss=0.09224, simple_loss=0.1098, pruned_loss=0.02635, audio_tagging_loss=0.011, over 2978671.35 frames. ], batch size: 55, lr: 8.93e-03, grad_scale: 16.0 2023-11-19 04:28:57,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=566146.6666666666, ans=0.0 2023-11-19 04:29:01,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=566146.6666666666, ans=0.0 2023-11-19 04:29:42,383 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 800, loss[loss=0.1074, simple_loss=0.1187, pruned_loss=0.03702, audio_tagging_loss=0.01102, over 15219.00 frames. ], tot_loss[loss=0.09214, simple_loss=0.1098, pruned_loss=0.02619, audio_tagging_loss=0.01104, over 2998495.25 frames. ], batch size: 56, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:29:46,983 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.47 vs. limit=15.0 2023-11-19 04:30:02,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.557e+01 9.435e+01 1.048e+02 1.522e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 04:30:06,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=566546.6666666666, ans=0.125 2023-11-19 04:30:35,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=566680.0, ans=0.125 2023-11-19 04:30:35,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=566680.0, ans=0.2 2023-11-19 04:30:36,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=566680.0, ans=0.0 2023-11-19 04:30:38,427 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 850, loss[loss=0.09996, simple_loss=0.122, pruned_loss=0.02854, audio_tagging_loss=0.01041, over 14956.00 frames. ], tot_loss[loss=0.09259, simple_loss=0.1104, pruned_loss=0.02635, audio_tagging_loss=0.01103, over 3009249.66 frames. ], batch size: 56, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:30:38,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=566746.6666666666, ans=0.0 2023-11-19 04:30:39,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.58 vs. 
limit=15.0 2023-11-19 04:30:51,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=566813.3333333334, ans=0.125 2023-11-19 04:30:51,321 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=566813.3333333334, ans=0.2 2023-11-19 04:31:01,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=566880.0, ans=0.2 2023-11-19 04:31:02,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=566880.0, ans=0.2 2023-11-19 04:31:03,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=566880.0, ans=0.2 2023-11-19 04:31:03,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=566880.0, ans=0.125 2023-11-19 04:31:03,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=566880.0, ans=0.0 2023-11-19 04:31:06,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.73 vs. limit=22.5 2023-11-19 04:31:08,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=566880.0, ans=0.05 2023-11-19 04:31:10,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-11-19 04:31:25,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567013.3333333334, ans=0.125 2023-11-19 04:31:26,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=567013.3333333334, ans=0.09899494936611666 2023-11-19 04:31:34,305 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 900, loss[loss=0.06987, simple_loss=0.07164, pruned_loss=0.01642, audio_tagging_loss=0.01763, over 16217.00 frames. ], tot_loss[loss=0.09274, simple_loss=0.1106, pruned_loss=0.02635, audio_tagging_loss=0.01106, over 3017382.98 frames. ], batch size: 63, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:31:35,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:35,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:38,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:46,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. 
limit=15.0 2023-11-19 04:31:49,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567146.6666666666, ans=0.1 2023-11-19 04:31:53,696 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.468e+01 9.134e+01 1.003e+02 1.510e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 04:31:54,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=567213.3333333334, ans=0.05 2023-11-19 04:31:58,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-11-19 04:32:20,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=567346.6666666666, ans=0.2 2023-11-19 04:32:29,744 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 950, loss[loss=0.1168, simple_loss=0.1513, pruned_loss=0.03339, audio_tagging_loss=0.007771, over 15564.00 frames. ], tot_loss[loss=0.09327, simple_loss=0.1115, pruned_loss=0.02667, audio_tagging_loss=0.01086, over 3020500.98 frames. ], batch size: 57, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:32:35,209 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567413.3333333334, ans=0.1 2023-11-19 04:32:35,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=567413.3333333334, ans=0.125 2023-11-19 04:32:37,003 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.09 vs. limit=5.0 2023-11-19 04:32:53,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-19 04:32:53,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=567546.6666666666, ans=0.0 2023-11-19 04:33:25,107 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1000, loss[loss=0.09643, simple_loss=0.1127, pruned_loss=0.03156, audio_tagging_loss=0.008499, over 14260.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1107, pruned_loss=0.02642, audio_tagging_loss=0.01064, over 3023065.15 frames. ], batch size: 54, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:33:28,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=567746.6666666666, ans=0.0 2023-11-19 04:33:33,890 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=567746.6666666666, ans=0.125 2023-11-19 04:33:35,168 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.46 vs. 
limit=22.5 2023-11-19 04:33:39,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=567813.3333333334, ans=0.1 2023-11-19 04:33:46,208 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:33:47,008 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.670e+01 9.529e+01 1.041e+02 1.429e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-19 04:33:47,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=567880.0, ans=0.125 2023-11-19 04:33:49,215 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:33:56,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=567880.0, ans=0.07 2023-11-19 04:34:03,361 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-11-19 04:34:10,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=568013.3333333334, ans=0.125 2023-11-19 04:34:21,668 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1050, loss[loss=0.1068, simple_loss=0.1215, pruned_loss=0.03535, audio_tagging_loss=0.01071, over 15610.00 frames. ], tot_loss[loss=0.09237, simple_loss=0.1107, pruned_loss=0.02638, audio_tagging_loss=0.01064, over 3023468.28 frames. ], batch size: 58, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:34:35,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=568146.6666666666, ans=0.125 2023-11-19 04:34:36,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=568146.6666666666, ans=0.125 2023-11-19 04:34:53,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=568280.0, ans=0.0 2023-11-19 04:34:55,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=568280.0, ans=0.0 2023-11-19 04:34:58,950 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. 
limit=15.0 2023-11-19 04:35:04,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=568346.6666666666, ans=0.125 2023-11-19 04:35:06,150 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=568346.6666666666, ans=0.07 2023-11-19 04:35:12,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=568346.6666666666, ans=0.1 2023-11-19 04:35:16,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=568413.3333333334, ans=0.125 2023-11-19 04:35:17,063 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1100, loss[loss=0.06005, simple_loss=0.07126, pruned_loss=0.01455, audio_tagging_loss=0.009863, over 15456.00 frames. ], tot_loss[loss=0.09161, simple_loss=0.1098, pruned_loss=0.02615, audio_tagging_loss=0.01054, over 3028167.05 frames. ], batch size: 60, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:35:19,229 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:35:23,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.79 vs. limit=12.0 2023-11-19 04:35:32,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=568480.0, ans=0.05 2023-11-19 04:35:32,802 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5 2023-11-19 04:35:38,084 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.818e+01 9.664e+01 1.074e+02 1.667e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-19 04:35:46,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=568546.6666666666, ans=0.125 2023-11-19 04:35:49,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=568613.3333333334, ans=0.0 2023-11-19 04:35:56,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=568613.3333333334, ans=0.1 2023-11-19 04:36:08,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.00 vs. limit=15.0 2023-11-19 04:36:10,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=568680.0, ans=0.2 2023-11-19 04:36:12,567 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1150, loss[loss=0.1049, simple_loss=0.1211, pruned_loss=0.03499, audio_tagging_loss=0.009395, over 15219.00 frames. ], tot_loss[loss=0.09249, simple_loss=0.1109, pruned_loss=0.02664, audio_tagging_loss=0.0104, over 3031715.66 frames. 
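The WARNING entries exclude AudioSet cuts whose placeholder transcript tokenizes to more BPE tokens (24) than there are encoder frames after subsampling (23, down from 100 raw frames); with fewer frames than output tokens the transducer alignment cannot exist, which is the apparent reason for the exclusion. A sketch of that filter, with the subsampling arithmetic chosen to reproduce the logged 100 -> 23 mapping (an assumption about the conv front end, not code from the script):

```python
def frames_after_subsampling(t: int) -> int:
    # A conv front end with two stride-2 stages; chosen so that 100 -> 23,
    # matching the logged before/after frame counts (an assumption).
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Require at least one encoder frame per output token, so the
    # transducer lattice has a valid alignment path.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the dummy-text AudioSet cuts excluded above
```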
], batch size: 57, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:36:35,266 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=22.5 2023-11-19 04:37:08,817 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1200, loss[loss=0.08859, simple_loss=0.1137, pruned_loss=0.0246, audio_tagging_loss=0.007151, over 15200.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.11, pruned_loss=0.02633, audio_tagging_loss=0.01044, over 3035123.81 frames. ], batch size: 55, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:37:15,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=569080.0, ans=0.125 2023-11-19 04:37:29,361 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.648e+01 9.273e+01 1.051e+02 1.425e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 04:37:38,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569213.3333333334, ans=0.1 2023-11-19 04:37:48,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=569280.0, ans=0.0 2023-11-19 04:37:55,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=569346.6666666666, ans=0.2 2023-11-19 04:37:58,084 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569346.6666666666, ans=0.1 2023-11-19 04:38:04,302 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1250, loss[loss=0.1199, simple_loss=0.1431, pruned_loss=0.0364, audio_tagging_loss=0.0119, over 15686.00 frames. ], tot_loss[loss=0.09147, simple_loss=0.1095, pruned_loss=0.02628, audio_tagging_loss=0.01047, over 3033493.24 frames. ], batch size: 56, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:38:09,643 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.50 vs. limit=22.5 2023-11-19 04:38:29,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=569546.6666666666, ans=0.1 2023-11-19 04:38:43,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=569613.3333333334, ans=0.125 2023-11-19 04:38:44,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=569613.3333333334, ans=0.0 2023-11-19 04:38:44,880 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.74 vs. limit=15.0 2023-11-19 04:38:58,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=569680.0, ans=0.2 2023-11-19 04:38:59,914 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1300, loss[loss=0.07892, simple_loss=0.08698, pruned_loss=0.02396, audio_tagging_loss=0.01148, over 14514.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1087, pruned_loss=0.02588, audio_tagging_loss=0.01055, over 3022284.44 frames. 
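Most scaling.py lines are ScheduledFloat dumps: a named scalar (skip rates, balancer probabilities, dropout_p, scale_min, ...) whose current value ans is a function of batch_count. One plausible reading is piecewise-linear interpolation between (batch_count, value) breakpoints, clamped at both ends, which would explain why deep into training most skip rates sit pinned at their final constants. A sketch; the breakpoints in the demo are hypothetical:

```python
def scheduled_float(batch_count: float, schedule) -> float:
    """Piecewise-linear value keyed on batch_count, clamped at both ends.

    schedule: sorted (batch_count, value) breakpoints, e.g.
    [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)] -- hypothetical numbers.
    """
    b0, v0 = schedule[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in schedule[1:]:
        if batch_count <= b1:
            return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0  # past the last breakpoint the value is pinned

# Deep into training the clamped final value is what gets logged, matching
# entries such as "...attention_skip_rate, batch_count=564080.0, ans=0.0".
print(scheduled_float(564080.0, [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)]))
```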
], batch size: 56, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:39:21,656 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.385e+01 9.003e+01 9.844e+01 1.320e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 04:39:22,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=569880.0, ans=0.125 2023-11-19 04:39:41,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=569946.6666666666, ans=0.125 2023-11-19 04:39:56,373 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1350, loss[loss=0.1103, simple_loss=0.13, pruned_loss=0.03595, audio_tagging_loss=0.009311, over 15938.00 frames. ], tot_loss[loss=0.09016, simple_loss=0.1079, pruned_loss=0.02563, audio_tagging_loss=0.01057, over 3030659.61 frames. ], batch size: 57, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:40:06,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=570146.6666666666, ans=0.125 2023-11-19 04:40:11,902 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0 2023-11-19 04:40:20,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=570213.3333333334, ans=0.125 2023-11-19 04:40:26,831 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.36 vs. limit=15.0 2023-11-19 04:40:31,232 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=570280.0, ans=0.0 2023-11-19 04:40:36,368 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:40:36,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=570280.0, ans=0.1 2023-11-19 04:40:51,820 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1400, loss[loss=0.1113, simple_loss=0.1343, pruned_loss=0.03496, audio_tagging_loss=0.009156, over 15395.00 frames. ], tot_loss[loss=0.0902, simple_loss=0.1078, pruned_loss=0.02566, audio_tagging_loss=0.01062, over 3040546.36 frames. 
], batch size: 59, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:41:13,427 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.757e+01 9.593e+01 1.066e+02 1.571e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-19 04:41:16,862 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=570546.6666666666, ans=0.1 2023-11-19 04:41:16,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=570546.6666666666, ans=0.125 2023-11-19 04:41:20,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=570546.6666666666, ans=0.0 2023-11-19 04:41:44,800 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2023-11-19 04:41:47,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.87 vs. limit=6.0 2023-11-19 04:41:47,472 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1450, loss[loss=0.09689, simple_loss=0.111, pruned_loss=0.02793, audio_tagging_loss=0.01347, over 14744.00 frames. ], tot_loss[loss=0.08991, simple_loss=0.1076, pruned_loss=0.02542, audio_tagging_loss=0.01069, over 3035798.06 frames. ], batch size: 57, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:23,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=570946.6666666666, ans=0.125 2023-11-19 04:42:24,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=570946.6666666666, ans=0.2 2023-11-19 04:42:43,688 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1500, loss[loss=0.1011, simple_loss=0.1286, pruned_loss=0.02716, audio_tagging_loss=0.009654, over 16312.00 frames. ], tot_loss[loss=0.08997, simple_loss=0.1077, pruned_loss=0.02538, audio_tagging_loss=0.01072, over 3037505.71 frames. ], batch size: 58, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:49,740 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:42:52,793 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:43:00,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=571146.6666666666, ans=0.125 2023-11-19 04:43:03,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=571146.6666666666, ans=0.125 2023-11-19 04:43:04,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.390e+01 9.200e+01 9.780e+01 1.571e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-19 04:43:39,286 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1550, loss[loss=0.08159, simple_loss=0.09342, pruned_loss=0.02429, audio_tagging_loss=0.01059, over 14730.00 frames. ], tot_loss[loss=0.09045, simple_loss=0.1082, pruned_loss=0.02556, audio_tagging_loss=0.01078, over 3038596.79 frames. 
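The WithLoss lines attach a diagnostic penalty to the named self_attn_weights tensors and report its sum, which is 0.000e+00 in almost every entry here (one earlier entry shows 6.085e-02). A generic sketch of such a wrapper, passing activations through unchanged while recording a penalty that is zero when values stay in range; the actual mechanism in scaling.py is assumed, not reproduced:

```python
import torch

class WithLoss(torch.nn.Module):
    """Sketch: identity wrapper recording an auxiliary penalty on its input.

    Zero penalty when all values stay within +/- limit, mirroring the
    'loss-sum=0.000e+00' entries; 'limit' and the penalty form are assumptions.
    """
    def __init__(self, limit: float = 25.0):
        super().__init__()
        self.limit = limit
        self.loss_sum = 0.0

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        penalty = (x.abs() - self.limit).clamp(min=0.0)
        self.loss_sum = float(penalty.sum())   # the value printed in the log
        return x

m = WithLoss()
_ = m(torch.randn(4, 8))              # values well within range
print(f"loss-sum={m.loss_sum:.3e}")   # -> loss-sum=0.000e+00
```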
], batch size: 56, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:43:39,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=571413.3333333334, ans=0.125 2023-11-19 04:43:48,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=571480.0, ans=0.0 2023-11-19 04:44:03,378 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:44:05,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=571546.6666666666, ans=0.125 2023-11-19 04:44:20,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=571613.3333333334, ans=0.2 2023-11-19 04:44:26,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=571680.0, ans=0.2 2023-11-19 04:44:34,453 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1600, loss[loss=0.1226, simple_loss=0.1464, pruned_loss=0.03995, audio_tagging_loss=0.009434, over 16402.00 frames. ], tot_loss[loss=0.09103, simple_loss=0.1083, pruned_loss=0.02594, audio_tagging_loss=0.01096, over 3049517.78 frames. ], batch size: 56, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:44:34,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=571746.6666666666, ans=0.125 2023-11-19 04:44:35,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=571746.6666666666, ans=0.125 2023-11-19 04:44:41,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=571746.6666666666, ans=0.125 2023-11-19 04:44:44,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=571746.6666666666, ans=0.0 2023-11-19 04:44:56,105 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.975e+01 9.863e+01 1.094e+02 1.850e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-19 04:45:04,180 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.10 vs. limit=15.0 2023-11-19 04:45:25,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=572013.3333333334, ans=0.2 2023-11-19 04:45:31,002 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1650, loss[loss=0.07776, simple_loss=0.0975, pruned_loss=0.01988, audio_tagging_loss=0.009129, over 14306.00 frames. ], tot_loss[loss=0.09129, simple_loss=0.109, pruned_loss=0.02587, audio_tagging_loss=0.01093, over 3048741.24 frames. ], batch size: 53, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:45:31,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=572080.0, ans=0.125 2023-11-19 04:45:36,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.13 vs. 
limit=15.0 2023-11-19 04:45:56,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=572213.3333333334, ans=0.125 2023-11-19 04:45:57,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572213.3333333334, ans=0.1 2023-11-19 04:46:20,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=572346.6666666666, ans=0.0 2023-11-19 04:46:26,832 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1700, loss[loss=0.084, simple_loss=0.1012, pruned_loss=0.02321, audio_tagging_loss=0.01018, over 14936.00 frames. ], tot_loss[loss=0.0921, simple_loss=0.11, pruned_loss=0.02627, audio_tagging_loss=0.01083, over 3048716.50 frames. ], batch size: 57, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:46:31,700 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-19 04:46:35,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=572413.3333333334, ans=0.1 2023-11-19 04:46:40,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-19 04:46:47,239 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.408e+01 9.171e+01 1.022e+02 1.332e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 04:46:56,875 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.29 vs. limit=15.0 2023-11-19 04:47:21,142 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1750, loss[loss=0.08608, simple_loss=0.09883, pruned_loss=0.02514, audio_tagging_loss=0.01153, over 15264.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1091, pruned_loss=0.02592, audio_tagging_loss=0.01083, over 3058075.07 frames. ], batch size: 59, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:47:26,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2023-11-19 04:47:32,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.22 vs. limit=15.0 2023-11-19 04:47:44,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=572880.0, ans=0.0 2023-11-19 04:47:56,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=572946.6666666666, ans=0.2 2023-11-19 04:48:05,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=573013.3333333334, ans=0.0 2023-11-19 04:48:17,884 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1800, loss[loss=0.09482, simple_loss=0.124, pruned_loss=0.02493, audio_tagging_loss=0.007912, over 15771.00 frames. ], tot_loss[loss=0.09181, simple_loss=0.1101, pruned_loss=0.02604, audio_tagging_loss=0.01071, over 3055443.83 frames. 
], batch size: 58, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:48:18,114 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=573080.0, ans=0.125 2023-11-19 04:48:20,510 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.07 vs. limit=22.5 2023-11-19 04:48:38,397 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.476e+01 9.222e+01 1.009e+02 1.227e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 04:48:38,626 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=573213.3333333334, ans=0.0 2023-11-19 04:48:49,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=573280.0, ans=0.125 2023-11-19 04:49:06,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=573346.6666666666, ans=0.1 2023-11-19 04:49:07,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=573346.6666666666, ans=0.1 2023-11-19 04:49:09,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=573346.6666666666, ans=0.035 2023-11-19 04:49:13,801 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1850, loss[loss=0.08085, simple_loss=0.09323, pruned_loss=0.02443, audio_tagging_loss=0.009801, over 15590.00 frames. ], tot_loss[loss=0.09258, simple_loss=0.1111, pruned_loss=0.02632, audio_tagging_loss=0.0107, over 3053204.15 frames. ], batch size: 60, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:49:30,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=573480.0, ans=0.125 2023-11-19 04:49:49,377 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-19 04:49:58,151 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.63 vs. limit=10.0 2023-11-19 04:50:00,287 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2023-11-19 04:50:09,106 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1900, loss[loss=0.08534, simple_loss=0.09057, pruned_loss=0.02703, audio_tagging_loss=0.01302, over 14657.00 frames. ], tot_loss[loss=0.09212, simple_loss=0.1106, pruned_loss=0.02619, audio_tagging_loss=0.01063, over 3051077.26 frames. 
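The tot_loss[... over N frames] figures read as frame-weighted running averages of the per-batch losses; the fractional frame totals (e.g. 3051077.26) are consistent with an exponentially decayed window rather than a plain sum. A sketch under that assumed reading:

```python
class RunningLoss:
    """Sketch: exponentially decayed, frame-weighted average of batch losses."""

    def __init__(self, decay: float = 0.999):
        self.decay = decay
        self.loss_sum = 0.0   # decayed sum of loss * num_frames
        self.frames = 0.0     # decayed frame count (hence fractional totals)

    def update(self, loss: float, num_frames: int) -> float:
        self.loss_sum = self.decay * self.loss_sum + loss * num_frames
        self.frames = self.decay * self.frames + num_frames
        return self.loss_sum / self.frames   # the reported tot_loss

rl = RunningLoss()
for loss, frames in [(0.10, 15219), (0.085, 14657)]:  # illustrative per-batch numbers
    tot = rl.update(loss, frames)
print(f"tot_loss={tot:.5f} over {rl.frames:.2f} frames")
```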
], batch size: 56, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:50:20,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=573813.3333333334, ans=0.1 2023-11-19 04:50:26,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=573813.3333333334, ans=0.125 2023-11-19 04:50:29,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=573813.3333333334, ans=0.0 2023-11-19 04:50:31,201 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.643e+01 9.371e+01 1.051e+02 1.561e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 04:50:36,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=573880.0, ans=0.0 2023-11-19 04:50:39,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=573880.0, ans=0.125 2023-11-19 04:51:05,291 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 1950, loss[loss=0.07133, simple_loss=0.08547, pruned_loss=0.01825, audio_tagging_loss=0.01034, over 15318.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1098, pruned_loss=0.02617, audio_tagging_loss=0.01063, over 3043037.26 frames. ], batch size: 59, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:51:12,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=574080.0, ans=0.125 2023-11-19 04:51:27,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=574213.3333333334, ans=0.0 2023-11-19 04:51:35,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.22 vs. limit=12.0 2023-11-19 04:51:37,197 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:52:00,652 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=574413.3333333334, ans=0.125 2023-11-19 04:52:01,539 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2000, loss[loss=0.08155, simple_loss=0.09986, pruned_loss=0.02004, audio_tagging_loss=0.01157, over 15402.00 frames. ], tot_loss[loss=0.09079, simple_loss=0.1085, pruned_loss=0.02587, audio_tagging_loss=0.01067, over 3040318.46 frames. ], batch size: 55, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:52:06,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=574413.3333333334, ans=10.0 2023-11-19 04:52:21,799 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.904e+01 9.748e+01 1.142e+02 1.614e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 04:52:25,918 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.59 vs. 
limit=12.0
2023-11-19 04:52:30,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=574546.6666666666, ans=0.125
2023-11-19 04:52:42,343 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=574613.3333333334, ans=0.0
2023-11-19 04:52:44,490 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=574613.3333333334, ans=0.125
2023-11-19 04:52:45,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=5.97 vs. limit=15.0
2023-11-19 04:52:47,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=574680.0, ans=0.125
2023-11-19 04:52:53,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=574680.0, ans=0.125
2023-11-19 04:52:57,086 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2050, loss[loss=0.1099, simple_loss=0.1365, pruned_loss=0.03304, audio_tagging_loss=0.008597, over 15055.00 frames. ], tot_loss[loss=0.09109, simple_loss=0.1089, pruned_loss=0.02602, audio_tagging_loss=0.01064, over 3038466.24 frames. ], batch size: 56, lr: 8.86e-03, grad_scale: 32.0
2023-11-19 04:53:05,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=574746.6666666666, ans=0.0
2023-11-19 04:53:09,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.23 vs. limit=15.0
2023-11-19 04:53:17,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=574813.3333333334, ans=0.125
2023-11-19 04:53:38,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=574946.6666666666, ans=0.1
2023-11-19 04:53:38,785 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2023-11-19 04:53:41,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575013.3333333334, ans=0.1
2023-11-19 04:53:45,910 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=575013.3333333334, ans=0.025
2023-11-19 04:53:52,672 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2100, loss[loss=0.09264, simple_loss=0.111, pruned_loss=0.02708, audio_tagging_loss=0.01006, over 15472.00 frames. ], tot_loss[loss=0.09174, simple_loss=0.1099, pruned_loss=0.02622, audio_tagging_loss=0.0106, over 3039697.51 frames. ], batch size: 58, lr: 8.86e-03, grad_scale: 32.0
2023-11-19 04:53:57,929 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.56 vs. limit=15.0
2023-11-19 04:53:59,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=575080.0, ans=0.125
2023-11-19 04:54:03,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=575146.6666666666, ans=0.1
2023-11-19 04:54:05,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=575146.6666666666, ans=0.07
2023-11-19 04:54:09,721 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.92 vs. limit=22.5
2023-11-19 04:54:14,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.570e+01 9.138e+01 1.001e+02 1.384e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-19 04:54:32,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=575280.0, ans=0.0
2023-11-19 04:54:48,406 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2150, loss[loss=0.1157, simple_loss=0.148, pruned_loss=0.03356, audio_tagging_loss=0.008183, over 15097.00 frames. ], tot_loss[loss=0.09169, simple_loss=0.1097, pruned_loss=0.02626, audio_tagging_loss=0.01059, over 3037600.26 frames. ], batch size: 57, lr: 8.85e-03, grad_scale: 32.0
2023-11-19 04:55:04,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575480.0, ans=0.125
2023-11-19 04:55:09,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=575546.6666666666, ans=0.125
2023-11-19 04:55:10,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=575546.6666666666, ans=0.125
2023-11-19 04:55:20,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=575546.6666666666, ans=0.125
2023-11-19 04:55:20,943 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 04:55:24,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=575613.3333333334, ans=0.05
2023-11-19 04:55:24,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0
2023-11-19 04:55:43,945 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2200, loss[loss=0.08236, simple_loss=0.09574, pruned_loss=0.02434, audio_tagging_loss=0.01015, over 14197.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.1098, pruned_loss=0.02632, audio_tagging_loss=0.0107, over 3036252.20 frames. ], batch size: 56, lr: 8.85e-03, grad_scale: 32.0
2023-11-19 04:56:04,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=575813.3333333334, ans=0.125
2023-11-19 04:56:04,955 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.523e+01 9.283e+01 9.995e+01 1.354e+02, threshold=1.857e+02, percent-clipped=0.0
2023-11-19 04:56:16,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=575946.6666666666, ans=0.125
2023-11-19 04:56:29,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=576013.3333333334, ans=0.1
2023-11-19 04:56:31,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=576013.3333333334, ans=0.0
2023-11-19 04:56:39,877 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2250, loss[loss=0.08642, simple_loss=0.1083, pruned_loss=0.0228, audio_tagging_loss=0.009491, over 15434.00 frames. ], tot_loss[loss=0.09172, simple_loss=0.1097, pruned_loss=0.02618, audio_tagging_loss=0.01067, over 3036104.36 frames. ], batch size: 56, lr: 8.85e-03, grad_scale: 32.0
2023-11-19 04:56:48,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.76 vs. limit=22.5
2023-11-19 04:56:51,788 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:57:06,385 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0
2023-11-19 04:57:12,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=576280.0, ans=0.2
2023-11-19 04:57:19,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=576280.0, ans=0.125
2023-11-19 04:57:29,424 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0
2023-11-19 04:57:35,791 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2300, loss[loss=0.07462, simple_loss=0.08632, pruned_loss=0.01823, audio_tagging_loss=0.01323, over 15425.00 frames. ], tot_loss[loss=0.09148, simple_loss=0.1092, pruned_loss=0.02605, audio_tagging_loss=0.01081, over 3037263.26 frames. ], batch size: 58, lr: 8.85e-03, grad_scale: 16.0
2023-11-19 04:57:57,357 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.335e+01 9.344e+01 1.048e+02 1.433e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-19 04:58:01,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=576546.6666666666, ans=0.1
2023-11-19 04:58:10,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=576613.3333333334, ans=0.125
2023-11-19 04:58:13,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=576613.3333333334, ans=10.0
2023-11-19 04:58:23,144 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 04:58:26,666 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.47 vs. limit=10.0
2023-11-19 04:58:30,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5
2023-11-19 04:58:30,956 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2350, loss[loss=0.1093, simple_loss=0.1302, pruned_loss=0.03038, audio_tagging_loss=0.01378, over 15440.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1112, pruned_loss=0.02643, audio_tagging_loss=0.01075, over 3047533.14 frames. ], batch size: 57, lr: 8.84e-03, grad_scale: 16.0
2023-11-19 04:58:34,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=12.0
2023-11-19 04:58:47,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=576813.3333333334, ans=0.125
2023-11-19 04:59:15,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.94 vs. limit=22.5
2023-11-19 04:59:22,178 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=577013.3333333334, ans=0.125
2023-11-19 04:59:26,755 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2400, loss[loss=0.09934, simple_loss=0.1156, pruned_loss=0.02801, audio_tagging_loss=0.01351, over 14623.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.1105, pruned_loss=0.02633, audio_tagging_loss=0.01087, over 3048111.72 frames. ], batch size: 53, lr: 8.84e-03, grad_scale: 32.0
2023-11-19 04:59:27,267 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=71.04 vs. limit=15.0
2023-11-19 04:59:31,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=577080.0, ans=0.1
2023-11-19 04:59:40,210 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=15.0
2023-11-19 04:59:41,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=577146.6666666666, ans=0.015
2023-11-19 04:59:48,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=577213.3333333334, ans=0.0
2023-11-19 04:59:48,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.521e+01 9.139e+01 1.013e+02 1.981e+02, threshold=1.828e+02, percent-clipped=1.0
2023-11-19 04:59:59,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=577280.0, ans=0.07
2023-11-19 05:00:00,885 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.07 vs. limit=22.5
2023-11-19 05:00:23,019 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2450, loss[loss=0.09112, simple_loss=0.1056, pruned_loss=0.02999, audio_tagging_loss=0.008342, over 14918.00 frames. ], tot_loss[loss=0.09223, simple_loss=0.1099, pruned_loss=0.02629, audio_tagging_loss=0.01101, over 3044911.59 frames. ], batch size: 62, lr: 8.84e-03, grad_scale: 32.0
2023-11-19 05:00:46,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5
2023-11-19 05:00:52,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=577546.6666666666, ans=0.07
2023-11-19 05:00:53,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577546.6666666666, ans=0.125
2023-11-19 05:01:08,097 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.22 vs. limit=15.0
2023-11-19 05:01:15,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=577680.0, ans=0.125
2023-11-19 05:01:18,020 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2500, loss[loss=0.0767, simple_loss=0.08892, pruned_loss=0.02058, audio_tagging_loss=0.01165, over 14883.00 frames. ], tot_loss[loss=0.09249, simple_loss=0.1103, pruned_loss=0.02633, audio_tagging_loss=0.011, over 3042154.54 frames. ], batch size: 55, lr: 8.84e-03, grad_scale: 16.0
2023-11-19 05:01:22,540 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=577746.6666666666, ans=0.0
2023-11-19 05:01:29,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=577813.3333333334, ans=0.2
2023-11-19 05:01:32,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577813.3333333334, ans=0.125
2023-11-19 05:01:33,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=577813.3333333334, ans=0.125
2023-11-19 05:01:41,717 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.392e+01 9.155e+01 9.880e+01 1.151e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 05:01:52,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=577946.6666666666, ans=0.125
2023-11-19 05:01:52,929 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=577946.6666666666, ans=0.1
2023-11-19 05:02:13,218 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2550, loss[loss=0.1142, simple_loss=0.1463, pruned_loss=0.03073, audio_tagging_loss=0.01034, over 15607.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1096, pruned_loss=0.02623, audio_tagging_loss=0.01088, over 3037639.23 frames. ], batch size: 55, lr: 8.83e-03, grad_scale: 16.0
2023-11-19 05:02:33,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=578146.6666666666, ans=0.125
2023-11-19 05:02:40,464 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:02:50,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.81 vs. limit=12.0
2023-11-19 05:03:01,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=578346.6666666666, ans=0.125
2023-11-19 05:03:07,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=578346.6666666666, ans=0.125
2023-11-19 05:03:09,591 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2600, loss[loss=0.07605, simple_loss=0.08873, pruned_loss=0.01946, audio_tagging_loss=0.01223, over 15192.00 frames. ], tot_loss[loss=0.09054, simple_loss=0.1079, pruned_loss=0.02577, audio_tagging_loss=0.01082, over 3044015.30 frames. ], batch size: 60, lr: 8.83e-03, grad_scale: 16.0
2023-11-19 05:03:32,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.609e+01 9.579e+01 1.039e+02 1.650e+02, threshold=1.916e+02, percent-clipped=0.0
2023-11-19 05:03:46,100 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0
2023-11-19 05:04:02,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=578680.0, ans=0.1
2023-11-19 05:04:05,561 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2650, loss[loss=0.09686, simple_loss=0.1246, pruned_loss=0.02266, audio_tagging_loss=0.01192, over 14302.00 frames. ], tot_loss[loss=0.09092, simple_loss=0.1084, pruned_loss=0.02588, audio_tagging_loss=0.01087, over 3043743.75 frames. ], batch size: 57, lr: 8.83e-03, grad_scale: 16.0
2023-11-19 05:04:20,138 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.42 vs. limit=12.0
2023-11-19 05:04:31,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=578880.0, ans=0.025
2023-11-19 05:04:39,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=578946.6666666666, ans=0.125
2023-11-19 05:04:51,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=579013.3333333334, ans=0.125
2023-11-19 05:04:53,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=579013.3333333334, ans=0.125
2023-11-19 05:05:00,355 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2700, loss[loss=0.06749, simple_loss=0.07332, pruned_loss=0.01874, audio_tagging_loss=0.01209, over 14870.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1084, pruned_loss=0.02595, audio_tagging_loss=0.01073, over 3037353.70 frames. ], batch size: 56, lr: 8.83e-03, grad_scale: 16.0
2023-11-19 05:05:06,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=579080.0, ans=0.2
2023-11-19 05:05:15,775 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=579146.6666666666, ans=0.1
2023-11-19 05:05:15,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579146.6666666666, ans=0.125
2023-11-19 05:05:23,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=579213.3333333334, ans=0.2
2023-11-19 05:05:24,702 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.403e+01 9.195e+01 1.002e+02 1.372e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-19 05:05:34,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=579280.0, ans=0.125
2023-11-19 05:05:36,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579280.0, ans=0.1
2023-11-19 05:05:37,158 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5
2023-11-19 05:05:37,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=579280.0, ans=0.0
2023-11-19 05:05:56,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=579413.3333333334, ans=0.125
2023-11-19 05:05:57,145 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2750, loss[loss=0.07335, simple_loss=0.08923, pruned_loss=0.01877, audio_tagging_loss=0.009961, over 15008.00 frames. ], tot_loss[loss=0.0903, simple_loss=0.1075, pruned_loss=0.02579, audio_tagging_loss=0.01078, over 3033951.08 frames. ], batch size: 57, lr: 8.82e-03, grad_scale: 16.0
2023-11-19 05:06:03,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=579413.3333333334, ans=0.125
2023-11-19 05:06:10,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=579480.0, ans=0.0
2023-11-19 05:06:14,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=579480.0, ans=0.125
2023-11-19 05:06:22,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=579546.6666666666, ans=0.125
2023-11-19 05:06:34,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=579613.3333333334, ans=0.2
2023-11-19 05:06:44,671 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 05:06:52,972 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2800, loss[loss=0.05864, simple_loss=0.06764, pruned_loss=0.01444, audio_tagging_loss=0.01039, over 14338.00 frames. ], tot_loss[loss=0.08995, simple_loss=0.1074, pruned_loss=0.0256, audio_tagging_loss=0.01064, over 3033330.18 frames. ], batch size: 56, lr: 8.82e-03, grad_scale: 32.0
2023-11-19 05:06:53,261 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=579746.6666666666, ans=0.125
2023-11-19 05:06:55,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=579746.6666666666, ans=0.125
2023-11-19 05:07:16,554 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.973e+01 8.959e+01 9.930e+01 1.123e+02 1.609e+02, threshold=1.986e+02, percent-clipped=0.0
2023-11-19 05:07:26,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=579946.6666666666, ans=0.125
2023-11-19 05:07:36,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=580013.3333333334, ans=0.125
2023-11-19 05:07:42,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.39 vs. limit=6.0
2023-11-19 05:07:48,250 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2850, loss[loss=0.07387, simple_loss=0.08012, pruned_loss=0.02301, audio_tagging_loss=0.0108, over 14668.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1058, pruned_loss=0.02517, audio_tagging_loss=0.01062, over 3032779.48 frames. ], batch size: 56, lr: 8.82e-03, grad_scale: 32.0
2023-11-19 05:07:57,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=580080.0, ans=15.0
2023-11-19 05:08:01,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=580146.6666666666, ans=0.0
2023-11-19 05:08:13,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=580213.3333333334, ans=10.0
2023-11-19 05:08:14,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.03 vs. limit=22.5
2023-11-19 05:08:24,832 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=580280.0, ans=0.1
2023-11-19 05:08:31,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=580280.0, ans=0.125
2023-11-19 05:08:33,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=580346.6666666666, ans=10.0
2023-11-19 05:08:33,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=580346.6666666666, ans=0.07
2023-11-19 05:08:44,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=580413.3333333334, ans=0.125
2023-11-19 05:08:45,314 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2900, loss[loss=0.08094, simple_loss=0.09074, pruned_loss=0.02401, audio_tagging_loss=0.01155, over 14684.00 frames. ], tot_loss[loss=0.0896, simple_loss=0.1071, pruned_loss=0.02547, audio_tagging_loss=0.0106, over 3040712.63 frames. ], batch size: 56, lr: 8.82e-03, grad_scale: 32.0
2023-11-19 05:08:45,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0
2023-11-19 05:08:47,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=580413.3333333334, ans=0.0
2023-11-19 05:08:55,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=580480.0, ans=0.0
2023-11-19 05:09:01,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=580480.0, ans=0.0
2023-11-19 05:09:08,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.526e+01 9.240e+01 1.001e+02 1.332e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 05:09:15,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=580546.6666666666, ans=0.125
2023-11-19 05:09:29,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=580680.0, ans=0.2
2023-11-19 05:09:30,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.58 vs. limit=22.5
2023-11-19 05:09:41,493 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 2950, loss[loss=0.0754, simple_loss=0.07518, pruned_loss=0.02304, audio_tagging_loss=0.01477, over 15307.00 frames. ], tot_loss[loss=0.08931, simple_loss=0.1066, pruned_loss=0.02526, audio_tagging_loss=0.01075, over 3041180.01 frames. ], batch size: 62, lr: 8.81e-03, grad_scale: 32.0
2023-11-19 05:09:48,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=580746.6666666666, ans=0.0
2023-11-19 05:10:30,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=581013.3333333334, ans=0.125
2023-11-19 05:10:36,735 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3000, loss[loss=0.08705, simple_loss=0.115, pruned_loss=0.02236, audio_tagging_loss=0.007178, over 16060.00 frames. ], tot_loss[loss=0.09049, simple_loss=0.1079, pruned_loss=0.0257, audio_tagging_loss=0.01086, over 3038636.92 frames. ], batch size: 60, lr: 8.81e-03, grad_scale: 32.0
2023-11-19 05:10:36,736 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-19 05:11:08,553 INFO [train_asr.py:1147] (2/4) Epoch 8, validation: loss=0.06637, simple_loss=0.05694, pruned_loss=0.00724, audio_tagging_loss=0.03066, over 4681554.00 frames.
2023-11-19 05:11:08,554 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-19 05:11:10,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=581080.0, ans=0.1
2023-11-19 05:11:31,375 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.419e+01 9.133e+01 9.790e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-19 05:12:04,550 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3050, loss[loss=0.09096, simple_loss=0.1208, pruned_loss=0.02004, audio_tagging_loss=0.0105, over 16126.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1088, pruned_loss=0.02608, audio_tagging_loss=0.01083, over 3043401.24 frames. ], batch size: 60, lr: 8.81e-03, grad_scale: 32.0
2023-11-19 05:12:11,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=581413.3333333334, ans=0.0
2023-11-19 05:12:25,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=581546.6666666666, ans=0.125
2023-11-19 05:12:30,245 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581546.6666666666, ans=0.1
2023-11-19 05:12:36,520 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 05:12:45,364 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=7.97 vs. limit=15.0
2023-11-19 05:12:59,882 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3100, loss[loss=0.1074, simple_loss=0.1233, pruned_loss=0.03329, audio_tagging_loss=0.01242, over 15585.00 frames. ], tot_loss[loss=0.09149, simple_loss=0.1091, pruned_loss=0.02594, audio_tagging_loss=0.011, over 3038817.85 frames. ], batch size: 57, lr: 8.81e-03, grad_scale: 32.0
2023-11-19 05:13:01,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0
2023-11-19 05:13:04,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=581746.6666666666, ans=0.5
2023-11-19 05:13:24,918 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.767e+01 9.718e+01 1.090e+02 1.747e+02, threshold=1.944e+02, percent-clipped=0.0
2023-11-19 05:13:37,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=581946.6666666666, ans=0.125
2023-11-19 05:13:55,665 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3150, loss[loss=0.08636, simple_loss=0.1033, pruned_loss=0.02537, audio_tagging_loss=0.009355, over 15333.00 frames. ], tot_loss[loss=0.09146, simple_loss=0.1092, pruned_loss=0.02591, audio_tagging_loss=0.01096, over 3037493.34 frames. ], batch size: 58, lr: 8.80e-03, grad_scale: 16.0
2023-11-19 05:13:57,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=582080.0, ans=0.2
2023-11-19 05:13:58,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=582080.0, ans=0.0
2023-11-19 05:14:51,793 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3200, loss[loss=0.0868, simple_loss=0.1044, pruned_loss=0.02406, audio_tagging_loss=0.01056, over 16094.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1085, pruned_loss=0.02568, audio_tagging_loss=0.01115, over 3033070.89 frames. ], batch size: 59, lr: 8.80e-03, grad_scale: 32.0
2023-11-19 05:15:15,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.369e+01 9.165e+01 1.003e+02 1.359e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-19 05:15:26,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=582613.3333333334, ans=0.2
2023-11-19 05:15:40,442 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.38 vs. limit=10.0
2023-11-19 05:15:42,145 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=582680.0, ans=0.2
2023-11-19 05:15:47,233 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3250, loss[loss=0.0844, simple_loss=0.1066, pruned_loss=0.02056, audio_tagging_loss=0.01056, over 15484.00 frames. ], tot_loss[loss=0.09113, simple_loss=0.1085, pruned_loss=0.02566, audio_tagging_loss=0.0112, over 3039434.81 frames. ], batch size: 60, lr: 8.80e-03, grad_scale: 32.0
2023-11-19 05:15:54,801 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=582746.6666666666, ans=0.0
2023-11-19 05:16:03,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=582813.3333333334, ans=0.0
2023-11-19 05:16:20,919 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.58 vs. limit=15.0
2023-11-19 05:16:28,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=582946.6666666666, ans=0.125
2023-11-19 05:16:30,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=582946.6666666666, ans=0.125
2023-11-19 05:16:42,947 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3300, loss[loss=0.09629, simple_loss=0.1135, pruned_loss=0.03025, audio_tagging_loss=0.009303, over 14280.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1087, pruned_loss=0.02564, audio_tagging_loss=0.01113, over 3033832.41 frames. ], batch size: 54, lr: 8.80e-03, grad_scale: 32.0
2023-11-19 05:16:45,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=583080.0, ans=0.09899494936611666
2023-11-19 05:16:53,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=583146.6666666666, ans=0.1
2023-11-19 05:17:01,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=583146.6666666666, ans=0.125
2023-11-19 05:17:06,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=583213.3333333334, ans=0.125
2023-11-19 05:17:08,699 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.571e+01 9.137e+01 1.011e+02 1.807e+02, threshold=1.827e+02, percent-clipped=0.0
2023-11-19 05:17:08,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583213.3333333334, ans=0.1
2023-11-19 05:17:08,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=583213.3333333334, ans=0.1
2023-11-19 05:17:09,200 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.70 vs. limit=15.0
2023-11-19 05:17:16,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=583280.0, ans=0.125
2023-11-19 05:17:17,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=583280.0, ans=0.125
2023-11-19 05:17:22,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.52 vs. limit=15.0
2023-11-19 05:17:38,943 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3350, loss[loss=0.1067, simple_loss=0.1309, pruned_loss=0.03065, audio_tagging_loss=0.01059, over 14510.00 frames. ], tot_loss[loss=0.09119, simple_loss=0.1091, pruned_loss=0.02565, audio_tagging_loss=0.011, over 3034640.41 frames. ], batch size: 54, lr: 8.79e-03, grad_scale: 16.0
2023-11-19 05:17:54,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. limit=6.0
2023-11-19 05:18:04,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583546.6666666666, ans=0.125
2023-11-19 05:18:17,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=583613.3333333334, ans=0.125
2023-11-19 05:18:34,563 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3400, loss[loss=0.09692, simple_loss=0.125, pruned_loss=0.02728, audio_tagging_loss=0.007148, over 15505.00 frames. ], tot_loss[loss=0.09026, simple_loss=0.1081, pruned_loss=0.02532, audio_tagging_loss=0.01091, over 3040410.67 frames. ], batch size: 57, lr: 8.79e-03, grad_scale: 16.0
2023-11-19 05:19:00,607 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.466e+01 9.111e+01 1.006e+02 1.690e+02, threshold=1.822e+02, percent-clipped=0.0
2023-11-19 05:19:07,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=583946.6666666666, ans=0.0
2023-11-19 05:19:11,893 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=583946.6666666666, ans=0.125
2023-11-19 05:19:29,703 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=584080.0, ans=0.0
2023-11-19 05:19:30,493 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3450, loss[loss=0.09879, simple_loss=0.1219, pruned_loss=0.02999, audio_tagging_loss=0.00788, over 15727.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.1077, pruned_loss=0.02524, audio_tagging_loss=0.01088, over 3041751.60 frames. ], batch size: 57, lr: 8.79e-03, grad_scale: 16.0
2023-11-19 05:19:35,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=584080.0, ans=0.125
2023-11-19 05:19:41,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=584146.6666666666, ans=0.125
2023-11-19 05:19:43,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=584146.6666666666, ans=0.5
2023-11-19 05:20:04,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=584280.0, ans=0.2
2023-11-19 05:20:13,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=584280.0, ans=0.125
2023-11-19 05:20:27,037 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3500, loss[loss=0.08266, simple_loss=0.09877, pruned_loss=0.02303, audio_tagging_loss=0.01025, over 14621.00 frames. ], tot_loss[loss=0.08906, simple_loss=0.1067, pruned_loss=0.02484, audio_tagging_loss=0.01088, over 3044631.94 frames. ], batch size: 56, lr: 8.79e-03, grad_scale: 16.0
2023-11-19 05:20:46,343 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0
2023-11-19 05:20:52,006 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.420e+01 9.270e+01 9.989e+01 2.188e+02, threshold=1.854e+02, percent-clipped=1.0
2023-11-19 05:20:54,689 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 05:20:59,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=584613.3333333334, ans=0.2
2023-11-19 05:21:20,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=584680.0, ans=0.2
2023-11-19 05:21:23,081 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3550, loss[loss=0.08677, simple_loss=0.09671, pruned_loss=0.02805, audio_tagging_loss=0.01036, over 14719.00 frames. ], tot_loss[loss=0.08834, simple_loss=0.1057, pruned_loss=0.02459, audio_tagging_loss=0.01092, over 3047902.75 frames. ], batch size: 57, lr: 8.78e-03, grad_scale: 16.0
2023-11-19 05:21:24,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584746.6666666666, ans=0.1
2023-11-19 05:21:25,483 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:21:38,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=584813.3333333334, ans=0.125
2023-11-19 05:21:51,483 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=584880.0, ans=0.0
2023-11-19 05:22:19,040 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3600, loss[loss=0.06283, simple_loss=0.07758, pruned_loss=0.01347, audio_tagging_loss=0.01056, over 14643.00 frames. ], tot_loss[loss=0.08802, simple_loss=0.1053, pruned_loss=0.02453, audio_tagging_loss=0.01086, over 3053805.80 frames. ], batch size: 56, lr: 8.78e-03, grad_scale: 32.0
2023-11-19 05:22:26,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=585080.0, ans=0.125
2023-11-19 05:22:38,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=585146.6666666666, ans=0.2
2023-11-19 05:22:44,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.417e+01 9.022e+01 1.001e+02 1.493e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 05:22:59,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=585280.0, ans=0.125
2023-11-19 05:23:15,472 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3650, loss[loss=0.08403, simple_loss=0.1041, pruned_loss=0.02102, audio_tagging_loss=0.01096, over 15056.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.1067, pruned_loss=0.02489, audio_tagging_loss=0.01071, over 3049654.65 frames. ], batch size: 57, lr: 8.78e-03, grad_scale: 32.0
2023-11-19 05:23:40,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=585546.6666666666, ans=0.125
2023-11-19 05:24:08,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585680.0, ans=0.1
2023-11-19 05:24:10,706 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3700, loss[loss=0.07133, simple_loss=0.09042, pruned_loss=0.01584, audio_tagging_loss=0.01028, over 15410.00 frames. ], tot_loss[loss=0.0891, simple_loss=0.107, pruned_loss=0.02494, audio_tagging_loss=0.01064, over 3052063.70 frames. ], batch size: 57, lr: 8.78e-03, grad_scale: 32.0
2023-11-19 05:24:12,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.97 vs. limit=15.0
2023-11-19 05:24:13,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=585746.6666666666, ans=0.0
2023-11-19 05:24:14,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=585746.6666666666, ans=0.125
2023-11-19 05:24:15,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=585746.6666666666, ans=0.5
2023-11-19 05:24:37,262 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.524e+01 9.099e+01 9.813e+01 1.282e+02, threshold=1.820e+02, percent-clipped=0.0
2023-11-19 05:24:55,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=586013.3333333334, ans=0.125
2023-11-19 05:25:06,477 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3750, loss[loss=0.09703, simple_loss=0.1192, pruned_loss=0.02946, audio_tagging_loss=0.007942, over 14452.00 frames. ], tot_loss[loss=0.08965, simple_loss=0.1078, pruned_loss=0.02501, audio_tagging_loss=0.01074, over 3050460.79 frames. ], batch size: 54, lr: 8.77e-03, grad_scale: 32.0
2023-11-19 05:25:07,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=586080.0, ans=0.09899494936611666
2023-11-19 05:25:09,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.96 vs. limit=22.5
2023-11-19 05:25:12,489 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:25:21,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=586146.6666666666, ans=0.125
2023-11-19 05:25:27,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=586146.6666666666, ans=0.1
2023-11-19 05:25:44,616 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 05:26:02,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=586413.3333333334, ans=0.0
2023-11-19 05:26:03,045 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3800, loss[loss=0.09366, simple_loss=0.1071, pruned_loss=0.02737, audio_tagging_loss=0.01274, over 15141.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1081, pruned_loss=0.02515, audio_tagging_loss=0.01084, over 3051043.80 frames. ], batch size: 57, lr: 8.77e-03, grad_scale: 32.0
2023-11-19 05:26:11,632 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:26:16,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=586480.0, ans=0.125
2023-11-19 05:26:19,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=586480.0, ans=0.0
2023-11-19 05:26:27,759 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.211e+01 8.964e+01 1.013e+02 1.295e+02, threshold=1.793e+02, percent-clipped=0.0
2023-11-19 05:26:52,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=586680.0, ans=0.125
2023-11-19 05:26:52,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=586680.0, ans=0.125
2023-11-19 05:26:53,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.56 vs. limit=15.0
2023-11-19 05:27:00,380 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3850, loss[loss=0.1073, simple_loss=0.141, pruned_loss=0.02805, audio_tagging_loss=0.008745, over 16287.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1088, pruned_loss=0.02557, audio_tagging_loss=0.0109, over 3045333.22 frames. ], batch size: 58, lr: 8.77e-03, grad_scale: 32.0
2023-11-19 05:27:00,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=586746.6666666666, ans=0.125
2023-11-19 05:27:45,375 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:27:46,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=587013.3333333334, ans=0.125
2023-11-19 05:27:48,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=587013.3333333334, ans=0.125
2023-11-19 05:27:56,350 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3900, loss[loss=0.09064, simple_loss=0.1169, pruned_loss=0.02493, audio_tagging_loss=0.007247, over 15744.00 frames. ], tot_loss[loss=0.09082, simple_loss=0.1088, pruned_loss=0.02553, audio_tagging_loss=0.01092, over 3045328.17 frames. ], batch size: 58, lr: 8.77e-03, grad_scale: 32.0
2023-11-19 05:28:04,976 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 05:28:22,402 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.532e+01 9.315e+01 9.987e+01 1.482e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 05:28:34,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0
2023-11-19 05:28:40,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587346.6666666666, ans=0.1
2023-11-19 05:28:52,869 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 3950, loss[loss=0.06826, simple_loss=0.07967, pruned_loss=0.0151, audio_tagging_loss=0.01332, over 14654.00 frames. ], tot_loss[loss=0.09092, simple_loss=0.1086, pruned_loss=0.02551, audio_tagging_loss=0.01112, over 3040847.28 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 32.0
2023-11-19 05:28:53,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.97 vs. limit=15.0
2023-11-19 05:29:01,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=587413.3333333334, ans=0.0
2023-11-19 05:29:08,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587480.0, ans=0.125
2023-11-19 05:29:38,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=587680.0, ans=0.2
2023-11-19 05:29:42,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=587680.0, ans=0.125
2023-11-19 05:29:44,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=587680.0, ans=0.1
2023-11-19 05:29:45,579 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=587680.0, ans=0.125
2023-11-19 05:29:48,577 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4000, loss[loss=0.08365, simple_loss=0.08877, pruned_loss=0.02275, audio_tagging_loss=0.01651, over 15749.00 frames. ], tot_loss[loss=0.09138, simple_loss=0.109, pruned_loss=0.02572, audio_tagging_loss=0.01115, over 3043733.12 frames. ], batch size: 59, lr: 8.76e-03, grad_scale: 32.0
2023-11-19 05:29:49,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.42 vs. limit=15.0
2023-11-19 05:29:55,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=587746.6666666666, ans=0.125
2023-11-19 05:30:09,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=587813.3333333334, ans=0.125
2023-11-19 05:30:14,660 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.556e+01 9.306e+01 1.024e+02 1.346e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-19 05:30:17,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=587880.0, ans=0.125
2023-11-19 05:30:44,083 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4050, loss[loss=0.1034, simple_loss=0.1294, pruned_loss=0.02888, audio_tagging_loss=0.009795, over 16052.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1098, pruned_loss=0.02594, audio_tagging_loss=0.01109, over 3038624.43 frames. ], batch size: 58, lr: 8.76e-03, grad_scale: 32.0
2023-11-19 05:30:46,215 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 05:30:47,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=588080.0, ans=0.0
2023-11-19 05:31:11,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=588213.3333333334, ans=15.0
2023-11-19 05:31:17,240 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.22 vs. limit=15.0
2023-11-19 05:31:21,501 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0
2023-11-19 05:31:24,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=588280.0, ans=0.07
2023-11-19 05:31:26,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=588280.0, ans=0.025
2023-11-19 05:31:40,826 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4100, loss[loss=0.09176, simple_loss=0.1227, pruned_loss=0.02339, audio_tagging_loss=0.007028, over 15702.00 frames. ], tot_loss[loss=0.09093, simple_loss=0.1088, pruned_loss=0.02547, audio_tagging_loss=0.01106, over 3040534.51 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 32.0
2023-11-19 05:31:41,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=588413.3333333334, ans=0.2
2023-11-19 05:31:54,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=588480.0, ans=0.2
2023-11-19 05:32:01,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=588546.6666666666, ans=0.1
2023-11-19 05:32:05,811 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.777e+01 9.323e+01 1.010e+02 1.338e+02, threshold=1.865e+02, percent-clipped=0.0
2023-11-19 05:32:14,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=588613.3333333334, ans=0.015
2023-11-19 05:32:27,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=588680.0, ans=0.125
2023-11-19 05:32:36,884 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4150, loss[loss=0.0614, simple_loss=0.0807, pruned_loss=0.01165, audio_tagging_loss=0.009399, over 15365.00 frames. ], tot_loss[loss=0.09035, simple_loss=0.1081, pruned_loss=0.02533, audio_tagging_loss=0.01096, over 3037161.71 frames. ], batch size: 58, lr: 8.75e-03, grad_scale: 32.0
2023-11-19 05:32:38,707 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.23 vs. limit=6.0
limit=6.0 2023-11-19 05:32:47,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=588813.3333333334, ans=0.125 2023-11-19 05:33:02,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=588880.0, ans=0.125 2023-11-19 05:33:05,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=588880.0, ans=0.2 2023-11-19 05:33:06,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=588880.0, ans=0.0 2023-11-19 05:33:16,881 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:33:29,804 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=589013.3333333334, ans=0.1 2023-11-19 05:33:31,686 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4200, loss[loss=0.05361, simple_loss=0.06033, pruned_loss=0.01029, audio_tagging_loss=0.01316, over 16515.00 frames. ], tot_loss[loss=0.09043, simple_loss=0.1087, pruned_loss=0.02533, audio_tagging_loss=0.01076, over 3040921.91 frames. ], batch size: 64, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:33:37,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=589080.0, ans=0.0 2023-11-19 05:33:54,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=589213.3333333334, ans=0.125 2023-11-19 05:33:54,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=589213.3333333334, ans=0.125 2023-11-19 05:33:58,310 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.836e+01 9.609e+01 1.061e+02 1.544e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 05:34:10,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=589280.0, ans=0.0 2023-11-19 05:34:12,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=589280.0, ans=0.0 2023-11-19 05:34:28,201 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4250, loss[loss=0.09592, simple_loss=0.1165, pruned_loss=0.02944, audio_tagging_loss=0.008219, over 14908.00 frames. ], tot_loss[loss=0.09113, simple_loss=0.1097, pruned_loss=0.02561, audio_tagging_loss=0.0107, over 3035851.46 frames. 
], batch size: 56, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:34:39,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=589480.0, ans=0.07 2023-11-19 05:34:40,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589480.0, ans=0.125 2023-11-19 05:34:48,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=589480.0, ans=0.2 2023-11-19 05:34:56,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=589546.6666666666, ans=0.0 2023-11-19 05:34:56,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=589546.6666666666, ans=0.0 2023-11-19 05:34:57,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-19 05:35:07,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=589613.3333333334, ans=0.2 2023-11-19 05:35:07,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=589613.3333333334, ans=0.0 2023-11-19 05:35:24,460 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4300, loss[loss=0.1041, simple_loss=0.1277, pruned_loss=0.03287, audio_tagging_loss=0.007375, over 16027.00 frames. ], tot_loss[loss=0.09131, simple_loss=0.11, pruned_loss=0.02574, audio_tagging_loss=0.01056, over 3038643.75 frames. ], batch size: 58, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:35:40,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=589813.3333333334, ans=0.07 2023-11-19 05:35:40,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=589813.3333333334, ans=0.0 2023-11-19 05:35:49,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.689e+01 9.452e+01 1.019e+02 1.393e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 05:36:18,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=590080.0, ans=0.125 2023-11-19 05:36:18,937 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4350, loss[loss=0.09249, simple_loss=0.09994, pruned_loss=0.03007, audio_tagging_loss=0.01245, over 13851.00 frames. ], tot_loss[loss=0.09136, simple_loss=0.1096, pruned_loss=0.02598, audio_tagging_loss=0.01058, over 3035309.98 frames. ], batch size: 53, lr: 8.74e-03, grad_scale: 16.0 2023-11-19 05:36:42,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=590213.3333333334, ans=6.0 2023-11-19 05:36:46,449 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590213.3333333334, ans=0.1 2023-11-19 05:36:59,124 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=590280.0, ans=0.125 2023-11-19 05:37:01,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.99 vs. 
limit=15.0 2023-11-19 05:37:14,720 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4400, loss[loss=0.08824, simple_loss=0.1033, pruned_loss=0.02745, audio_tagging_loss=0.009134, over 14398.00 frames. ], tot_loss[loss=0.09147, simple_loss=0.11, pruned_loss=0.026, audio_tagging_loss=0.01049, over 3033918.02 frames. ], batch size: 57, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:37:33,478 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.71 vs. limit=5.0 2023-11-19 05:37:33,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=590480.0, ans=0.125 2023-11-19 05:37:39,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=590546.6666666666, ans=0.07 2023-11-19 05:37:41,048 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.473e+01 9.223e+01 1.006e+02 1.257e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 05:37:45,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=590546.6666666666, ans=0.125 2023-11-19 05:37:55,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=590613.3333333334, ans=0.05 2023-11-19 05:38:05,369 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=590680.0, ans=0.1 2023-11-19 05:38:05,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=15.0 2023-11-19 05:38:11,086 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4450, loss[loss=0.09562, simple_loss=0.1204, pruned_loss=0.0248, audio_tagging_loss=0.0106, over 15919.00 frames. ], tot_loss[loss=0.09156, simple_loss=0.1105, pruned_loss=0.02582, audio_tagging_loss=0.01049, over 3037369.77 frames. ], batch size: 57, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:38:23,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=590813.3333333334, ans=0.04949747468305833 2023-11-19 05:38:41,164 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.71 vs. limit=15.0 2023-11-19 05:38:54,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=591013.3333333334, ans=0.0 2023-11-19 05:39:05,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591080.0, ans=0.1 2023-11-19 05:39:06,188 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4500, loss[loss=0.08462, simple_loss=0.09368, pruned_loss=0.02557, audio_tagging_loss=0.01221, over 14550.00 frames. ], tot_loss[loss=0.09162, simple_loss=0.1103, pruned_loss=0.0259, audio_tagging_loss=0.01055, over 3044142.19 frames. 
], batch size: 56, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:39:06,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=591080.0, ans=0.07 2023-11-19 05:39:25,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=591146.6666666666, ans=0.2 2023-11-19 05:39:33,306 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.340e+01 1.045e+02 1.489e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:39:33,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=591213.3333333334, ans=0.04949747468305833 2023-11-19 05:39:55,890 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. limit=15.0 2023-11-19 05:39:56,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.01 vs. limit=6.0 2023-11-19 05:40:01,049 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:02,479 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4550, loss[loss=0.1129, simple_loss=0.1315, pruned_loss=0.03663, audio_tagging_loss=0.01048, over 15224.00 frames. ], tot_loss[loss=0.0922, simple_loss=0.1112, pruned_loss=0.02607, audio_tagging_loss=0.01051, over 3039759.46 frames. ], batch size: 56, lr: 8.73e-03, grad_scale: 32.0 2023-11-19 05:40:02,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=591413.3333333334, ans=0.125 2023-11-19 05:40:07,321 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.24 vs. limit=10.0 2023-11-19 05:40:22,815 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=591480.0, ans=0.1 2023-11-19 05:40:44,872 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:40:47,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=591680.0, ans=0.1 2023-11-19 05:40:58,035 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4600, loss[loss=0.0924, simple_loss=0.1171, pruned_loss=0.02262, audio_tagging_loss=0.01123, over 15459.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1106, pruned_loss=0.02599, audio_tagging_loss=0.01056, over 3040629.28 frames. ], batch size: 57, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:04,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=591746.6666666666, ans=0.125 2023-11-19 05:41:12,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. 
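Note on the Whitening records ("encoder_embed.convnext.out_whiten ... metric=4.71 vs. limit=5.0" above): they compare a measured non-whiteness statistic of a layer's activations against a limit, and appear when the metric is near or over it. The proxy metric below, mean squared covariance eigenvalue over squared mean eigenvalue (exactly 1.0 for perfectly white features, large when a few directions dominate), is an assumption for illustration, not the exact scaling.py formula.

```python
# Hedged sketch of a non-whiteness diagnostic like the logged
# "Whitening: ... metric=X vs. limit=Y" records.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    # == 1.0 when all eigenvalues are equal (white); >> 1.0 otherwise
    return float((eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20))

x = torch.randn(2000, 128)
x[:, 0] *= 30.0  # one dominant channel makes the activations non-white
print(f"metric={whitening_metric(x):.2f} vs. limit=5.0")  # metric >> 1
```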
limit=15.0 2023-11-19 05:41:16,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=591813.3333333334, ans=0.125 2023-11-19 05:41:25,351 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.649e+01 8.473e+01 9.207e+01 1.048e+02 1.617e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 05:41:37,409 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.41 vs. limit=15.0 2023-11-19 05:41:46,718 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.63 vs. limit=15.0 2023-11-19 05:41:48,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=592013.3333333334, ans=0.125 2023-11-19 05:41:53,858 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4650, loss[loss=0.07581, simple_loss=0.08833, pruned_loss=0.01916, audio_tagging_loss=0.01249, over 15025.00 frames. ], tot_loss[loss=0.09187, simple_loss=0.1104, pruned_loss=0.02606, audio_tagging_loss=0.01062, over 3044309.04 frames. ], batch size: 55, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:56,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.65 vs. limit=15.0 2023-11-19 05:42:15,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592213.3333333334, ans=0.1 2023-11-19 05:42:17,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=592213.3333333334, ans=0.125 2023-11-19 05:42:35,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.62 vs. limit=22.5 2023-11-19 05:42:42,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:43,946 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:46,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592346.6666666666, ans=0.1 2023-11-19 05:42:49,077 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4700, loss[loss=0.06902, simple_loss=0.08056, pruned_loss=0.01494, audio_tagging_loss=0.0138, over 14483.00 frames. ], tot_loss[loss=0.09088, simple_loss=0.1088, pruned_loss=0.02565, audio_tagging_loss=0.01085, over 3045912.47 frames. 
], batch size: 55, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:43:17,543 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.535e+01 9.340e+01 1.049e+02 1.659e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:43:29,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=592613.3333333334, ans=0.0 2023-11-19 05:43:37,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=592680.0, ans=0.0 2023-11-19 05:43:40,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=592680.0, ans=0.0 2023-11-19 05:43:40,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=592680.0, ans=0.125 2023-11-19 05:43:42,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2023-11-19 05:43:45,575 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4750, loss[loss=0.08667, simple_loss=0.1052, pruned_loss=0.02102, audio_tagging_loss=0.01304, over 15097.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.1076, pruned_loss=0.02543, audio_tagging_loss=0.01103, over 3043591.72 frames. ], batch size: 57, lr: 8.72e-03, grad_scale: 16.0 2023-11-19 05:43:54,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=592746.6666666666, ans=0.0 2023-11-19 05:44:14,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.77 vs. limit=10.0 2023-11-19 05:44:41,342 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4800, loss[loss=0.08409, simple_loss=0.1004, pruned_loss=0.02309, audio_tagging_loss=0.01078, over 15867.00 frames. ], tot_loss[loss=0.09113, simple_loss=0.1087, pruned_loss=0.02578, audio_tagging_loss=0.01098, over 3042137.62 frames. ], batch size: 58, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:45:03,568 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.52 vs. limit=22.5 2023-11-19 05:45:05,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=593213.3333333334, ans=0.1 2023-11-19 05:45:08,997 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.537e+01 9.216e+01 9.797e+01 1.751e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 05:45:29,637 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.92 vs. limit=15.0 2023-11-19 05:45:36,329 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4850, loss[loss=0.1033, simple_loss=0.1283, pruned_loss=0.02887, audio_tagging_loss=0.01024, over 14164.00 frames. ], tot_loss[loss=0.09188, simple_loss=0.1097, pruned_loss=0.02599, audio_tagging_loss=0.01102, over 3042965.40 frames. ], batch size: 53, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:46:19,413 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.90 vs. 
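Note on the optim.py clipping records: the five "grad-norm quartiles" are the min/25%/median/75%/max of recent gradient norms, and the threshold is consistently Clipping_scale times the median, e.g. 2.0 x 9.340e+01 = 1.868e+02 just above. A sketch of that scheme; the window size and the exact rescaling rule are assumptions.

```python
# Hedged sketch of median-scaled gradient clipping consistent with the
# "Clipping_scale=2.0, grad-norm quartiles ... threshold=..." records.
import torch

def clip_by_scaled_median(params, history: list,
                          clipping_scale: float = 2.0,
                          window: int = 200) -> None:
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
    history.append(norm)
    del history[:-window]  # keep only the most recent norms
    q = torch.quantile(torch.tensor(history),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()  # 2x the median norm
    if norm > threshold:  # this branch drives "percent-clipped"
        for g in grads:
            g.mul_(threshold / norm)
```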
limit=15.0 2023-11-19 05:46:19,508 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.72 vs. limit=15.0 2023-11-19 05:46:25,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=593680.0, ans=0.125 2023-11-19 05:46:31,474 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4900, loss[loss=0.09885, simple_loss=0.1172, pruned_loss=0.03122, audio_tagging_loss=0.009044, over 14853.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1102, pruned_loss=0.02598, audio_tagging_loss=0.01085, over 3046140.41 frames. ], batch size: 57, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:46:31,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=593746.6666666666, ans=0.2 2023-11-19 05:46:33,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=593746.6666666666, ans=0.125 2023-11-19 05:46:35,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=593746.6666666666, ans=10.0 2023-11-19 05:46:41,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=593813.3333333334, ans=0.5 2023-11-19 05:46:52,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.62 vs. limit=15.0 2023-11-19 05:46:55,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=593880.0, ans=0.1 2023-11-19 05:46:56,088 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0 2023-11-19 05:46:58,716 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.397e+01 9.302e+01 1.001e+02 1.305e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 05:47:11,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=593946.6666666666, ans=0.125 2023-11-19 05:47:26,551 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 4950, loss[loss=0.09937, simple_loss=0.111, pruned_loss=0.03388, audio_tagging_loss=0.009982, over 14652.00 frames. ], tot_loss[loss=0.092, simple_loss=0.1105, pruned_loss=0.02615, audio_tagging_loss=0.01059, over 3045929.33 frames. ], batch size: 57, lr: 8.71e-03, grad_scale: 32.0 2023-11-19 05:47:31,973 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=594080.0, ans=0.125 2023-11-19 05:48:22,031 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5000, loss[loss=0.09609, simple_loss=0.1119, pruned_loss=0.02688, audio_tagging_loss=0.01325, over 14949.00 frames. ], tot_loss[loss=0.09193, simple_loss=0.1109, pruned_loss=0.0261, audio_tagging_loss=0.01039, over 3045787.90 frames. ], batch size: 57, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:48:23,846 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.74 vs. 
limit=15.0 2023-11-19 05:48:27,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=594413.3333333334, ans=0.125 2023-11-19 05:48:31,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=594413.3333333334, ans=0.0 2023-11-19 05:48:44,859 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=594546.6666666666, ans=0.125 2023-11-19 05:48:51,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.368e+01 8.482e+01 9.400e+01 1.026e+02 1.313e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 05:49:18,522 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5050, loss[loss=0.06543, simple_loss=0.07884, pruned_loss=0.01366, audio_tagging_loss=0.01235, over 13787.00 frames. ], tot_loss[loss=0.09114, simple_loss=0.1102, pruned_loss=0.02572, audio_tagging_loss=0.01031, over 3048914.88 frames. ], batch size: 53, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:49:24,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=594746.6666666666, ans=0.125 2023-11-19 05:49:28,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.79 vs. limit=15.0 2023-11-19 05:49:35,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=594813.3333333334, ans=0.0 2023-11-19 05:49:37,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=594813.3333333334, ans=0.125 2023-11-19 05:49:46,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=594880.0, ans=0.125 2023-11-19 05:50:04,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=595013.3333333334, ans=0.0 2023-11-19 05:50:10,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=595013.3333333334, ans=0.0 2023-11-19 05:50:14,332 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5100, loss[loss=0.08057, simple_loss=0.09805, pruned_loss=0.02165, audio_tagging_loss=0.009891, over 14902.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.1101, pruned_loss=0.02569, audio_tagging_loss=0.01027, over 3052096.86 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:50:21,383 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.64 vs. 
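Note on the tot_loss[...] aggregates: the fractional frame counts ("over 3048914.88 frames" above) sit near 200 times the typical ~15k frames per batch, which is the steady state of an exponentially decayed running sum with decay 1 - 1/200, not of a plain per-epoch average. A hedged sketch; the decay constant is inferred from the numbers, not read from the script.

```python
# Hedged sketch of decayed running loss statistics consistent with the
# fractional "over N frames" counts in the tot_loss records.
class RunningStats:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):  # decay is inferred
        self.decay, self.loss_sum, self.frames = decay, 0.0, 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        # shrink the accumulator slightly, then add this batch,
        # so old batches fade out of the reported average
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def avg(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```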
limit=22.5 2023-11-19 05:50:22,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=595080.0, ans=0.125 2023-11-19 05:50:25,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=595146.6666666666, ans=0.125 2023-11-19 05:50:26,360 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=595146.6666666666, ans=0.0 2023-11-19 05:50:35,307 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=595146.6666666666, ans=0.2 2023-11-19 05:50:41,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=595213.3333333334, ans=0.0 2023-11-19 05:50:43,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.142e+01 8.421e+01 9.048e+01 1.022e+02 1.450e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 05:50:45,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=595213.3333333334, ans=0.0 2023-11-19 05:50:46,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595213.3333333334, ans=0.1 2023-11-19 05:50:57,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=595346.6666666666, ans=0.1 2023-11-19 05:51:09,263 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5150, loss[loss=0.1032, simple_loss=0.129, pruned_loss=0.03029, audio_tagging_loss=0.008419, over 15069.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1092, pruned_loss=0.02538, audio_tagging_loss=0.01042, over 3046310.82 frames. ], batch size: 55, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:51:15,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=595413.3333333334, ans=0.025 2023-11-19 05:51:21,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595480.0, ans=0.1 2023-11-19 05:51:55,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.54 vs. limit=10.0 2023-11-19 05:52:05,712 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5200, loss[loss=0.1138, simple_loss=0.144, pruned_loss=0.0342, audio_tagging_loss=0.007593, over 15933.00 frames. ], tot_loss[loss=0.09144, simple_loss=0.1104, pruned_loss=0.02584, audio_tagging_loss=0.01041, over 3045044.03 frames. ], batch size: 54, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:52:08,659 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.58 vs. 
limit=5.0 2023-11-19 05:52:13,769 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:52:21,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595813.3333333334, ans=0.1 2023-11-19 05:52:33,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.322e+01 8.934e+01 9.832e+01 1.211e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 05:53:01,445 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5250, loss[loss=0.0917, simple_loss=0.1133, pruned_loss=0.02311, audio_tagging_loss=0.01193, over 15680.00 frames. ], tot_loss[loss=0.09145, simple_loss=0.1104, pruned_loss=0.02581, audio_tagging_loss=0.01045, over 3056302.15 frames. ], batch size: 60, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:53:09,040 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:53:23,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-19 05:53:54,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2023-11-19 05:53:56,337 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5300, loss[loss=0.1197, simple_loss=0.1541, pruned_loss=0.03331, audio_tagging_loss=0.009351, over 14520.00 frames. ], tot_loss[loss=0.09207, simple_loss=0.1113, pruned_loss=0.02604, audio_tagging_loss=0.01041, over 3060262.89 frames. ], batch size: 52, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:53:57,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=596413.3333333334, ans=0.2 2023-11-19 05:54:08,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=596480.0, ans=0.125 2023-11-19 05:54:19,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=596546.6666666666, ans=0.2 2023-11-19 05:54:26,891 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.841e+01 9.894e+01 1.112e+02 1.416e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-19 05:54:36,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=596613.3333333334, ans=0.125 2023-11-19 05:54:52,765 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5350, loss[loss=0.07906, simple_loss=0.09291, pruned_loss=0.02151, audio_tagging_loss=0.01109, over 14269.00 frames. ], tot_loss[loss=0.09202, simple_loss=0.111, pruned_loss=0.02608, audio_tagging_loss=0.01046, over 3050840.47 frames. ], batch size: 55, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:54:55,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=596746.6666666666, ans=0.035 2023-11-19 05:55:11,898 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.58 vs. 
limit=6.0 2023-11-19 05:55:20,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=596880.0, ans=0.0 2023-11-19 05:55:23,180 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=596880.0, ans=0.125 2023-11-19 05:55:29,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596946.6666666666, ans=0.125 2023-11-19 05:55:47,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=597080.0, ans=0.1 2023-11-19 05:55:48,438 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5400, loss[loss=0.07038, simple_loss=0.09003, pruned_loss=0.01717, audio_tagging_loss=0.008196, over 15364.00 frames. ], tot_loss[loss=0.09148, simple_loss=0.1104, pruned_loss=0.0259, audio_tagging_loss=0.0104, over 3050290.80 frames. ], batch size: 59, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:55:55,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=597080.0, ans=0.125 2023-11-19 05:56:02,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=597146.6666666666, ans=0.125 2023-11-19 05:56:17,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.569e+01 9.325e+01 1.031e+02 1.430e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:56:23,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=597280.0, ans=0.0 2023-11-19 05:56:31,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597280.0, ans=0.125 2023-11-19 05:56:37,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=597346.6666666666, ans=0.0 2023-11-19 05:56:43,625 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5450, loss[loss=0.06396, simple_loss=0.07665, pruned_loss=0.01373, audio_tagging_loss=0.01191, over 15035.00 frames. ], tot_loss[loss=0.09048, simple_loss=0.1087, pruned_loss=0.02553, audio_tagging_loss=0.01059, over 3046434.96 frames. ], batch size: 59, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:56:43,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=597413.3333333334, ans=0.0 2023-11-19 05:57:06,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=597546.6666666666, ans=0.125 2023-11-19 05:57:07,838 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.60 vs. limit=15.0 2023-11-19 05:57:08,424 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:57:35,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597680.0, ans=0.1 2023-11-19 05:57:39,860 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5500, loss[loss=0.08676, simple_loss=0.1028, pruned_loss=0.02179, audio_tagging_loss=0.01357, over 16391.00 frames. 
], tot_loss[loss=0.09124, simple_loss=0.1098, pruned_loss=0.02574, audio_tagging_loss=0.0106, over 3054647.98 frames. ], batch size: 61, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:58:01,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-19 05:58:03,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597880.0, ans=0.1 2023-11-19 05:58:04,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=597880.0, ans=0.07 2023-11-19 05:58:09,524 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.477e+01 9.706e+01 1.076e+02 1.326e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 05:58:35,506 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5550, loss[loss=0.09512, simple_loss=0.1121, pruned_loss=0.02945, audio_tagging_loss=0.009629, over 15358.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1093, pruned_loss=0.02569, audio_tagging_loss=0.01072, over 3048116.93 frames. ], batch size: 58, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:58:54,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=598146.6666666666, ans=0.125 2023-11-19 05:58:55,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=598213.3333333334, ans=0.125 2023-11-19 05:59:14,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598280.0, ans=0.1 2023-11-19 05:59:22,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=598346.6666666666, ans=0.0 2023-11-19 05:59:30,944 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5600, loss[loss=0.09437, simple_loss=0.1215, pruned_loss=0.02381, audio_tagging_loss=0.009795, over 15629.00 frames. ], tot_loss[loss=0.09091, simple_loss=0.1091, pruned_loss=0.02555, audio_tagging_loss=0.01083, over 3050139.34 frames. ], batch size: 56, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 05:59:52,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=598546.6666666666, ans=0.125 2023-11-19 05:59:57,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=598546.6666666666, ans=0.07 2023-11-19 06:00:02,416 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.552e+01 9.221e+01 1.020e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:00:05,795 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=598613.3333333334, ans=0.125 2023-11-19 06:00:10,758 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 06:00:27,012 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5650, loss[loss=0.07855, simple_loss=0.1027, pruned_loss=0.01833, audio_tagging_loss=0.008875, over 15029.00 frames. ], tot_loss[loss=0.09074, simple_loss=0.1087, pruned_loss=0.02551, audio_tagging_loss=0.01089, over 3043132.60 frames. ], batch size: 57, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:00:27,239 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=598746.6666666666, ans=0.0 2023-11-19 06:00:33,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=598746.6666666666, ans=0.125 2023-11-19 06:00:44,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=598813.3333333334, ans=0.0 2023-11-19 06:01:01,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=598946.6666666666, ans=0.125 2023-11-19 06:01:22,500 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5700, loss[loss=0.08103, simple_loss=0.09494, pruned_loss=0.02532, audio_tagging_loss=0.008241, over 15415.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1078, pruned_loss=0.02541, audio_tagging_loss=0.01102, over 3038319.48 frames. ], batch size: 59, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:01:34,710 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=599146.6666666666, ans=0.125 2023-11-19 06:01:46,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=599213.3333333334, ans=0.1 2023-11-19 06:01:50,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=599213.3333333334, ans=0.0 2023-11-19 06:01:53,408 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.957e+01 9.901e+01 1.097e+02 1.583e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-19 06:01:59,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.11 vs. limit=15.0 2023-11-19 06:02:07,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=599346.6666666666, ans=0.125 2023-11-19 06:02:17,387 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.41 vs. limit=12.0 2023-11-19 06:02:17,843 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5750, loss[loss=0.1065, simple_loss=0.1323, pruned_loss=0.03168, audio_tagging_loss=0.008636, over 15080.00 frames. ], tot_loss[loss=0.09023, simple_loss=0.1079, pruned_loss=0.02539, audio_tagging_loss=0.01086, over 3044045.77 frames. 
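Note on the per-batch loss components: they decompose consistently as loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; e.g. for batch 5700 above, 0.5 x 0.09494 + 0.02532 + 0.008241 = 0.08103, the logged value. The 0.5/1.0 weights are read off the numbers themselves; a worked check:

```python
# Worked check of the loss decomposition against the batch 5700 record.
simple, pruned, tagging = 0.09494, 0.02532, 0.008241
loss = 0.5 * simple + pruned + 1.0 * tagging
print(round(loss, 5))  # -> 0.08103, matching the logged loss
```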
], batch size: 56, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:02:25,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=599413.3333333334, ans=0.0 2023-11-19 06:02:27,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=599480.0, ans=0.0 2023-11-19 06:02:33,933 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=599480.0, ans=0.1 2023-11-19 06:02:55,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.58 vs. limit=6.0 2023-11-19 06:03:06,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=599680.0, ans=0.125 2023-11-19 06:03:13,149 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5800, loss[loss=0.09097, simple_loss=0.119, pruned_loss=0.02489, audio_tagging_loss=0.006599, over 15621.00 frames. ], tot_loss[loss=0.09072, simple_loss=0.1087, pruned_loss=0.02574, audio_tagging_loss=0.01061, over 3041255.76 frames. ], batch size: 56, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:03:20,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=599746.6666666666, ans=0.125 2023-11-19 06:03:32,744 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.23 vs. limit=15.0 2023-11-19 06:03:40,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2023-11-19 06:03:44,337 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.064e+01 9.956e+01 1.074e+02 1.617e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-19 06:03:57,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=600013.3333333334, ans=0.125 2023-11-19 06:04:04,380 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=600013.3333333334, ans=0.125 2023-11-19 06:04:06,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=600013.3333333334, ans=0.125 2023-11-19 06:04:09,091 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5850, loss[loss=0.1055, simple_loss=0.1316, pruned_loss=0.0305, audio_tagging_loss=0.009184, over 15565.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1087, pruned_loss=0.02541, audio_tagging_loss=0.0106, over 3049344.31 frames. ], batch size: 56, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:04:43,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.48 vs. limit=12.0 2023-11-19 06:04:58,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=600346.6666666666, ans=0.0 2023-11-19 06:05:04,590 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5900, loss[loss=0.08627, simple_loss=0.1065, pruned_loss=0.02204, audio_tagging_loss=0.01096, over 14029.00 frames. ], tot_loss[loss=0.09083, simple_loss=0.1092, pruned_loss=0.02563, audio_tagging_loss=0.01061, over 3041994.58 frames. 
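Note on grad_scale: it moves in powers of two across the batch records (16.0 around batches 5800-5950 here, back to 32.0 by batch 6000), the signature of dynamic fp16 loss scaling: halve on overflow, grow back after a run of clean steps. A minimal sketch using PyTorch's stock GradScaler; the training script may manage the scale itself with different growth timing.

```python
# Hedged sketch of dynamic fp16 loss scaling consistent with the
# grad_scale values in the records. Growth timing below is PyTorch's
# default, an assumption.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5
)
# per training step (schematic):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(model, batch)    # assumed helper
#   optimizer.zero_grad()
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)   # skipped internally if grads overflowed
#   scaler.update()          # backoff 0.5 on overflow, growth 2.0 later
```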
], batch size: 55, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:05:12,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600413.3333333334, ans=0.1 2023-11-19 06:05:28,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=600546.6666666666, ans=0.5 2023-11-19 06:05:29,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=600546.6666666666, ans=0.125 2023-11-19 06:05:30,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=600546.6666666666, ans=0.125 2023-11-19 06:05:33,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=600546.6666666666, ans=0.125 2023-11-19 06:05:33,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=600546.6666666666, ans=0.1 2023-11-19 06:05:34,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=600546.6666666666, ans=0.125 2023-11-19 06:05:35,553 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.269e+01 8.982e+01 9.905e+01 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 06:05:59,419 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 5950, loss[loss=0.08796, simple_loss=0.104, pruned_loss=0.02565, audio_tagging_loss=0.01028, over 16182.00 frames. ], tot_loss[loss=0.09057, simple_loss=0.109, pruned_loss=0.02539, audio_tagging_loss=0.01068, over 3054518.81 frames. ], batch size: 62, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:06:11,460 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2023-11-19 06:06:21,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=600880.0, ans=0.125 2023-11-19 06:06:25,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=600880.0, ans=0.0 2023-11-19 06:06:26,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.07 vs. limit=10.0 2023-11-19 06:06:35,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=600946.6666666666, ans=0.95 2023-11-19 06:06:36,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=600946.6666666666, ans=0.125 2023-11-19 06:06:40,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=600946.6666666666, ans=0.0 2023-11-19 06:06:46,257 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=601013.3333333334, ans=0.0 2023-11-19 06:06:55,537 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6000, loss[loss=0.0919, simple_loss=0.1141, pruned_loss=0.02292, audio_tagging_loss=0.01193, over 14894.00 frames. ], tot_loss[loss=0.08975, simple_loss=0.108, pruned_loss=0.02502, audio_tagging_loss=0.01076, over 3047682.35 frames. 
], batch size: 57, lr: 8.66e-03, grad_scale: 32.0 2023-11-19 06:06:55,537 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 06:07:28,385 INFO [train_asr.py:1147] (2/4) Epoch 8, validation: loss=0.06748, simple_loss=0.0569, pruned_loss=0.007185, audio_tagging_loss=0.03185, over 4681554.00 frames. 2023-11-19 06:07:28,386 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 06:07:31,155 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.57 vs. limit=22.5 2023-11-19 06:07:37,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=601080.0, ans=0.125 2023-11-19 06:07:39,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=601146.6666666666, ans=0.2 2023-11-19 06:07:50,342 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=601213.3333333334, ans=0.0 2023-11-19 06:07:59,099 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.682e+01 9.253e+01 9.954e+01 1.321e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 06:08:02,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601280.0, ans=0.1 2023-11-19 06:08:07,514 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:08:23,824 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6050, loss[loss=0.08776, simple_loss=0.1043, pruned_loss=0.0244, audio_tagging_loss=0.01118, over 14713.00 frames. ], tot_loss[loss=0.09032, simple_loss=0.1087, pruned_loss=0.02521, audio_tagging_loss=0.01077, over 3045183.77 frames. ], batch size: 55, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:08:28,310 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:08:53,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=601546.6666666666, ans=0.125 2023-11-19 06:08:58,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=601613.3333333334, ans=0.125 2023-11-19 06:09:08,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=601680.0, ans=0.0 2023-11-19 06:09:18,655 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6100, loss[loss=0.084, simple_loss=0.09798, pruned_loss=0.02341, audio_tagging_loss=0.0116, over 15560.00 frames. ], tot_loss[loss=0.09078, simple_loss=0.1093, pruned_loss=0.02541, audio_tagging_loss=0.01073, over 3045509.12 frames. 
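Note on the validation pass above ("Computing validation loss" -> "Epoch 8, validation: loss=0.06748 ... over 4681554.00 frames" -> peak-memory report): validation runs periodically during training with frame-weighted averaging over the whole dev set. A sketch of that loop, with `compute_loss` as an assumed helper rather than the actual train_asr.py function.

```python
# Hedged sketch of the periodic validation pass behind the
# "Computing validation loss" / "Epoch 8, validation: ..." records.
import torch

def compute_validation_loss(model, valid_dl, device) -> float:
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_dl:
            loss, num_frames = compute_loss(model, batch, device)  # assumed helper
            tot_loss += loss.item() * num_frames  # frame-weighted sum
            tot_frames += num_frames
    model.train()
    mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    print(f"Maximum memory allocated so far is {mem_mb}MB")
    return tot_loss / tot_frames  # the "over 4681554.00 frames" average
```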
], batch size: 58, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:09:33,487 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.452e-01 2023-11-19 06:09:37,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=601813.3333333334, ans=0.0 2023-11-19 06:09:40,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=601880.0, ans=0.07 2023-11-19 06:09:50,159 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.581e+01 9.083e+01 1.023e+02 1.492e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 06:10:02,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=602013.3333333334, ans=0.1 2023-11-19 06:10:12,865 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6150, loss[loss=0.07813, simple_loss=0.09931, pruned_loss=0.01893, audio_tagging_loss=0.009545, over 15209.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1092, pruned_loss=0.02549, audio_tagging_loss=0.01065, over 3046135.05 frames. ], batch size: 59, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:10:25,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=602146.6666666666, ans=0.125 2023-11-19 06:10:37,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=602213.3333333334, ans=0.1 2023-11-19 06:10:59,210 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:11:05,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=602346.6666666666, ans=0.2 2023-11-19 06:11:07,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=602413.3333333334, ans=0.125 2023-11-19 06:11:08,562 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6200, loss[loss=0.08864, simple_loss=0.1126, pruned_loss=0.02255, audio_tagging_loss=0.009769, over 14775.00 frames. ], tot_loss[loss=0.08995, simple_loss=0.1081, pruned_loss=0.0251, audio_tagging_loss=0.01079, over 3046613.99 frames. 
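Note on the WithLoss records ("...self_attn_weights, loss-sum=1.452e-01" above, 0.000e+00 elsewhere): they suggest an identity-like wrapper that attaches a small auxiliary loss to the attention weights and reports its sum. A speculative sketch of the pass-through-with-extra-gradient trick; the real scaling.py class certainly differs in detail and also tracks the attached loss value (the logged "loss-sum").

```python
# Speculative sketch: forward is the identity, backward behaves as if an
# extra penalty `scale * x.sum()` had been added to the total loss.
import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, scale: float) -> torch.Tensor:
        ctx.scale = scale
        return x.view_as(x)            # identity in the forward pass

    @staticmethod
    def backward(ctx, grad: torch.Tensor):
        return grad + ctx.scale, None  # d(scale * x.sum())/dx == scale

# usage sketch: attn_weights = WithLoss.apply(attn_weights, 1e-4)
```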
], batch size: 54, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:11:18,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:19,447 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:22,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:30,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=602546.6666666666, ans=0.125 2023-11-19 06:11:35,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=602546.6666666666, ans=0.125 2023-11-19 06:11:39,726 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.400e+01 8.989e+01 9.962e+01 1.274e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 06:12:03,527 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6250, loss[loss=0.06911, simple_loss=0.08296, pruned_loss=0.01745, audio_tagging_loss=0.01017, over 15686.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1072, pruned_loss=0.02497, audio_tagging_loss=0.01097, over 3048507.32 frames. ], batch size: 60, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:12:15,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=602813.3333333334, ans=0.0 2023-11-19 06:12:58,167 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6300, loss[loss=0.1052, simple_loss=0.1289, pruned_loss=0.03005, audio_tagging_loss=0.01072, over 15676.00 frames. ], tot_loss[loss=0.08999, simple_loss=0.1079, pruned_loss=0.02511, audio_tagging_loss=0.01093, over 3054372.41 frames. ], batch size: 57, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:13:18,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603146.6666666666, ans=0.1 2023-11-19 06:13:20,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=603213.3333333334, ans=10.0 2023-11-19 06:13:27,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=603213.3333333334, ans=0.0 2023-11-19 06:13:30,512 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.462e+01 9.271e+01 1.035e+02 1.313e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 06:13:40,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=603280.0, ans=0.125 2023-11-19 06:13:43,494 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=603346.6666666666, ans=0.125 2023-11-19 06:13:46,729 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:13:52,814 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6350, loss[loss=0.06976, simple_loss=0.07918, pruned_loss=0.01752, audio_tagging_loss=0.01265, over 14995.00 frames. ], tot_loss[loss=0.09043, simple_loss=0.1083, pruned_loss=0.02528, audio_tagging_loss=0.01099, over 3049451.69 frames. 
], batch size: 57, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:14:48,844 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6400, loss[loss=0.1103, simple_loss=0.1255, pruned_loss=0.03716, audio_tagging_loss=0.01039, over 15692.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1086, pruned_loss=0.02539, audio_tagging_loss=0.01104, over 3054788.26 frames. ], batch size: 58, lr: 8.65e-03, grad_scale: 32.0 2023-11-19 06:14:58,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.69 vs. limit=6.0 2023-11-19 06:15:20,943 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.708e+01 9.476e+01 1.030e+02 1.332e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 06:15:22,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603946.6666666666, ans=0.1 2023-11-19 06:15:30,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=603946.6666666666, ans=0.125 2023-11-19 06:15:38,612 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.11 vs. limit=15.0 2023-11-19 06:15:44,367 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6450, loss[loss=0.06589, simple_loss=0.07723, pruned_loss=0.01397, audio_tagging_loss=0.01331, over 15594.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.1092, pruned_loss=0.02543, audio_tagging_loss=0.01097, over 3060089.97 frames. ], batch size: 60, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:15:51,288 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.03 vs. limit=22.5 2023-11-19 06:16:06,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=604213.3333333334, ans=0.0 2023-11-19 06:16:08,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=604213.3333333334, ans=0.125 2023-11-19 06:16:16,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=604213.3333333334, ans=0.125 2023-11-19 06:16:17,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=604280.0, ans=0.05 2023-11-19 06:16:32,151 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.935e-01 2023-11-19 06:16:33,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=604346.6666666666, ans=0.125 2023-11-19 06:16:38,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=604413.3333333334, ans=0.125 2023-11-19 06:16:39,329 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6500, loss[loss=0.08509, simple_loss=0.09505, pruned_loss=0.02887, audio_tagging_loss=0.008698, over 15572.00 frames. ], tot_loss[loss=0.09151, simple_loss=0.11, pruned_loss=0.02562, audio_tagging_loss=0.0109, over 3055921.14 frames. 
], batch size: 64, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:16:52,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604480.0, ans=0.125 2023-11-19 06:17:06,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=604546.6666666666, ans=0.05 2023-11-19 06:17:07,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=604546.6666666666, ans=0.07 2023-11-19 06:17:09,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.60 vs. limit=15.0 2023-11-19 06:17:11,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.554e+01 9.296e+01 1.013e+02 1.424e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 06:17:22,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=604680.0, ans=0.2 2023-11-19 06:17:25,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=604680.0, ans=0.0 2023-11-19 06:17:35,745 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6550, loss[loss=0.09591, simple_loss=0.1169, pruned_loss=0.02846, audio_tagging_loss=0.009016, over 14377.00 frames. ], tot_loss[loss=0.09137, simple_loss=0.1097, pruned_loss=0.02576, audio_tagging_loss=0.01076, over 3050458.45 frames. ], batch size: 54, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:17:42,245 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.19 vs. limit=15.0 2023-11-19 06:17:48,703 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=22.5 2023-11-19 06:18:00,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=604880.0, ans=0.0 2023-11-19 06:18:14,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=604946.6666666666, ans=0.125 2023-11-19 06:18:15,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=604946.6666666666, ans=0.125 2023-11-19 06:18:31,331 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6600, loss[loss=0.07651, simple_loss=0.09917, pruned_loss=0.01916, audio_tagging_loss=0.00776, over 15334.00 frames. ], tot_loss[loss=0.09049, simple_loss=0.1089, pruned_loss=0.02542, audio_tagging_loss=0.01061, over 3046190.38 frames. ], batch size: 56, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:18:32,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=605080.0, ans=0.0 2023-11-19 06:18:38,748 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=605080.0, ans=0.125 2023-11-19 06:18:38,803 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=605080.0, ans=0.0 2023-11-19 06:18:40,149 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=5.99 vs. 
limit=15.0 2023-11-19 06:18:45,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=605146.6666666666, ans=0.125 2023-11-19 06:19:03,867 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.547e+01 9.371e+01 1.021e+02 1.350e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:19:17,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=605346.6666666666, ans=0.0 2023-11-19 06:19:26,466 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6650, loss[loss=0.08376, simple_loss=0.09718, pruned_loss=0.02182, audio_tagging_loss=0.01335, over 14935.00 frames. ], tot_loss[loss=0.09006, simple_loss=0.1086, pruned_loss=0.02526, audio_tagging_loss=0.01053, over 3045589.75 frames. ], batch size: 56, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:19:36,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=605413.3333333334, ans=0.125 2023-11-19 06:19:43,722 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=605480.0, ans=0.125 2023-11-19 06:19:50,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=605546.6666666666, ans=0.0 2023-11-19 06:19:53,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.53 vs. limit=15.0 2023-11-19 06:19:58,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=605546.6666666666, ans=0.125 2023-11-19 06:20:22,599 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6700, loss[loss=0.09741, simple_loss=0.1075, pruned_loss=0.03348, audio_tagging_loss=0.01017, over 14118.00 frames. ], tot_loss[loss=0.09, simple_loss=0.1085, pruned_loss=0.02531, audio_tagging_loss=0.01045, over 3040665.22 frames. 
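The `[optim.py:476]` lines report the min/25%/median/75%/max of recently observed gradient norms alongside a clipping threshold, and in every such line here the threshold equals `Clipping_scale` times the median (e.g. 2.0 · 9.371e+01 = 1.874e+02 just above); `percent-clipped` is then the share of recent steps whose norm exceeded that threshold. A small sketch of that relationship — the real optimizer surely maintains these statistics incrementally, so treat this as illustrative arithmetic only:

```python
import torch

def clipping_stats(recent_grad_norms, clipping_scale: float = 2.0):
    """Reproduce the quartiles/threshold arithmetic the optimizer logs.

    Assumption, consistent with every `grad-norm quartiles` line here:
    threshold = clipping_scale * median of the recent gradient norms.
    """
    norms = torch.tensor(recent_grad_norms, dtype=torch.float32)
    quartiles = torch.quantile(
        norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]
    percent_clipped = 100.0 * (norms > threshold).float().mean()
    return quartiles, threshold, percent_clipped
```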
], batch size: 55, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:20:42,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=605813.3333333334, ans=0.0 2023-11-19 06:20:44,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=605880.0, ans=0.125 2023-11-19 06:20:46,856 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605880.0, ans=0.125 2023-11-19 06:20:52,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=605880.0, ans=0.04949747468305833 2023-11-19 06:20:53,952 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.187e+01 8.900e+01 9.907e+01 1.762e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 06:21:02,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=605946.6666666666, ans=0.0 2023-11-19 06:21:03,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=605946.6666666666, ans=0.025 2023-11-19 06:21:06,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=606013.3333333334, ans=0.0 2023-11-19 06:21:18,298 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6750, loss[loss=0.06006, simple_loss=0.07235, pruned_loss=0.01366, audio_tagging_loss=0.01022, over 16100.00 frames. ], tot_loss[loss=0.08938, simple_loss=0.1075, pruned_loss=0.02512, audio_tagging_loss=0.01052, over 3037605.44 frames. ], batch size: 62, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:21:20,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=606080.0, ans=0.125 2023-11-19 06:21:39,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=606213.3333333334, ans=0.0 2023-11-19 06:21:44,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=606213.3333333334, ans=0.125 2023-11-19 06:21:45,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=606213.3333333334, ans=0.125 2023-11-19 06:21:58,297 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.92 vs. limit=10.0 2023-11-19 06:22:13,364 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6800, loss[loss=0.08645, simple_loss=0.1068, pruned_loss=0.02386, audio_tagging_loss=0.00918, over 15875.00 frames. ], tot_loss[loss=0.09095, simple_loss=0.1094, pruned_loss=0.02574, audio_tagging_loss=0.0105, over 3041826.84 frames. 
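Each `ScheduledFloat` line prints the current value (`ans`) of a named hyperparameter at the current `batch_count`; the values seen at this point in training (dropout 0.1, balancer probs 0.125, bypass scale minima 0.2, skip rates at or near 0.0) look like the settled tail ends of schedules. A hypothetical reconstruction of such a schedule as a piecewise-linear function of `batch_count` — the breakpoints below are purely illustrative, not taken from the recipe:

```python
from bisect import bisect_right

def scheduled_float(batch_count: float, points) -> float:
    """Piecewise-linear schedule over batch_count (illustrative).

    `points` is a sequence of (batch_count, value) pairs with ascending
    batch counts; the value is held constant outside the covered range.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    if batch_count <= xs[0]:
        return ys[0]
    if batch_count >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, batch_count)
    t = (batch_count - xs[i - 1]) / (xs[i] - xs[i - 1])
    return ys[i - 1] + t * (ys[i] - ys[i - 1])

# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 50k batches
# would print `ans=0.0` at batch_count ~ 6e5, as the lines above do:
print(scheduled_float(604213.33, ((0.0, 0.5), (50000.0, 0.0))))  # 0.0
```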
], batch size: 60, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:22:24,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=606480.0, ans=0.2 2023-11-19 06:22:39,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=606546.6666666666, ans=0.125 2023-11-19 06:22:42,098 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=606546.6666666666, ans=0.125 2023-11-19 06:22:44,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=606546.6666666666, ans=0.2 2023-11-19 06:22:45,965 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.152e+01 8.455e+01 9.265e+01 1.067e+02 1.623e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 06:22:54,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=606613.3333333334, ans=0.1 2023-11-19 06:22:56,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=606613.3333333334, ans=0.125 2023-11-19 06:23:01,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=606680.0, ans=0.025 2023-11-19 06:23:09,234 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6850, loss[loss=0.06832, simple_loss=0.07511, pruned_loss=0.01801, audio_tagging_loss=0.01276, over 14579.00 frames. ], tot_loss[loss=0.08992, simple_loss=0.1082, pruned_loss=0.02533, audio_tagging_loss=0.01051, over 3040176.50 frames. ], batch size: 59, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:23:09,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=606746.6666666666, ans=0.125 2023-11-19 06:23:57,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=607013.3333333334, ans=0.125 2023-11-19 06:24:04,786 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6900, loss[loss=0.1037, simple_loss=0.1301, pruned_loss=0.03344, audio_tagging_loss=0.005205, over 15343.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1083, pruned_loss=0.02535, audio_tagging_loss=0.01044, over 3040988.59 frames. ], batch size: 56, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:24:05,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=607080.0, ans=15.0 2023-11-19 06:24:23,687 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=607146.6666666666, ans=0.125 2023-11-19 06:24:37,156 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.334e+01 9.172e+01 9.913e+01 1.941e+02, threshold=1.834e+02, percent-clipped=1.0 2023-11-19 06:24:44,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=607280.0, ans=0.1 2023-11-19 06:24:44,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.01 vs. 
limit=12.0 2023-11-19 06:24:45,583 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.84 vs. limit=15.0 2023-11-19 06:24:48,214 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:24:55,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=607346.6666666666, ans=0.2 2023-11-19 06:25:00,438 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 6950, loss[loss=0.1217, simple_loss=0.1467, pruned_loss=0.03644, audio_tagging_loss=0.01194, over 14190.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1088, pruned_loss=0.02552, audio_tagging_loss=0.01051, over 3031208.65 frames. ], batch size: 55, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:25:07,278 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.19 vs. limit=15.0 2023-11-19 06:25:12,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=607480.0, ans=0.125 2023-11-19 06:25:13,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=607480.0, ans=0.0 2023-11-19 06:25:35,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=607613.3333333334, ans=0.0 2023-11-19 06:25:35,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=607613.3333333334, ans=0.1 2023-11-19 06:25:50,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=607680.0, ans=0.125 2023-11-19 06:25:52,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=607680.0, ans=0.125 2023-11-19 06:25:56,729 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7000, loss[loss=0.0738, simple_loss=0.08576, pruned_loss=0.01668, audio_tagging_loss=0.01425, over 13814.00 frames. ], tot_loss[loss=0.09075, simple_loss=0.1093, pruned_loss=0.02559, audio_tagging_loss=0.01051, over 3028591.65 frames. ], batch size: 53, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:26:02,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=607746.6666666666, ans=0.125 2023-11-19 06:26:10,294 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.04 vs. 
limit=6.0 2023-11-19 06:26:26,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=607880.0, ans=0.125 2023-11-19 06:26:28,335 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.487e+01 9.225e+01 1.011e+02 1.458e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 06:26:52,396 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7050, loss[loss=0.09453, simple_loss=0.1084, pruned_loss=0.02742, audio_tagging_loss=0.01292, over 15271.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1087, pruned_loss=0.02575, audio_tagging_loss=0.01077, over 3026998.38 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:27:11,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=608146.6666666666, ans=0.0 2023-11-19 06:27:14,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=608213.3333333334, ans=0.125 2023-11-19 06:27:23,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=608213.3333333334, ans=0.0 2023-11-19 06:27:30,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=608280.0, ans=0.1 2023-11-19 06:27:33,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=608280.0, ans=0.2 2023-11-19 06:27:37,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0 2023-11-19 06:27:42,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=608346.6666666666, ans=0.125 2023-11-19 06:27:48,155 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7100, loss[loss=0.09358, simple_loss=0.1109, pruned_loss=0.02684, audio_tagging_loss=0.01128, over 13912.00 frames. ], tot_loss[loss=0.08997, simple_loss=0.1077, pruned_loss=0.02525, audio_tagging_loss=0.01086, over 3028738.27 frames. ], batch size: 52, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:28:17,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=608546.6666666666, ans=0.1 2023-11-19 06:28:19,659 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 9.012e+01 9.917e+01 1.109e+02 1.355e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-19 06:28:43,568 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7150, loss[loss=0.0948, simple_loss=0.1145, pruned_loss=0.02745, audio_tagging_loss=0.01009, over 15583.00 frames. ], tot_loss[loss=0.09111, simple_loss=0.1095, pruned_loss=0.0256, audio_tagging_loss=0.01078, over 3029441.96 frames. ], batch size: 56, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:28:43,851 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=608746.6666666666, ans=0.125 2023-11-19 06:29:05,059 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. 
limit=6.0 2023-11-19 06:29:17,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=608946.6666666666, ans=0.125 2023-11-19 06:29:28,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=609013.3333333334, ans=0.0 2023-11-19 06:29:35,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=609013.3333333334, ans=0.125 2023-11-19 06:29:38,972 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7200, loss[loss=0.06265, simple_loss=0.07466, pruned_loss=0.01328, audio_tagging_loss=0.01204, over 15734.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1095, pruned_loss=0.02576, audio_tagging_loss=0.01082, over 3037045.59 frames. ], batch size: 58, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:30:06,802 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=609213.3333333334, ans=0.07 2023-11-19 06:30:10,691 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 8.596e+01 9.352e+01 1.022e+02 1.385e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 06:30:33,336 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7250, loss[loss=0.05936, simple_loss=0.06672, pruned_loss=0.01267, audio_tagging_loss=0.01333, over 14846.00 frames. ], tot_loss[loss=0.09007, simple_loss=0.1078, pruned_loss=0.02513, audio_tagging_loss=0.01104, over 3041471.13 frames. ], batch size: 57, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:30:33,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=609413.3333333334, ans=0.125 2023-11-19 06:30:36,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=609413.3333333334, ans=0.2 2023-11-19 06:30:40,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=609413.3333333334, ans=0.2 2023-11-19 06:30:52,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=609480.0, ans=0.5 2023-11-19 06:31:18,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=609680.0, ans=0.2 2023-11-19 06:31:28,340 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7300, loss[loss=0.07447, simple_loss=0.08796, pruned_loss=0.01868, audio_tagging_loss=0.01181, over 16504.00 frames. ], tot_loss[loss=0.08944, simple_loss=0.1074, pruned_loss=0.02484, audio_tagging_loss=0.01092, over 3037544.06 frames. ], batch size: 65, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:31:35,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=609746.6666666666, ans=0.2 2023-11-19 06:31:55,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=609880.0, ans=0.2 2023-11-19 06:31:59,847 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.576e+01 9.670e+01 1.044e+02 1.433e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 06:32:23,118 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7350, loss[loss=0.08147, simple_loss=0.08826, pruned_loss=0.02409, audio_tagging_loss=0.01325, over 15245.00 frames. 
], tot_loss[loss=0.08981, simple_loss=0.1081, pruned_loss=0.02518, audio_tagging_loss=0.01057, over 3035087.88 frames. ], batch size: 59, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:32:24,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=610080.0, ans=0.125 2023-11-19 06:32:34,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=610146.6666666666, ans=0.09899494936611666 2023-11-19 06:32:42,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=610146.6666666666, ans=0.0 2023-11-19 06:32:55,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=610213.3333333334, ans=0.125 2023-11-19 06:33:05,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 06:33:11,635 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.83 vs. limit=15.0 2023-11-19 06:33:18,533 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7400, loss[loss=0.09891, simple_loss=0.1216, pruned_loss=0.02882, audio_tagging_loss=0.00931, over 14909.00 frames. ], tot_loss[loss=0.0894, simple_loss=0.1078, pruned_loss=0.02504, audio_tagging_loss=0.01046, over 3030549.46 frames. ], batch size: 54, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:33:31,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610480.0, ans=0.1 2023-11-19 06:33:42,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=610546.6666666666, ans=0.1 2023-11-19 06:33:47,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=610546.6666666666, ans=0.125 2023-11-19 06:33:51,302 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.574e+01 9.523e+01 1.112e+02 1.475e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 06:33:52,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=610613.3333333334, ans=0.0 2023-11-19 06:33:57,828 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=610613.3333333334, ans=0.1 2023-11-19 06:34:03,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=610680.0, ans=0.125 2023-11-19 06:34:07,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=610680.0, ans=0.0 2023-11-19 06:34:14,353 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7450, loss[loss=0.1192, simple_loss=0.1339, pruned_loss=0.04173, audio_tagging_loss=0.01047, over 15800.00 frames. ], tot_loss[loss=0.08948, simple_loss=0.1079, pruned_loss=0.02509, audio_tagging_loss=0.01047, over 3041934.61 frames. 
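The `Whitening` lines report, per named module, a measured whitening metric against the limit beyond which a whitening penalty is applied. One plausible definition of such a metric, hedged because the exact formula in scaling.py is not shown in this log: the ratio between the mean squared eigenvalue of the feature covariance and the square of its mean eigenvalue, which is 1.0 for a covariance proportional to the identity and grows as feature dimensions become correlated or unevenly scaled.

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """A plausible whitening metric (assumption, not scaling.py verbatim).

    Returns ~1.0 when features are already white; large values, like the
    `metric=... vs. limit=...` excursions logged above, indicate strongly
    anisotropic activations that the whitening penalty pushes back on.
    """
    feats = x.reshape(-1, x.shape[-1])
    feats = feats - feats.mean(dim=0)
    cov = (feats.T @ feats) / feats.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

print(whitening_metric(torch.randn(20000, 64)))  # ~1.0 for white noise
```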
], batch size: 57, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:34:32,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=610813.3333333334, ans=0.025 2023-11-19 06:34:42,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=610880.0, ans=0.0 2023-11-19 06:34:49,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=610946.6666666666, ans=0.0 2023-11-19 06:34:53,263 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=610946.6666666666, ans=0.125 2023-11-19 06:34:56,605 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-19 06:34:59,477 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:35:10,292 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7500, loss[loss=0.08486, simple_loss=0.09861, pruned_loss=0.02304, audio_tagging_loss=0.01252, over 15966.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1072, pruned_loss=0.02504, audio_tagging_loss=0.01057, over 3049293.01 frames. ], batch size: 63, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:35:22,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=611146.6666666666, ans=0.125 2023-11-19 06:35:30,142 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=611146.6666666666, ans=0.125 2023-11-19 06:35:34,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611213.3333333334, ans=0.1 2023-11-19 06:35:38,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=611213.3333333334, ans=0.2 2023-11-19 06:35:42,587 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.409e+01 9.431e+01 1.041e+02 1.502e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:35:49,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=611280.0, ans=0.0 2023-11-19 06:36:05,186 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7550, loss[loss=0.08743, simple_loss=0.1104, pruned_loss=0.0234, audio_tagging_loss=0.008858, over 16116.00 frames. ], tot_loss[loss=0.08934, simple_loss=0.1075, pruned_loss=0.02512, audio_tagging_loss=0.01048, over 3051999.32 frames. ], batch size: 59, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:36:17,471 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.98 vs. 
limit=10.0 2023-11-19 06:36:29,627 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=611546.6666666666, ans=0.1 2023-11-19 06:36:57,529 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:36:58,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=611746.6666666666, ans=0.0 2023-11-19 06:36:59,380 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7600, loss[loss=0.08392, simple_loss=0.09424, pruned_loss=0.02767, audio_tagging_loss=0.009133, over 14745.00 frames. ], tot_loss[loss=0.08834, simple_loss=0.1058, pruned_loss=0.02481, audio_tagging_loss=0.01063, over 3055186.50 frames. ], batch size: 56, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:00,980 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.66 vs. limit=22.5 2023-11-19 06:37:26,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.07 vs. limit=15.0 2023-11-19 06:37:32,017 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.282e+01 9.217e+01 9.907e+01 1.227e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 06:37:39,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=611946.6666666666, ans=0.1 2023-11-19 06:37:40,082 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.65 vs. limit=15.0 2023-11-19 06:37:56,175 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7650, loss[loss=0.0792, simple_loss=0.09306, pruned_loss=0.01904, audio_tagging_loss=0.01363, over 13672.00 frames. ], tot_loss[loss=0.08893, simple_loss=0.1066, pruned_loss=0.02492, audio_tagging_loss=0.0107, over 3052848.46 frames. ], batch size: 55, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:59,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=612080.0, ans=0.0 2023-11-19 06:38:06,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=612146.6666666666, ans=0.0 2023-11-19 06:38:31,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=612280.0, ans=0.0 2023-11-19 06:38:37,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=612280.0, ans=0.0 2023-11-19 06:38:51,528 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7700, loss[loss=0.06257, simple_loss=0.07066, pruned_loss=0.01502, audio_tagging_loss=0.01222, over 15504.00 frames. ], tot_loss[loss=0.0886, simple_loss=0.1061, pruned_loss=0.0248, audio_tagging_loss=0.01075, over 3054492.50 frames. 
], batch size: 60, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:38:51,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=612413.3333333334, ans=0.0 2023-11-19 06:39:03,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=612480.0, ans=0.125 2023-11-19 06:39:05,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=612480.0, ans=0.0 2023-11-19 06:39:11,611 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.94 vs. limit=22.5 2023-11-19 06:39:20,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=612546.6666666666, ans=0.0 2023-11-19 06:39:23,608 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.913e+01 9.609e+01 1.128e+02 1.741e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 06:39:45,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=612746.6666666666, ans=0.125 2023-11-19 06:39:46,133 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7750, loss[loss=0.08205, simple_loss=0.09622, pruned_loss=0.02184, audio_tagging_loss=0.01211, over 14507.00 frames. ], tot_loss[loss=0.08905, simple_loss=0.1067, pruned_loss=0.02498, audio_tagging_loss=0.01071, over 3057889.49 frames. ], batch size: 53, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:03,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-19 06:40:25,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.09 vs. limit=15.0 2023-11-19 06:40:37,220 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.04 vs. limit=15.0 2023-11-19 06:40:42,171 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7800, loss[loss=0.07026, simple_loss=0.09176, pruned_loss=0.0125, audio_tagging_loss=0.01187, over 14586.00 frames. ], tot_loss[loss=0.08946, simple_loss=0.1072, pruned_loss=0.02511, audio_tagging_loss=0.01075, over 3048244.66 frames. ], batch size: 55, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:48,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=613080.0, ans=0.09899494936611666 2023-11-19 06:41:03,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.59 vs. 
limit=22.5 2023-11-19 06:41:10,856 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:41:13,882 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.392e+01 9.222e+01 1.047e+02 1.457e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:41:16,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=613280.0, ans=0.125 2023-11-19 06:41:40,455 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7850, loss[loss=0.1179, simple_loss=0.1522, pruned_loss=0.03369, audio_tagging_loss=0.008104, over 15079.00 frames. ], tot_loss[loss=0.08938, simple_loss=0.1071, pruned_loss=0.02503, audio_tagging_loss=0.0108, over 3049707.33 frames. ], batch size: 55, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:41:53,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=613480.0, ans=0.125 2023-11-19 06:42:02,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=613546.6666666666, ans=0.0 2023-11-19 06:42:03,153 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2023-11-19 06:42:35,051 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7900, loss[loss=0.1056, simple_loss=0.1248, pruned_loss=0.03171, audio_tagging_loss=0.01152, over 15324.00 frames. ], tot_loss[loss=0.08965, simple_loss=0.1071, pruned_loss=0.02514, audio_tagging_loss=0.01097, over 3052423.28 frames. ], batch size: 54, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:42:38,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=613746.6666666666, ans=0.125 2023-11-19 06:42:38,576 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=613746.6666666666, ans=0.125 2023-11-19 06:42:41,634 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=613746.6666666666, ans=0.125 2023-11-19 06:42:43,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613746.6666666666, ans=0.1 2023-11-19 06:42:52,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=613813.3333333334, ans=0.0 2023-11-19 06:43:07,836 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.644e+01 9.372e+01 1.085e+02 1.414e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:43:29,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=614013.3333333334, ans=0.125 2023-11-19 06:43:31,071 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 7950, loss[loss=0.08683, simple_loss=0.1084, pruned_loss=0.02138, audio_tagging_loss=0.01126, over 14640.00 frames. ], tot_loss[loss=0.08887, simple_loss=0.1063, pruned_loss=0.02476, audio_tagging_loss=0.01095, over 3049446.40 frames. 
], batch size: 55, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:43:32,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=614080.0, ans=0.125 2023-11-19 06:43:36,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=614080.0, ans=0.125 2023-11-19 06:43:42,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=614146.6666666666, ans=0.125 2023-11-19 06:43:44,721 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:43:48,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=614146.6666666666, ans=0.09899494936611666 2023-11-19 06:43:53,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614213.3333333334, ans=0.0 2023-11-19 06:44:09,080 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=614280.0, ans=0.125 2023-11-19 06:44:16,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=614346.6666666666, ans=0.125 2023-11-19 06:44:26,293 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8000, loss[loss=0.09834, simple_loss=0.1121, pruned_loss=0.02939, audio_tagging_loss=0.01287, over 14780.00 frames. ], tot_loss[loss=0.08881, simple_loss=0.106, pruned_loss=0.02472, audio_tagging_loss=0.0111, over 3041804.86 frames. ], batch size: 54, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:44:33,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614413.3333333334, ans=0.1 2023-11-19 06:44:57,959 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.896e+01 9.629e+01 1.081e+02 1.400e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 06:45:04,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614613.3333333334, ans=0.1 2023-11-19 06:45:05,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=614613.3333333334, ans=0.125 2023-11-19 06:45:20,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=614746.6666666666, ans=0.125 2023-11-19 06:45:21,588 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8050, loss[loss=0.1094, simple_loss=0.1299, pruned_loss=0.03443, audio_tagging_loss=0.01001, over 15150.00 frames. ], tot_loss[loss=0.08923, simple_loss=0.1064, pruned_loss=0.0249, audio_tagging_loss=0.01111, over 3044257.92 frames. 
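The recurring WARNING lines above drop AudioSet placeholder cuts whose transcript is the dummy sentence: with 100 input frames, the frontend leaves only 23 frames after subsampling, fewer than the 24 BPE tokens, so no valid transducer alignment exists. A sketch of that filter, assuming the usual icefall subsampling arithmetic, which reproduces the logged 100 → 23 exactly:

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts that cannot be aligned by the transducer loss.

    Assumes the frontend's subsampling follows the common icefall formula
    T_out = ((T - 7) // 2 + 1) // 2, which maps the 100 frames in the
    warnings above to 23 -- one fewer than their 24 tokens.
    """
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> excluded, as in the WARNING lines
```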
], batch size: 56, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:45:21,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=614746.6666666666, ans=0.0 2023-11-19 06:45:36,473 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=614813.3333333334, ans=0.125 2023-11-19 06:45:37,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=614813.3333333334, ans=0.0 2023-11-19 06:45:38,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=614813.3333333334, ans=0.125 2023-11-19 06:45:43,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=614880.0, ans=0.1 2023-11-19 06:46:06,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=615013.3333333334, ans=0.0 2023-11-19 06:46:09,632 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=615013.3333333334, ans=0.0 2023-11-19 06:46:17,957 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8100, loss[loss=0.07978, simple_loss=0.09995, pruned_loss=0.02093, audio_tagging_loss=0.008869, over 15142.00 frames. ], tot_loss[loss=0.08955, simple_loss=0.1071, pruned_loss=0.02502, audio_tagging_loss=0.01099, over 3042669.59 frames. ], batch size: 57, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:46:46,329 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2023-11-19 06:46:49,784 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.425e+01 9.284e+01 1.022e+02 1.413e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 06:47:08,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=615346.6666666666, ans=0.0 2023-11-19 06:47:13,503 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8150, loss[loss=0.106, simple_loss=0.1318, pruned_loss=0.03316, audio_tagging_loss=0.006956, over 15479.00 frames. ], tot_loss[loss=0.09008, simple_loss=0.1079, pruned_loss=0.02536, audio_tagging_loss=0.01076, over 3041731.51 frames. ], batch size: 57, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:47:18,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=615413.3333333334, ans=0.0 2023-11-19 06:47:19,008 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-19 06:47:57,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=615680.0, ans=0.125 2023-11-19 06:48:08,936 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8200, loss[loss=0.08706, simple_loss=0.09976, pruned_loss=0.02223, audio_tagging_loss=0.01494, over 15053.00 frames. ], tot_loss[loss=0.08941, simple_loss=0.1073, pruned_loss=0.02501, audio_tagging_loss=0.01076, over 3050008.38 frames. ], batch size: 58, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:48:08,984 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:48:10,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=615746.6666666666, ans=0.0 2023-11-19 06:48:20,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=615813.3333333334, ans=0.0 2023-11-19 06:48:25,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=615813.3333333334, ans=0.125 2023-11-19 06:48:36,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=615880.0, ans=0.1 2023-11-19 06:48:41,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.380e+01 9.339e+01 1.044e+02 1.327e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 06:48:45,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=615946.6666666666, ans=0.125 2023-11-19 06:49:05,271 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8250, loss[loss=0.0789, simple_loss=0.08413, pruned_loss=0.02026, audio_tagging_loss=0.01657, over 13639.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1073, pruned_loss=0.02494, audio_tagging_loss=0.01068, over 3047285.21 frames. ], batch size: 52, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:49:16,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=616146.6666666666, ans=0.05 2023-11-19 06:49:53,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=616346.6666666666, ans=0.2 2023-11-19 06:50:00,758 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8300, loss[loss=0.1009, simple_loss=0.1284, pruned_loss=0.0278, audio_tagging_loss=0.008875, over 16846.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1073, pruned_loss=0.02478, audio_tagging_loss=0.01077, over 3053833.67 frames. ], batch size: 65, lr: 8.56e-03, grad_scale: 32.0 2023-11-19 06:50:19,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=616480.0, ans=0.125 2023-11-19 06:50:22,088 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.62 vs. limit=10.0 2023-11-19 06:50:34,216 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.747e+01 9.487e+01 1.050e+02 1.506e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-19 06:50:36,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=616613.3333333334, ans=0.0 2023-11-19 06:50:46,587 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.414e-01 2023-11-19 06:50:56,412 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8350, loss[loss=0.107, simple_loss=0.1328, pruned_loss=0.03027, audio_tagging_loss=0.01031, over 15102.00 frames. 
], tot_loss[loss=0.08964, simple_loss=0.1078, pruned_loss=0.02502, audio_tagging_loss=0.01075, over 3054073.69 frames. ], batch size: 56, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:50:56,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=616746.6666666666, ans=0.125 2023-11-19 06:51:28,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=616946.6666666666, ans=0.09899494936611666 2023-11-19 06:51:33,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=616946.6666666666, ans=0.125 2023-11-19 06:51:39,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.09 vs. limit=12.0 2023-11-19 06:51:47,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=617013.3333333334, ans=0.1 2023-11-19 06:51:48,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=617013.3333333334, ans=0.0 2023-11-19 06:51:51,344 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8400, loss[loss=0.07968, simple_loss=0.08537, pruned_loss=0.02581, audio_tagging_loss=0.01118, over 15887.00 frames. ], tot_loss[loss=0.08955, simple_loss=0.1075, pruned_loss=0.02506, audio_tagging_loss=0.01073, over 3054888.67 frames. ], batch size: 61, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:52:05,323 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=617146.6666666666, ans=0.04949747468305833 2023-11-19 06:52:25,115 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.429e+01 1.025e+02 1.342e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:52:40,818 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.95 vs. limit=15.0 2023-11-19 06:52:47,679 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8450, loss[loss=0.05867, simple_loss=0.06005, pruned_loss=0.01552, audio_tagging_loss=0.01313, over 14605.00 frames. ], tot_loss[loss=0.09008, simple_loss=0.108, pruned_loss=0.02529, audio_tagging_loss=0.01078, over 3049691.49 frames. ], batch size: 57, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:52:50,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=617413.3333333334, ans=0.0 2023-11-19 06:53:22,305 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.26 vs. limit=15.0 2023-11-19 06:53:28,776 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.54 vs. limit=5.0 2023-11-19 06:53:29,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617613.3333333334, ans=0.125 2023-11-19 06:53:37,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617680.0, ans=0.1 2023-11-19 06:53:43,102 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8500, loss[loss=0.09021, simple_loss=0.1092, pruned_loss=0.02593, audio_tagging_loss=0.009679, over 16753.00 frames. 
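`grad_scale` in the loss lines is the running loss-scale of mixed-precision training: it moves between powers of two across this stretch of the log (16 → 32 → 64, then back down to 32 and 16) because the scaler halves when a step overflows in fp16 and doubles again after a run of clean steps. A minimal sketch of the standard PyTorch mechanism, assuming the recipe uses `torch.cuda.amp.GradScaler` (consistent with fp16 training, though the exact integration here is not shown in the log):

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0, growth_factor=2.0, backoff_factor=0.5)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()  # scale up so fp16 grads stay finite
    scaler.step(optimizer)         # unscales; skips the step on inf/nan
    scaler.update()                # halves on overflow, doubles after a
                                   # clean stretch -- this is the value
                                   # printed as `grad_scale` in the log
    return loss.detach()
```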
], tot_loss[loss=0.09096, simple_loss=0.1093, pruned_loss=0.02556, audio_tagging_loss=0.01073, over 3055043.95 frames. ], batch size: 62, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:46,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=617746.6666666666, ans=0.125 2023-11-19 06:54:00,205 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:54:04,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=617880.0, ans=0.025 2023-11-19 06:54:07,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=617880.0, ans=0.125 2023-11-19 06:54:16,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.808e+01 8.976e+01 1.039e+02 1.170e+02 1.800e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-19 06:54:21,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-11-19 06:54:24,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=617946.6666666666, ans=0.125 2023-11-19 06:54:34,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=618013.3333333334, ans=0.09899494936611666 2023-11-19 06:54:38,528 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8550, loss[loss=0.1046, simple_loss=0.1248, pruned_loss=0.03417, audio_tagging_loss=0.008065, over 15505.00 frames. ], tot_loss[loss=0.09056, simple_loss=0.1087, pruned_loss=0.02543, audio_tagging_loss=0.01075, over 3055506.79 frames. ], batch size: 55, lr: 8.55e-03, grad_scale: 16.0 2023-11-19 06:54:38,750 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=618080.0, ans=0.125 2023-11-19 06:55:00,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618213.3333333334, ans=0.125 2023-11-19 06:55:34,197 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8600, loss[loss=0.05541, simple_loss=0.06052, pruned_loss=0.01334, audio_tagging_loss=0.01181, over 14408.00 frames. ], tot_loss[loss=0.09095, simple_loss=0.1093, pruned_loss=0.02554, audio_tagging_loss=0.01077, over 3051843.20 frames. ], batch size: 56, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:55:36,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-19 06:55:36,434 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=618413.3333333334, ans=0.015 2023-11-19 06:55:42,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=618413.3333333334, ans=0.125 2023-11-19 06:56:09,130 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.546e+01 9.397e+01 1.054e+02 1.390e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 06:56:30,051 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8650, loss[loss=0.08103, simple_loss=0.09954, pruned_loss=0.01894, audio_tagging_loss=0.01231, over 15191.00 frames. 
], tot_loss[loss=0.09082, simple_loss=0.1091, pruned_loss=0.02544, audio_tagging_loss=0.01084, over 3049829.80 frames. ], batch size: 57, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:56:31,292 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=618746.6666666666, ans=0.04949747468305833 2023-11-19 06:56:34,501 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=618746.6666666666, ans=0.0 2023-11-19 06:56:47,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=618813.3333333334, ans=0.5 2023-11-19 06:56:48,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=618813.3333333334, ans=0.125 2023-11-19 06:57:19,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=619013.3333333334, ans=0.125 2023-11-19 06:57:24,839 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8700, loss[loss=0.1136, simple_loss=0.1425, pruned_loss=0.03293, audio_tagging_loss=0.009454, over 15896.00 frames. ], tot_loss[loss=0.09038, simple_loss=0.1084, pruned_loss=0.02522, audio_tagging_loss=0.01096, over 3047202.82 frames. ], batch size: 59, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:57:36,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=619146.6666666666, ans=0.125 2023-11-19 06:57:43,124 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. limit=6.0 2023-11-19 06:57:44,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=619146.6666666666, ans=0.0 2023-11-19 06:57:45,887 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=619146.6666666666, ans=0.0 2023-11-19 06:58:00,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.176e+01 9.937e+01 1.111e+02 1.947e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-19 06:58:05,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=619280.0, ans=0.1 2023-11-19 06:58:09,206 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=22.5 2023-11-19 06:58:14,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=619346.6666666666, ans=0.125 2023-11-19 06:58:21,834 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8750, loss[loss=0.1032, simple_loss=0.134, pruned_loss=0.0265, audio_tagging_loss=0.009682, over 15924.00 frames. ], tot_loss[loss=0.0901, simple_loss=0.108, pruned_loss=0.02516, audio_tagging_loss=0.01096, over 3048266.52 frames. 
], batch size: 56, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:58:25,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=619413.3333333334, ans=0.2 2023-11-19 06:58:33,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=619480.0, ans=0.2 2023-11-19 06:58:35,205 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=619480.0, ans=0.0 2023-11-19 06:58:38,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=619480.0, ans=0.0 2023-11-19 06:58:38,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=619480.0, ans=0.0 2023-11-19 06:58:47,906 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2023-11-19 06:58:54,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=619613.3333333334, ans=0.125 2023-11-19 06:58:59,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=619613.3333333334, ans=0.125 2023-11-19 06:59:08,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=619680.0, ans=0.125 2023-11-19 06:59:16,511 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8800, loss[loss=0.1199, simple_loss=0.145, pruned_loss=0.03842, audio_tagging_loss=0.00894, over 14405.00 frames. ], tot_loss[loss=0.09063, simple_loss=0.1088, pruned_loss=0.02524, audio_tagging_loss=0.01101, over 3048831.84 frames. ], batch size: 53, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 06:59:20,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=619746.6666666666, ans=0.125 2023-11-19 06:59:24,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=619746.6666666666, ans=0.0 2023-11-19 06:59:44,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=619880.0, ans=0.125 2023-11-19 06:59:50,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.376e+01 8.906e+01 9.929e+01 1.528e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 07:00:05,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=620013.3333333334, ans=0.2 2023-11-19 07:00:10,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=620080.0, ans=0.0 2023-11-19 07:00:11,476 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8850, loss[loss=0.1088, simple_loss=0.1237, pruned_loss=0.03491, audio_tagging_loss=0.01208, over 14853.00 frames. ], tot_loss[loss=0.0909, simple_loss=0.1093, pruned_loss=0.02533, audio_tagging_loss=0.01091, over 3050396.35 frames. 
], batch size: 57, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:00:16,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=620080.0, ans=0.125 2023-11-19 07:00:22,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=620146.6666666666, ans=0.07 2023-11-19 07:00:23,184 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:00:31,107 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.08 vs. limit=6.0 2023-11-19 07:00:38,700 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.604e-02 2023-11-19 07:00:45,059 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=620280.0, ans=0.125 2023-11-19 07:00:54,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=620346.6666666666, ans=0.0 2023-11-19 07:00:54,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.92 vs. limit=12.0 2023-11-19 07:00:55,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=620346.6666666666, ans=0.0 2023-11-19 07:01:07,559 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8900, loss[loss=0.1108, simple_loss=0.1246, pruned_loss=0.0349, audio_tagging_loss=0.01363, over 14600.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1093, pruned_loss=0.02544, audio_tagging_loss=0.01076, over 3044388.07 frames. 
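[Editor's note: the WARNING from train_asr.py:1319 above is a length filter. After the 4x subsampling set in the config header, the 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, and a transducer alignment needs at least one encoder frame per output token. A hedged sketch of such a filter follows; the subsampling arithmetic and names are assumptions, not the actual icefall code.]

# Sketch of the kind of filter behind the "Exclude cut" warnings: after
# subsampling, the cut has fewer encoder frames than BPE tokens, so no
# valid transducer alignment exists. Names here are illustrative.

import logging

SUBSAMPLING_FACTOR = 4  # from the config header

def keep_cut(num_frames, tokens):
    # (T - 7) // 4 reproduces the logged 100 -> 23; the exact formula in
    # the encoder's subsampling frontend is an assumption here.
    frames_after = (num_frames - 7) // SUBSAMPLING_FACTOR
    if frames_after < len(tokens):
        logging.warning("Exclude cut: %d frames -> %d after subsampling, "
                        "but %d tokens", num_frames, frames_after, len(tokens))
        return False
    return True

print(keep_cut(100, ["tok"] * 24))  # False: 23 frames < 24 tokens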
], batch size: 54, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:01:28,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=620546.6666666666, ans=0.07 2023-11-19 07:01:30,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=620546.6666666666, ans=0.2 2023-11-19 07:01:33,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=620546.6666666666, ans=0.1 2023-11-19 07:01:42,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.332e+01 9.247e+01 1.033e+02 1.504e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 07:01:44,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=620613.3333333334, ans=0.0 2023-11-19 07:01:53,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=620680.0, ans=0.09899494936611666 2023-11-19 07:01:55,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=620680.0, ans=0.125 2023-11-19 07:02:02,941 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 8950, loss[loss=0.1168, simple_loss=0.1337, pruned_loss=0.03854, audio_tagging_loss=0.01142, over 15855.00 frames. ], tot_loss[loss=0.09059, simple_loss=0.1091, pruned_loss=0.02537, audio_tagging_loss=0.01069, over 3049541.21 frames. ], batch size: 59, lr: 8.53e-03, grad_scale: 16.0 2023-11-19 07:02:14,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620813.3333333334, ans=0.125 2023-11-19 07:02:51,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=621013.3333333334, ans=0.125 2023-11-19 07:02:57,849 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9000, loss[loss=0.05941, simple_loss=0.0609, pruned_loss=0.01759, audio_tagging_loss=0.01137, over 14565.00 frames. ], tot_loss[loss=0.09071, simple_loss=0.1094, pruned_loss=0.02554, audio_tagging_loss=0.01048, over 3057769.66 frames. ], batch size: 59, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:02:57,850 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 07:03:12,015 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4872, 4.0139, 4.6730, 4.2541], device='cuda:2') 2023-11-19 07:03:26,161 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.8534, 2.7930, 3.5172, 2.9712, 3.7814, 3.7786, 3.3960, 3.0334], device='cuda:2') 2023-11-19 07:03:30,614 INFO [train_asr.py:1147] (2/4) Epoch 8, validation: loss=0.06719, simple_loss=0.05665, pruned_loss=0.006997, audio_tagging_loss=0.03186, over 4681554.00 frames. 2023-11-19 07:03:30,614 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 07:03:31,264 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. 
limit=15.0 2023-11-19 07:03:33,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=621080.0, ans=0.125 2023-11-19 07:03:39,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=621080.0, ans=0.0 2023-11-19 07:03:45,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=621146.6666666666, ans=0.125 2023-11-19 07:03:55,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=15.97 vs. limit=15.0 2023-11-19 07:04:03,158 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=621280.0, ans=0.125 2023-11-19 07:04:03,274 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=621280.0, ans=0.0 2023-11-19 07:04:05,132 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.568e+01 9.180e+01 1.028e+02 1.650e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 07:04:26,000 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9050, loss[loss=0.1054, simple_loss=0.1322, pruned_loss=0.03034, audio_tagging_loss=0.00898, over 15353.00 frames. ], tot_loss[loss=0.09029, simple_loss=0.1091, pruned_loss=0.02534, audio_tagging_loss=0.01039, over 3061302.41 frames. ], batch size: 58, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:04:43,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.39 vs. limit=15.0 2023-11-19 07:04:59,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=621613.3333333334, ans=0.05 2023-11-19 07:05:02,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=621613.3333333334, ans=0.125 2023-11-19 07:05:12,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=621680.0, ans=0.125 2023-11-19 07:05:18,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621680.0, ans=0.1 2023-11-19 07:05:19,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=621746.6666666666, ans=0.125 2023-11-19 07:05:20,480 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9100, loss[loss=0.09418, simple_loss=0.103, pruned_loss=0.0281, audio_tagging_loss=0.01458, over 14588.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1101, pruned_loss=0.02579, audio_tagging_loss=0.0103, over 3055254.79 frames. ], batch size: 54, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:05:56,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.531e+01 9.413e+01 1.050e+02 2.515e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-19 07:06:14,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=622013.3333333334, ans=0.0 2023-11-19 07:06:16,315 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9150, loss[loss=0.09108, simple_loss=0.1194, pruned_loss=0.02244, audio_tagging_loss=0.008929, over 14796.00 frames. 
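[Editor's note: the attn_weights_entropy tensors dumped from zipformer.py:1873 during the validation pass above hold one value per attention head (4 entries for encoder stack 2, 8 for stack 3, matching the per-stack head counts in the config header): the entropy of each head's attention distribution, approaching log(src_len) when attention is diffuse. A sketch of that diagnostic follows, with the reduction axes assumed.]

# Sketch of the per-head attention-entropy diagnostic printed during
# validation (attn_weights_entropy). Exact reduction axes in
# zipformer.py are assumptions.

import torch

def attn_weights_entropy(attn):
    """attn: (num_heads, tgt_len, src_len), rows summing to 1."""
    ent = -(attn * (attn + 1e-20).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)                           # mean entropy per head

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
# One mean entropy per head; uniform rows would give log(50) ~= 3.9 nats.
print(attn_weights_entropy(attn))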
], tot_loss[loss=0.09107, simple_loss=0.1099, pruned_loss=0.02575, audio_tagging_loss=0.01038, over 3052389.62 frames. ], batch size: 54, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:06:39,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=622213.3333333334, ans=0.0 2023-11-19 07:06:44,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=622213.3333333334, ans=0.125 2023-11-19 07:06:52,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=622280.0, ans=0.125 2023-11-19 07:06:58,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=622280.0, ans=0.0 2023-11-19 07:07:12,194 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9200, loss[loss=0.08321, simple_loss=0.09476, pruned_loss=0.02455, audio_tagging_loss=0.01128, over 14925.00 frames. ], tot_loss[loss=0.09086, simple_loss=0.1094, pruned_loss=0.0257, audio_tagging_loss=0.01044, over 3059441.89 frames. ], batch size: 57, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:07:17,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=622413.3333333334, ans=0.125 2023-11-19 07:07:29,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.08 vs. limit=15.0 2023-11-19 07:07:48,139 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.490e+01 9.151e+01 1.009e+02 3.492e+02, threshold=1.830e+02, percent-clipped=1.0 2023-11-19 07:07:51,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=622613.3333333334, ans=0.2 2023-11-19 07:07:53,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622613.3333333334, ans=0.1 2023-11-19 07:07:57,897 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-19 07:08:03,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=622680.0, ans=0.125 2023-11-19 07:08:06,981 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9250, loss[loss=0.1004, simple_loss=0.1206, pruned_loss=0.03218, audio_tagging_loss=0.007886, over 14291.00 frames. ], tot_loss[loss=0.09, simple_loss=0.1081, pruned_loss=0.02549, audio_tagging_loss=0.01045, over 3058299.54 frames. 
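[Editor's note: the optim.py:476 records list five grad-norm quantiles (min, 25%, median, 75%, max) over a window of recent batches, and in every record in this excerpt the threshold equals Clipping_scale times the median, e.g. 2.0 * 9.151e+01 = 1.830e+02 in the record above; percent-clipped appears to track how often (in percent) the norm exceeded the threshold. A sketch of median-based clipping follows; the window size and update schedule are assumptions, not icefall's exact implementation.]

# Sketch of median-based gradient clipping consistent with the optim.py
# records: threshold = clipping_scale * median of recent gradient norms.

import torch
from collections import deque

class MedianClipperSketch:
    def __init__(self, clipping_scale=2.0, window=100):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # window size is an assumption

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(norm.item())
        hist = torch.tensor(list(self.norms))
        q = torch.quantile(hist, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * q[2]  # scale * median, as logged
        if norm > threshold:
            for g in grads:
                g.mul_((threshold / norm).item())
        return q, threshold  # the quartiles and threshold seen in the log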
], batch size: 57, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:08:24,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=622813.3333333334, ans=0.125 2023-11-19 07:08:30,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=622880.0, ans=0.0 2023-11-19 07:08:34,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=622880.0, ans=0.1 2023-11-19 07:08:49,591 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:09:03,093 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9300, loss[loss=0.1187, simple_loss=0.1501, pruned_loss=0.03409, audio_tagging_loss=0.009534, over 16643.00 frames. ], tot_loss[loss=0.09026, simple_loss=0.1089, pruned_loss=0.02538, audio_tagging_loss=0.01044, over 3061271.44 frames. ], batch size: 58, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:09:03,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=623080.0, ans=0.125 2023-11-19 07:09:19,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623146.6666666666, ans=0.125 2023-11-19 07:09:24,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=623213.3333333334, ans=0.125 2023-11-19 07:09:27,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=623213.3333333334, ans=0.0 2023-11-19 07:09:28,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=623213.3333333334, ans=0.125 2023-11-19 07:09:39,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.509e+01 9.283e+01 1.013e+02 1.341e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 07:09:52,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.61 vs. limit=15.0 2023-11-19 07:09:58,417 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9350, loss[loss=0.09206, simple_loss=0.1107, pruned_loss=0.02675, audio_tagging_loss=0.009978, over 14508.00 frames. ], tot_loss[loss=0.09044, simple_loss=0.109, pruned_loss=0.02541, audio_tagging_loss=0.01054, over 3058265.28 frames. 
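[Editor's note: grad_scale in the batch records flips between 16.0 and 32.0 (32.0 at batch 8800, back to 16.0 by batch 8950). With use_fp16=True in the config, this is presumably the current scale of a GradScaler-style fp16 loss scaler, which doubles after a stretch of overflow-free steps and halves on inf/nan gradients. A minimal sketch with torch.cuda.amp.GradScaler follows; the model and optimizer are placeholders, and a CUDA device is assumed, as in this run.]

# Sketch of the fp16 loss scaling behind the logged grad_scale values.
# init_scale/growth_interval are illustrative, not the run's settings.

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)
model = torch.nn.Linear(80, 500).cuda()  # placeholder for the real model
optimizer = torch.optim.SGD(model.parameters(), lr=8.5e-3)

features = torch.randn(8, 80, device="cuda")
with torch.cuda.amp.autocast():
    loss = model(features).pow(2).mean()

scaler.scale(loss).backward()   # scaled loss -> scaled gradients
scaler.step(optimizer)          # unscales, skips step on overflow
scaler.update()                 # grows or halves the scale
print(scaler.get_scale())       # the analogue of the logged grad_scale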
], batch size: 57, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:10:17,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=623480.0, ans=0.125 2023-11-19 07:10:18,312 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=623480.0, ans=0.0 2023-11-19 07:10:29,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=623546.6666666666, ans=0.0 2023-11-19 07:10:33,140 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=623613.3333333334, ans=0.125 2023-11-19 07:10:36,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=623613.3333333334, ans=0.0 2023-11-19 07:10:49,019 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=623680.0, ans=0.2 2023-11-19 07:10:54,096 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9400, loss[loss=0.08334, simple_loss=0.09074, pruned_loss=0.02421, audio_tagging_loss=0.01377, over 14374.00 frames. ], tot_loss[loss=0.09052, simple_loss=0.1089, pruned_loss=0.02553, audio_tagging_loss=0.01056, over 3057160.99 frames. ], batch size: 56, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:10:57,775 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-19 07:11:25,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=623880.0, ans=0.125 2023-11-19 07:11:29,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.56 vs. limit=15.0 2023-11-19 07:11:31,161 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.638e+01 9.433e+01 1.071e+02 1.581e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 07:11:46,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=624013.3333333334, ans=0.0 2023-11-19 07:11:48,789 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:11:49,850 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9450, loss[loss=0.1051, simple_loss=0.1327, pruned_loss=0.02716, audio_tagging_loss=0.01162, over 14927.00 frames. ], tot_loss[loss=0.08983, simple_loss=0.1077, pruned_loss=0.02522, audio_tagging_loss=0.01076, over 3053637.13 frames. ], batch size: 56, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:12:04,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=624146.6666666666, ans=0.125 2023-11-19 07:12:07,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=14.15 vs. 
limit=15.0 2023-11-19 07:12:24,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=624280.0, ans=0.125 2023-11-19 07:12:36,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=624346.6666666666, ans=0.0 2023-11-19 07:12:41,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=624346.6666666666, ans=0.125 2023-11-19 07:12:45,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=624413.3333333334, ans=0.0 2023-11-19 07:12:45,962 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9500, loss[loss=0.08865, simple_loss=0.1087, pruned_loss=0.02515, audio_tagging_loss=0.009166, over 17077.00 frames. ], tot_loss[loss=0.08928, simple_loss=0.107, pruned_loss=0.02493, audio_tagging_loss=0.01085, over 3049004.98 frames. ], batch size: 65, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:12:58,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=624480.0, ans=0.125 2023-11-19 07:13:01,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.89 vs. limit=22.5 2023-11-19 07:13:22,386 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.457e+01 9.140e+01 9.820e+01 1.196e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:13:23,725 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=624613.3333333334, ans=0.0 2023-11-19 07:13:29,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=624680.0, ans=0.125 2023-11-19 07:13:37,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.76 vs. limit=10.0 2023-11-19 07:13:41,566 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9550, loss[loss=0.0894, simple_loss=0.1024, pruned_loss=0.02758, audio_tagging_loss=0.01059, over 14865.00 frames. ], tot_loss[loss=0.09055, simple_loss=0.1087, pruned_loss=0.0254, audio_tagging_loss=0.01079, over 3053989.54 frames. ], batch size: 57, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:13:50,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=624746.6666666666, ans=0.125 2023-11-19 07:13:57,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=624813.3333333334, ans=0.0 2023-11-19 07:14:05,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=624880.0, ans=0.05 2023-11-19 07:14:07,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.83 vs. 
limit=22.5 2023-11-19 07:14:18,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624946.6666666666, ans=0.1 2023-11-19 07:14:36,197 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=625080.0, ans=0.0 2023-11-19 07:14:37,083 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9600, loss[loss=0.09425, simple_loss=0.1194, pruned_loss=0.02593, audio_tagging_loss=0.008605, over 14865.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1088, pruned_loss=0.02526, audio_tagging_loss=0.01075, over 3051997.42 frames. ], batch size: 55, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:14:47,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=625146.6666666666, ans=0.05 2023-11-19 07:15:00,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=625213.3333333334, ans=0.0 2023-11-19 07:15:13,358 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.461e+01 9.298e+01 1.020e+02 1.547e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 07:15:33,197 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9650, loss[loss=0.09426, simple_loss=0.1116, pruned_loss=0.02709, audio_tagging_loss=0.01138, over 14062.00 frames. ], tot_loss[loss=0.09088, simple_loss=0.1096, pruned_loss=0.02535, audio_tagging_loss=0.01073, over 3041471.99 frames. ], batch size: 54, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:15:37,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=625413.3333333334, ans=0.125 2023-11-19 07:15:49,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=625480.0, ans=0.0 2023-11-19 07:15:53,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-19 07:15:55,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=625546.6666666666, ans=0.125 2023-11-19 07:16:01,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=625546.6666666666, ans=10.0 2023-11-19 07:16:17,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=625680.0, ans=0.2 2023-11-19 07:16:18,077 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=15.0 2023-11-19 07:16:28,189 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9700, loss[loss=0.07843, simple_loss=0.1011, pruned_loss=0.02048, audio_tagging_loss=0.007389, over 14898.00 frames. ], tot_loss[loss=0.09013, simple_loss=0.1086, pruned_loss=0.02524, audio_tagging_loss=0.01057, over 3044373.91 frames. ], batch size: 55, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:16:30,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=625746.6666666666, ans=0.125 2023-11-19 07:16:53,830 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.99 vs. 
limit=15.0 2023-11-19 07:17:05,229 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.422e+01 9.037e+01 9.716e+01 1.315e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 07:17:10,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=625946.6666666666, ans=0.125 2023-11-19 07:17:22,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626080.0, ans=0.1 2023-11-19 07:17:24,174 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9750, loss[loss=0.093, simple_loss=0.1131, pruned_loss=0.02507, audio_tagging_loss=0.01139, over 15391.00 frames. ], tot_loss[loss=0.08924, simple_loss=0.1076, pruned_loss=0.02488, audio_tagging_loss=0.01055, over 3044221.91 frames. ], batch size: 58, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:17:25,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=626080.0, ans=0.125 2023-11-19 07:17:33,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=626080.0, ans=0.07 2023-11-19 07:17:47,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=626213.3333333334, ans=0.125 2023-11-19 07:18:03,238 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.00 vs. limit=22.5 2023-11-19 07:18:05,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=626280.0, ans=0.125 2023-11-19 07:18:06,618 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.098e-01 2023-11-19 07:18:19,726 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9800, loss[loss=0.09456, simple_loss=0.1206, pruned_loss=0.02437, audio_tagging_loss=0.009882, over 16391.00 frames. ], tot_loss[loss=0.08988, simple_loss=0.1085, pruned_loss=0.0251, audio_tagging_loss=0.01052, over 3050447.85 frames. ], batch size: 60, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:18:24,896 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.58 vs. limit=15.0 2023-11-19 07:18:45,175 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=626546.6666666666, ans=0.125 2023-11-19 07:18:47,196 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=626546.6666666666, ans=0.0 2023-11-19 07:18:56,499 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.157e+01 8.941e+01 9.770e+01 1.328e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 07:18:56,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=626613.3333333334, ans=0.125 2023-11-19 07:19:10,038 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
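[Editor's note: the scaling.py:1022 "Whitening" records compare a metric against a limit (e.g. metric=5.99 vs. limit=15.0 just above) and fire when the feature covariance is far from white. The sketch below is reconstructed from memory of icefall's scaling.Whiten: a covariance ratio that equals 1.0 for white (isotropic) features and grows as variance concentrates in a few directions. The grouping and epsilon details are assumptions.]

# Sketch of the whitening metric behind the scaling.py:1022 records
# (metric=... vs. limit=...). Treat as a reconstruction, not a reference.

import torch

def whitening_metric(x, num_groups=1):
    # x: (num_frames, num_channels)
    num_frames, num_channels = x.shape
    cpg = num_channels // num_groups  # channels per group
    xg = x.reshape(num_frames, num_groups, cpg).permute(1, 0, 2)
    cov = xg.transpose(1, 2) @ xg / num_frames       # (num_groups, cpg, cpg)
    mean_diag = cov.diagonal(dim1=1, dim2=2).mean()  # mean eigenvalue
    mean_sq = (cov ** 2).sum() / (num_groups * cpg)  # mean squared eigenvalue
    return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

white = torch.randn(10_000, 256)
print(whitening_metric(white))                           # ~1.0, white
print(whitening_metric(white @ torch.randn(256, 256)))   # > 1.0, correlated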
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:19:15,291 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9850, loss[loss=0.0823, simple_loss=0.0903, pruned_loss=0.02284, audio_tagging_loss=0.01431, over 14707.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1095, pruned_loss=0.02538, audio_tagging_loss=0.01045, over 3054510.77 frames. ], batch size: 56, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:19:56,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=626946.6666666666, ans=0.125 2023-11-19 07:20:10,741 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9900, loss[loss=0.1207, simple_loss=0.1594, pruned_loss=0.03575, audio_tagging_loss=0.005266, over 14558.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1094, pruned_loss=0.02548, audio_tagging_loss=0.01046, over 3056379.49 frames. ], batch size: 53, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:20:23,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=627146.6666666666, ans=0.125 2023-11-19 07:20:25,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=627146.6666666666, ans=0.09899494936611666 2023-11-19 07:20:43,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0 2023-11-19 07:20:47,071 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.674e+01 9.203e+01 1.082e+02 1.582e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 07:21:06,725 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 9950, loss[loss=0.0698, simple_loss=0.0863, pruned_loss=0.01736, audio_tagging_loss=0.009288, over 14625.00 frames. ], tot_loss[loss=0.08996, simple_loss=0.1085, pruned_loss=0.02521, audio_tagging_loss=0.01048, over 3052715.54 frames. ], batch size: 54, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:21:21,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=15.0 2023-11-19 07:21:25,474 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:21:40,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=627613.3333333334, ans=0.0 2023-11-19 07:21:44,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=627613.3333333334, ans=0.125 2023-11-19 07:21:46,412 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0 2023-11-19 07:21:49,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=627613.3333333334, ans=0.125 2023-11-19 07:22:02,101 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10000, loss[loss=0.105, simple_loss=0.1189, pruned_loss=0.03317, audio_tagging_loss=0.01232, over 15236.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1078, pruned_loss=0.02491, audio_tagging_loss=0.01044, over 3050418.82 frames. 
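[Editor's note: the lr printed in each batch record decays slowly (8.54e-03 down to 8.48e-03 across this stretch). With base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the config header, the values are consistent with icefall's Eden schedule, where the base rate is damped by batch- and epoch-dependent factors. The formula below is reproduced from memory of optim.Eden; treat it as an assumption, and the step count in the example is a rough estimate.]

# Sketch of the Eden learning-rate schedule that the per-batch `lr`
# values are consistent with.

def eden_lr(base_lr, step, epoch, lr_batches=7500.0, lr_epochs=3.5):
    batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Roughly 8 epochs at ~10k batches each into training:
print(eden_lr(0.045, step=83_000, epoch=8))  # ~8.5e-03, near the logged lr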
], batch size: 57, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:22:38,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=627946.6666666666, ans=0.125 2023-11-19 07:22:38,986 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.402e+01 8.658e+01 9.575e+01 1.064e+02 1.480e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-19 07:22:45,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=15.0 2023-11-19 07:22:54,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=628013.3333333334, ans=0.125 2023-11-19 07:22:57,046 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10050, loss[loss=0.09588, simple_loss=0.1184, pruned_loss=0.02853, audio_tagging_loss=0.008153, over 16310.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1067, pruned_loss=0.02458, audio_tagging_loss=0.01055, over 3052426.86 frames. ], batch size: 59, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:22:57,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=628080.0, ans=0.125 2023-11-19 07:23:05,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=628080.0, ans=0.1 2023-11-19 07:23:05,391 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-19 07:23:11,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=628146.6666666666, ans=0.0 2023-11-19 07:23:12,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628146.6666666666, ans=0.1 2023-11-19 07:23:28,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=628213.3333333334, ans=0.0 2023-11-19 07:23:35,076 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628280.0, ans=0.125 2023-11-19 07:23:40,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0 2023-11-19 07:23:44,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=628346.6666666666, ans=0.0 2023-11-19 07:23:53,590 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10100, loss[loss=0.07899, simple_loss=0.09882, pruned_loss=0.02017, audio_tagging_loss=0.00941, over 15321.00 frames. ], tot_loss[loss=0.08881, simple_loss=0.107, pruned_loss=0.02476, audio_tagging_loss=0.01056, over 3048999.66 frames. ], batch size: 58, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:23:54,069 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.80 vs. 
limit=15.0 2023-11-19 07:24:00,779 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:24:01,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=628413.3333333334, ans=0.125 2023-11-19 07:24:05,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=628480.0, ans=0.125 2023-11-19 07:24:21,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=628546.6666666666, ans=0.07 2023-11-19 07:24:28,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=628613.3333333334, ans=0.125 2023-11-19 07:24:29,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.846e+01 9.478e+01 1.084e+02 1.850e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 07:24:37,885 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:24:43,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=628680.0, ans=0.125 2023-11-19 07:24:48,272 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.41 vs. limit=10.0 2023-11-19 07:24:48,922 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10150, loss[loss=0.08587, simple_loss=0.1031, pruned_loss=0.02432, audio_tagging_loss=0.01002, over 14801.00 frames. ], tot_loss[loss=0.08899, simple_loss=0.1067, pruned_loss=0.02487, audio_tagging_loss=0.01076, over 3046882.31 frames. ], batch size: 57, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:24:50,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=628746.6666666666, ans=0.125 2023-11-19 07:25:13,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=628880.0, ans=0.0 2023-11-19 07:25:15,291 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:25:29,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=15.0 2023-11-19 07:25:40,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=629013.3333333334, ans=0.125 2023-11-19 07:25:42,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=629013.3333333334, ans=0.0 2023-11-19 07:25:43,827 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10200, loss[loss=0.1147, simple_loss=0.1386, pruned_loss=0.03365, audio_tagging_loss=0.01177, over 15852.00 frames. ], tot_loss[loss=0.09005, simple_loss=0.108, pruned_loss=0.02525, audio_tagging_loss=0.0108, over 3049015.88 frames. ], batch size: 59, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:25:57,133 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.22 vs. limit=22.5 2023-11-19 07:26:00,145 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.82 vs. limit=22.5 2023-11-19 07:26:05,551 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:26:20,960 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.630e+01 9.623e+01 1.074e+02 1.731e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-19 07:26:23,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.60 vs. limit=22.5 2023-11-19 07:26:40,088 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10250, loss[loss=0.1166, simple_loss=0.1351, pruned_loss=0.03742, audio_tagging_loss=0.01159, over 15733.00 frames. ], tot_loss[loss=0.09048, simple_loss=0.1085, pruned_loss=0.02536, audio_tagging_loss=0.01086, over 3049308.37 frames. ], batch size: 59, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:26:46,614 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2023-11-19 07:27:36,291 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10300, loss[loss=0.1379, simple_loss=0.1714, pruned_loss=0.04181, audio_tagging_loss=0.01034, over 14514.00 frames. ], tot_loss[loss=0.09088, simple_loss=0.109, pruned_loss=0.02552, audio_tagging_loss=0.01085, over 3044972.69 frames. ], batch size: 56, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:27:57,090 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=629880.0, ans=0.125 2023-11-19 07:28:12,260 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.771e+01 9.491e+01 1.012e+02 1.579e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 07:28:29,970 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=630080.0, ans=0.2 2023-11-19 07:28:30,765 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10350, loss[loss=0.1135, simple_loss=0.1429, pruned_loss=0.03448, audio_tagging_loss=0.007516, over 16090.00 frames. 
], tot_loss[loss=0.09074, simple_loss=0.1088, pruned_loss=0.02539, audio_tagging_loss=0.01094, over 3045640.44 frames. ], batch size: 58, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:28:54,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=630213.3333333334, ans=0.0 2023-11-19 07:28:58,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630213.3333333334, ans=0.1 2023-11-19 07:29:10,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.75 vs. limit=10.0 2023-11-19 07:29:20,384 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0 2023-11-19 07:29:26,698 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10400, loss[loss=0.08428, simple_loss=0.09977, pruned_loss=0.02093, audio_tagging_loss=0.01346, over 15511.00 frames. ], tot_loss[loss=0.09085, simple_loss=0.1089, pruned_loss=0.02539, audio_tagging_loss=0.011, over 3040229.35 frames. ], batch size: 62, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:29:29,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=630413.3333333334, ans=0.0 2023-11-19 07:29:51,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=630546.6666666666, ans=0.125 2023-11-19 07:30:00,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=630613.3333333334, ans=0.0 2023-11-19 07:30:03,014 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.595e+01 9.141e+01 1.035e+02 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:30:16,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=630680.0, ans=0.0 2023-11-19 07:30:22,397 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10450, loss[loss=0.08309, simple_loss=0.09977, pruned_loss=0.02293, audio_tagging_loss=0.01027, over 15015.00 frames. ], tot_loss[loss=0.09072, simple_loss=0.1087, pruned_loss=0.02532, audio_tagging_loss=0.01103, over 3036794.79 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:30:31,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-19 07:30:39,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=630813.3333333334, ans=0.1 2023-11-19 07:30:58,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=630946.6666666666, ans=0.0 2023-11-19 07:31:01,677 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=630946.6666666666, ans=0.125 2023-11-19 07:31:08,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=631013.3333333334, ans=0.0 2023-11-19 07:31:13,146 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.98 vs. 
limit=15.0 2023-11-19 07:31:17,676 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10500, loss[loss=0.1114, simple_loss=0.1414, pruned_loss=0.03318, audio_tagging_loss=0.007531, over 16762.00 frames. ], tot_loss[loss=0.09059, simple_loss=0.1088, pruned_loss=0.0254, audio_tagging_loss=0.0108, over 3038189.57 frames. ], batch size: 58, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:31:18,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=631080.0, ans=0.0 2023-11-19 07:31:24,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff2.min_abs, batch_count=631080.0, ans=0.1 2023-11-19 07:31:54,645 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.554e+01 9.380e+01 1.032e+02 1.223e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 07:32:13,138 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10550, loss[loss=0.1128, simple_loss=0.1479, pruned_loss=0.031, audio_tagging_loss=0.00784, over 15879.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1092, pruned_loss=0.02547, audio_tagging_loss=0.0107, over 3036688.82 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:32:16,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=631413.3333333334, ans=0.0 2023-11-19 07:32:18,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=631413.3333333334, ans=0.125 2023-11-19 07:32:32,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=631480.0, ans=0.125 2023-11-19 07:33:09,242 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10600, loss[loss=0.07878, simple_loss=0.09554, pruned_loss=0.02388, audio_tagging_loss=0.007133, over 15261.00 frames. ], tot_loss[loss=0.09015, simple_loss=0.1084, pruned_loss=0.02531, audio_tagging_loss=0.01065, over 3036701.16 frames. ], batch size: 57, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:33:30,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=631880.0, ans=0.125 2023-11-19 07:33:45,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.589e+01 9.112e+01 9.990e+01 1.319e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 07:33:47,414 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=22.5 2023-11-19 07:33:56,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=632013.3333333334, ans=0.2 2023-11-19 07:34:00,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=632013.3333333334, ans=0.0 2023-11-19 07:34:04,984 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10650, loss[loss=0.06835, simple_loss=0.0796, pruned_loss=0.0166, audio_tagging_loss=0.01195, over 16105.00 frames. ], tot_loss[loss=0.08957, simple_loss=0.1078, pruned_loss=0.02518, audio_tagging_loss=0.01047, over 3035227.89 frames. 
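[Editor's note: many ScheduledFloat records name Balancer parameters (prob, min_positive, min_abs, max_abs). In zipformer a Balancer is the identity in the forward pass and, with probability prob, modifies gradients in the backward pass to nudge per-channel statistics (fraction of positive activations, mean absolute value) into the configured range. The toy sketch below illustrates only the gradient-nudging idea for min_positive; it is greatly simplified relative to icefall's scaling.Balancer, and the grad_scale constant is made up.]

# Toy sketch of the Balancer idea referenced by names like
# balancer2.min_positive=0.05: identity forward, gradient nudge backward.

import torch

class BalancerSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, min_positive=0.05, grad_scale=0.01):
        ctx.save_for_backward(x)
        ctx.min_positive = min_positive
        ctx.grad_scale = grad_scale
        return x

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # Fraction of positive values per channel (last dim = channels):
        pos = (x > 0).float().mean(dim=tuple(range(x.dim() - 1)))
        # Where too few positives, add a negative gradient component so a
        # descent update pushes those channels upward:
        push = (pos < ctx.min_positive).float() * ctx.grad_scale
        extra = -push * grad_out.abs().mean()
        return grad_out + extra, None, None

x = (torch.randn(16, 256) - 3.0).requires_grad_()  # mostly negative
y = BalancerSketch.apply(x)
y.sum().backward()
print(x.grad.mean())  # slightly below 1.0: the nudge toward min_positive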
], batch size: 61, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:34:12,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=632080.0, ans=0.125 2023-11-19 07:34:31,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=632213.3333333334, ans=0.0 2023-11-19 07:34:39,731 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=632280.0, ans=0.0 2023-11-19 07:34:49,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=632346.6666666666, ans=0.1 2023-11-19 07:34:53,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=632346.6666666666, ans=0.09899494936611666 2023-11-19 07:34:59,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=632346.6666666666, ans=15.0 2023-11-19 07:35:00,554 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10700, loss[loss=0.08489, simple_loss=0.09774, pruned_loss=0.02027, audio_tagging_loss=0.01575, over 15271.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1095, pruned_loss=0.02548, audio_tagging_loss=0.01041, over 3044167.95 frames. ], batch size: 58, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:35:05,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-19 07:35:13,246 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.63 vs. limit=22.5 2023-11-19 07:35:25,739 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-11-19 07:35:37,031 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.469e+01 9.057e+01 9.771e+01 1.264e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 07:35:56,119 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10750, loss[loss=0.08278, simple_loss=0.09034, pruned_loss=0.02657, audio_tagging_loss=0.01104, over 15457.00 frames. ], tot_loss[loss=0.09035, simple_loss=0.1092, pruned_loss=0.02538, audio_tagging_loss=0.01035, over 3043189.94 frames. 
], batch size: 61, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:36:04,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=632746.6666666666, ans=0.125 2023-11-19 07:36:04,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=632746.6666666666, ans=0.125 2023-11-19 07:36:24,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=632880.0, ans=0.125 2023-11-19 07:36:30,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=632946.6666666666, ans=0.0 2023-11-19 07:36:34,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=632946.6666666666, ans=0.125 2023-11-19 07:36:35,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=632946.6666666666, ans=0.0 2023-11-19 07:36:51,425 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10800, loss[loss=0.05849, simple_loss=0.06559, pruned_loss=0.01148, audio_tagging_loss=0.01422, over 14356.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1077, pruned_loss=0.02492, audio_tagging_loss=0.01043, over 3047372.33 frames. ], batch size: 55, lr: 8.44e-03, grad_scale: 32.0 2023-11-19 07:37:01,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=633146.6666666666, ans=0.0 2023-11-19 07:37:01,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=633146.6666666666, ans=0.125 2023-11-19 07:37:29,646 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.473e+01 1.039e+02 1.467e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 07:37:46,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=633346.6666666666, ans=0.125 2023-11-19 07:37:48,160 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10850, loss[loss=0.09493, simple_loss=0.1192, pruned_loss=0.02676, audio_tagging_loss=0.008562, over 15011.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1074, pruned_loss=0.02502, audio_tagging_loss=0.01044, over 3052112.40 frames. ], batch size: 57, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:37:49,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.62 vs. limit=15.0 2023-11-19 07:37:50,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=633413.3333333334, ans=0.05 2023-11-19 07:37:53,177 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:38:22,505 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. 
limit=15.0 2023-11-19 07:38:23,221 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=633613.3333333334, ans=0.0 2023-11-19 07:38:37,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=633680.0, ans=0.125 2023-11-19 07:38:40,561 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:38:43,644 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10900, loss[loss=0.07634, simple_loss=0.08722, pruned_loss=0.02245, audio_tagging_loss=0.01028, over 15236.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.1078, pruned_loss=0.02502, audio_tagging_loss=0.01045, over 3052987.59 frames. ], batch size: 59, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:39:16,575 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.72 vs. limit=15.0 2023-11-19 07:39:21,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.390e+01 9.440e+01 1.044e+02 1.572e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 07:39:28,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=634013.3333333334, ans=0.125 2023-11-19 07:39:31,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-19 07:39:33,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=634013.3333333334, ans=0.125 2023-11-19 07:39:39,382 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 10950, loss[loss=0.08082, simple_loss=0.09421, pruned_loss=0.02125, audio_tagging_loss=0.01247, over 16301.00 frames. ], tot_loss[loss=0.0893, simple_loss=0.1077, pruned_loss=0.02488, audio_tagging_loss=0.01057, over 3059232.82 frames. ], batch size: 61, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:39:58,140 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.51 vs. limit=15.0 2023-11-19 07:40:12,342 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-19 07:40:15,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=634280.0, ans=0.125 2023-11-19 07:40:17,816 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=634280.0, ans=0.2 2023-11-19 07:40:34,750 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11000, loss[loss=0.06328, simple_loss=0.06692, pruned_loss=0.01622, audio_tagging_loss=0.0136, over 13717.00 frames. ], tot_loss[loss=0.08918, simple_loss=0.1074, pruned_loss=0.02482, audio_tagging_loss=0.01065, over 3053499.94 frames. 
2023-11-19 07:40:44,838 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:41:12,768 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.297e+01 9.076e+01 1.002e+02 1.429e+02, threshold=1.815e+02, percent-clipped=0.0
2023-11-19 07:41:29,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=634680.0, ans=0.125
2023-11-19 07:41:31,718 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11050, loss[loss=0.09049, simple_loss=0.1195, pruned_loss=0.02243, audio_tagging_loss=0.008331, over 16382.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1073, pruned_loss=0.02487, audio_tagging_loss=0.01075, over 3052025.35 frames. ], batch size: 59, lr: 8.43e-03, grad_scale: 16.0
2023-11-19 07:41:35,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=634746.6666666666, ans=0.2
2023-11-19 07:41:45,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=634813.3333333334, ans=0.125
2023-11-19 07:41:49,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634813.3333333334, ans=0.1
2023-11-19 07:41:50,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=634813.3333333334, ans=0.125
2023-11-19 07:41:52,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=634880.0, ans=0.125
2023-11-19 07:41:57,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=634880.0, ans=0.04949747468305833
2023-11-19 07:41:58,855 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=634880.0, ans=0.0
2023-11-19 07:42:09,405 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.98 vs. limit=22.5
2023-11-19 07:42:19,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=635013.3333333334, ans=0.0
2023-11-19 07:42:27,260 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11100, loss[loss=0.09945, simple_loss=0.1162, pruned_loss=0.02791, audio_tagging_loss=0.01346, over 17053.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.1063, pruned_loss=0.0247, audio_tagging_loss=0.01099, over 3052563.19 frames. ], batch size: 63, lr: 8.43e-03, grad_scale: 16.0
2023-11-19 07:42:27,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=635080.0, ans=0.2
2023-11-19 07:43:05,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.720e+01 9.689e+01 1.049e+02 1.321e+02, threshold=1.938e+02, percent-clipped=0.0
2023-11-19 07:43:07,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=635280.0, ans=0.125
2023-11-19 07:43:15,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.97 vs. limit=10.0
2023-11-19 07:43:22,343 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11150, loss[loss=0.1189, simple_loss=0.1592, pruned_loss=0.0302, audio_tagging_loss=0.00915, over 15144.00 frames. ], tot_loss[loss=0.08902, simple_loss=0.1063, pruned_loss=0.0248, audio_tagging_loss=0.01106, over 3053291.95 frames. ], batch size: 55, lr: 8.43e-03, grad_scale: 16.0
2023-11-19 07:43:29,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=635413.3333333334, ans=0.125
2023-11-19 07:43:42,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635480.0, ans=0.1
2023-11-19 07:44:09,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.43 vs. limit=22.5
2023-11-19 07:44:18,683 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11200, loss[loss=0.1006, simple_loss=0.1155, pruned_loss=0.03244, audio_tagging_loss=0.01038, over 15399.00 frames. ], tot_loss[loss=0.08961, simple_loss=0.1071, pruned_loss=0.02501, audio_tagging_loss=0.01106, over 3061146.18 frames. ], batch size: 56, lr: 8.43e-03, grad_scale: 32.0
2023-11-19 07:44:31,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=635813.3333333334, ans=0.1
2023-11-19 07:44:44,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=635880.0, ans=0.125
2023-11-19 07:44:46,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=635880.0, ans=0.5
2023-11-19 07:44:56,021 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.472e+01 9.235e+01 9.971e+01 1.338e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 07:44:56,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=635946.6666666666, ans=0.125
2023-11-19 07:44:58,444 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=635946.6666666666, ans=0.07
2023-11-19 07:45:01,322 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:45:08,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=636013.3333333334, ans=0.125
2023-11-19 07:45:14,407 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11250, loss[loss=0.09875, simple_loss=0.1297, pruned_loss=0.02729, audio_tagging_loss=0.006601, over 15239.00 frames. ], tot_loss[loss=0.0899, simple_loss=0.1077, pruned_loss=0.02513, audio_tagging_loss=0.01094, over 3059524.50 frames. ], batch size: 54, lr: 8.42e-03, grad_scale: 32.0
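Each optim.py:476 record prints five grad-norm quantiles (min, 25%, median, 75%, max) and a threshold; in every record in this section the threshold equals 2.0 times the printed median, matching Clipping_scale=2.0 (e.g. 2.0 * 9.689e+01 = 1.938e+02 above). The sketch below shows bookkeeping consistent with those numbers; the actual optimizer may track the statistics differently.

    import torch

    CLIPPING_SCALE = 2.0  # the "Clipping_scale=2.0" printed in the records

    def clipping_threshold(recent_grad_norms: torch.Tensor):
        """Derive a clipping threshold from a window of recent gradient norms."""
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = CLIPPING_SCALE * q[2]  # 2.0 * median, as in the log
        return q, threshold

    def clip_scale(grad_norm: float, threshold: float) -> float:
        """Scale factor to apply; anything below 1.0 counts toward percent-clipped."""
        return min(1.0, threshold / grad_norm)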
2023-11-19 07:45:39,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=636213.3333333334, ans=0.0
2023-11-19 07:45:40,891 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=636213.3333333334, ans=0.0
2023-11-19 07:45:52,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=636280.0, ans=0.1
2023-11-19 07:46:00,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=636346.6666666666, ans=0.125
2023-11-19 07:46:03,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=636346.6666666666, ans=0.0
2023-11-19 07:46:09,191 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11300, loss[loss=0.08173, simple_loss=0.108, pruned_loss=0.02027, audio_tagging_loss=0.007477, over 15861.00 frames. ], tot_loss[loss=0.08972, simple_loss=0.1074, pruned_loss=0.02514, audio_tagging_loss=0.01087, over 3058982.28 frames. ], batch size: 58, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:46:11,074 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.04 vs. limit=6.0
2023-11-19 07:46:15,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636413.3333333334, ans=0.125
2023-11-19 07:46:28,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=636480.0, ans=0.2
2023-11-19 07:46:43,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=636613.3333333334, ans=0.125
2023-11-19 07:46:47,209 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.767e+01 9.641e+01 1.057e+02 1.574e+02, threshold=1.928e+02, percent-clipped=0.0
2023-11-19 07:46:52,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=636680.0, ans=0.125
2023-11-19 07:47:05,214 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11350, loss[loss=0.08544, simple_loss=0.0968, pruned_loss=0.02699, audio_tagging_loss=0.01006, over 14439.00 frames. ], tot_loss[loss=0.08906, simple_loss=0.107, pruned_loss=0.02491, audio_tagging_loss=0.01066, over 3048981.58 frames. ], batch size: 56, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:47:12,386 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=636746.6666666666, ans=0.125
2023-11-19 07:47:17,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=636813.3333333334, ans=0.2
2023-11-19 07:47:24,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=636813.3333333334, ans=0.0
2023-11-19 07:48:01,153 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11400, loss[loss=0.09629, simple_loss=0.1267, pruned_loss=0.02656, audio_tagging_loss=0.006397, over 15351.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1071, pruned_loss=0.02505, audio_tagging_loss=0.01056, over 3040846.75 frames. ], batch size: 57, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:48:38,432 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.501e+01 9.316e+01 1.025e+02 1.797e+02, threshold=1.863e+02, percent-clipped=0.0
2023-11-19 07:48:56,275 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11450, loss[loss=0.08025, simple_loss=0.1002, pruned_loss=0.02164, audio_tagging_loss=0.008516, over 15831.00 frames. ], tot_loss[loss=0.0887, simple_loss=0.1067, pruned_loss=0.02478, audio_tagging_loss=0.01054, over 3043534.87 frames. ], batch size: 59, lr: 8.42e-03, grad_scale: 32.0
2023-11-19 07:48:59,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=637413.3333333334, ans=0.2
2023-11-19 07:49:03,669 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=637413.3333333334, ans=0.2
2023-11-19 07:49:15,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=637480.0, ans=0.125
2023-11-19 07:49:19,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=637546.6666666666, ans=0.125
2023-11-19 07:49:28,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=637546.6666666666, ans=0.125
2023-11-19 07:49:32,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=637613.3333333334, ans=0.125
2023-11-19 07:49:32,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.58 vs. limit=22.5
2023-11-19 07:49:40,567 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=637680.0, ans=0.0
2023-11-19 07:49:47,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=637680.0, ans=0.0
2023-11-19 07:49:53,047 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11500, loss[loss=0.08615, simple_loss=0.09837, pruned_loss=0.0226, audio_tagging_loss=0.01437, over 14711.00 frames. ], tot_loss[loss=0.089, simple_loss=0.1072, pruned_loss=0.02487, audio_tagging_loss=0.01052, over 3044107.50 frames. ], batch size: 57, lr: 8.41e-03, grad_scale: 32.0
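The grad_scale column in these records moves between 32.0 and 16.0 (32.0 at batch 10800, 16.0 by batch 10850, back to 32.0 at batch 11200): halving on overflow and growing back after a run of clean steps is the standard dynamic loss-scaling behaviour for fp16 training. A generic PyTorch pattern with the same behaviour, not the project's actual code:

    import torch

    # Dynamic loss scaling: the scale halves when gradients overflow and
    # doubles after growth_interval consecutive clean steps.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def fp16_step(model, optimizer, batch, compute_loss):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = compute_loss(model, batch)  # hypothetical loss helper
        scaler.scale(loss).backward()  # backprop the scaled loss
        scaler.step(optimizer)         # unscales; skips the step on inf/nan
        scaler.update()                # halves or grows the scale as needed
        return loss.detach(), scaler.get_scale()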
2023-11-19 07:50:04,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=637813.3333333334, ans=0.125
2023-11-19 07:50:15,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=637880.0, ans=0.025
2023-11-19 07:50:30,710 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.483e+01 9.238e+01 9.842e+01 1.262e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 07:50:32,052 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=637946.6666666666, ans=0.0
2023-11-19 07:50:32,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=637946.6666666666, ans=0.125
2023-11-19 07:50:41,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=638013.3333333334, ans=0.1
2023-11-19 07:50:49,223 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11550, loss[loss=0.07512, simple_loss=0.08878, pruned_loss=0.01938, audio_tagging_loss=0.01135, over 13789.00 frames. ], tot_loss[loss=0.08862, simple_loss=0.1066, pruned_loss=0.0247, audio_tagging_loss=0.0106, over 3046209.17 frames. ], batch size: 53, lr: 8.41e-03, grad_scale: 32.0
2023-11-19 07:50:58,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=638146.6666666666, ans=0.125
2023-11-19 07:51:22,848 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 07:51:23,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=638280.0, ans=0.125
2023-11-19 07:51:37,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0
2023-11-19 07:51:39,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=638346.6666666666, ans=0.0
2023-11-19 07:51:43,890 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11600, loss[loss=0.1097, simple_loss=0.1338, pruned_loss=0.03464, audio_tagging_loss=0.008158, over 15228.00 frames. ], tot_loss[loss=0.08904, simple_loss=0.107, pruned_loss=0.02497, audio_tagging_loss=0.01057, over 3051955.92 frames. ], batch size: 57, lr: 8.41e-03, grad_scale: 32.0
2023-11-19 07:51:47,561 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.13 vs. limit=10.0
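A Whitening record such as the metric=10.13 vs. limit=10.0 one just above fires when a whiteness statistic of a layer's activations exceeds its current limit. One statistic with the observed range, equal to 1.0 for a perfectly white (isotropic) channel covariance and at most num_channels in the degenerate rank-one case, is sketched below as a toy; the real scaling.py may compute it differently.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """Toy whiteness statistic for activations x of shape (N, C)."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]  # (C, C) channel covariance
        c = cov.shape[0]
        # 1.0 when cov is a multiple of the identity; up to C when rank-one.
        return (c * (cov @ cov).diagonal().sum() / cov.diagonal().sum() ** 2).item()

    # A whitening module would penalize the layer only while metric > limit.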
2023-11-19 07:52:17,672 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=638613.3333333334, ans=0.125
2023-11-19 07:52:17,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=638613.3333333334, ans=0.2
2023-11-19 07:52:21,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.734e+01 9.337e+01 1.014e+02 1.440e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-19 07:52:28,514 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:52:28,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=638680.0, ans=0.125
2023-11-19 07:52:28,878 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. limit=10.0
2023-11-19 07:52:39,906 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11650, loss[loss=0.06373, simple_loss=0.06762, pruned_loss=0.01616, audio_tagging_loss=0.01376, over 14791.00 frames. ], tot_loss[loss=0.08973, simple_loss=0.1081, pruned_loss=0.0251, audio_tagging_loss=0.01059, over 3052872.70 frames. ], batch size: 58, lr: 8.41e-03, grad_scale: 16.0
2023-11-19 07:53:28,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=639013.3333333334, ans=0.0
2023-11-19 07:53:34,702 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11700, loss[loss=0.1121, simple_loss=0.1386, pruned_loss=0.03259, audio_tagging_loss=0.0102, over 15087.00 frames. ], tot_loss[loss=0.08955, simple_loss=0.1077, pruned_loss=0.02499, audio_tagging_loss=0.01071, over 3050030.23 frames. ], batch size: 55, lr: 8.40e-03, grad_scale: 16.0
2023-11-19 07:53:37,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=639080.0, ans=0.0
2023-11-19 07:53:42,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.50 vs. limit=12.0
2023-11-19 07:53:55,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=639146.6666666666, ans=0.1
2023-11-19 07:53:58,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=639213.3333333334, ans=0.1
2023-11-19 07:54:02,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.whiten.whitening_limit, batch_count=639213.3333333334, ans=12.0
2023-11-19 07:54:13,831 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.154e+01 8.844e+01 9.528e+01 1.167e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-19 07:54:19,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=639346.6666666666, ans=0.0
2023-11-19 07:54:23,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=639346.6666666666, ans=0.0
2023-11-19 07:54:30,867 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11750, loss[loss=0.06057, simple_loss=0.07039, pruned_loss=0.01515, audio_tagging_loss=0.01022, over 14916.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1079, pruned_loss=0.02506, audio_tagging_loss=0.01077, over 3049091.97 frames. ], batch size: 58, lr: 8.40e-03, grad_scale: 16.0
2023-11-19 07:54:50,962 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=639480.0, ans=0.125
2023-11-19 07:54:57,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=639546.6666666666, ans=0.05
2023-11-19 07:55:03,196 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.10 vs. limit=22.5
2023-11-19 07:55:23,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=639680.0, ans=0.0
2023-11-19 07:55:26,012 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11800, loss[loss=0.08509, simple_loss=0.09901, pruned_loss=0.02489, audio_tagging_loss=0.01069, over 14828.00 frames. ], tot_loss[loss=0.08998, simple_loss=0.1076, pruned_loss=0.02532, audio_tagging_loss=0.01085, over 3040147.38 frames. ], batch size: 58, lr: 8.40e-03, grad_scale: 16.0
2023-11-19 07:55:31,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=639746.6666666666, ans=0.125
2023-11-19 07:55:57,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=639880.0, ans=0.0
2023-11-19 07:56:05,555 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.609e+01 9.500e+01 1.074e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0
2023-11-19 07:56:19,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=640013.3333333334, ans=0.125
2023-11-19 07:56:24,332 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11850, loss[loss=0.1075, simple_loss=0.1313, pruned_loss=0.03225, audio_tagging_loss=0.009624, over 16513.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.109, pruned_loss=0.02554, audio_tagging_loss=0.01082, over 3046544.25 frames. ], batch size: 59, lr: 8.40e-03, grad_scale: 16.0
2023-11-19 07:56:24,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=640080.0, ans=0.125
2023-11-19 07:56:34,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=640146.6666666666, ans=0.125
2023-11-19 07:56:35,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.31 vs. limit=15.0
2023-11-19 07:56:39,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=640146.6666666666, ans=0.125
2023-11-19 07:56:39,968 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.81 vs. limit=6.0
2023-11-19 07:56:40,524 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=640146.6666666666, ans=0.125
2023-11-19 07:56:53,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=640213.3333333334, ans=0.0
2023-11-19 07:57:03,593 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.96 vs. limit=6.0
2023-11-19 07:57:05,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=640280.0, ans=0.09899494936611666
2023-11-19 07:57:16,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=640346.6666666666, ans=0.0
2023-11-19 07:57:20,088 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11900, loss[loss=0.08465, simple_loss=0.1062, pruned_loss=0.02204, audio_tagging_loss=0.009518, over 14573.00 frames. ], tot_loss[loss=0.09051, simple_loss=0.1087, pruned_loss=0.02528, audio_tagging_loss=0.0109, over 3043930.33 frames. ], batch size: 54, lr: 8.40e-03, grad_scale: 16.0
2023-11-19 07:57:29,629 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.40 vs. limit=12.0
2023-11-19 07:57:59,325 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.414e+01 8.951e+01 9.837e+01 1.464e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-19 07:58:00,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=640613.3333333334, ans=0.125
2023-11-19 07:58:13,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=640680.0, ans=0.125
2023-11-19 07:58:13,793 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.18 vs. limit=15.0
2023-11-19 07:58:16,216 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 11950, loss[loss=0.1065, simple_loss=0.1262, pruned_loss=0.02851, audio_tagging_loss=0.01492, over 14464.00 frames. ], tot_loss[loss=0.09059, simple_loss=0.1086, pruned_loss=0.02535, audio_tagging_loss=0.01092, over 3050522.02 frames. ], batch size: 57, lr: 8.39e-03, grad_scale: 16.0
2023-11-19 07:58:22,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=640746.6666666666, ans=0.1
2023-11-19 07:58:31,179 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2023-11-19 07:58:55,733 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=640946.6666666666, ans=0.0
2023-11-19 07:59:10,347 INFO [train_asr.py:1115] (2/4) Epoch 8, batch 12000, loss[loss=0.09043, simple_loss=0.1125, pruned_loss=0.02247, audio_tagging_loss=0.01169, over 15771.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1087, pruned_loss=0.02546, audio_tagging_loss=0.01096, over 3053934.70 frames. ], batch size: 59, lr: 8.39e-03, grad_scale: 32.0
2023-11-19 07:59:10,348 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-19 07:59:37,795 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7438, 5.7667, 5.8201, 5.8810], device='cuda:2')
2023-11-19 07:59:43,000 INFO [train_asr.py:1147] (2/4) Epoch 8, validation: loss=0.06649, simple_loss=0.05653, pruned_loss=0.006961, audio_tagging_loss=0.03127, over 4681554.00 frames.
2023-11-19 07:59:43,001 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-19 07:59:45,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=641080.0, ans=0.125
2023-11-19 07:59:47,256 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 07:59:58,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=641146.6666666666, ans=0.125
2023-11-19 08:00:44,329 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 0, loss[loss=0.09581, simple_loss=0.1019, pruned_loss=0.02044, audio_tagging_loss=0.02441, over 15056.00 frames. ], tot_loss[loss=0.09581, simple_loss=0.1019, pruned_loss=0.02044, audio_tagging_loss=0.02441, over 15056.00 frames. ], batch size: 59, lr: 7.94e-03, grad_scale: 32.0
2023-11-19 08:00:44,329 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-19 08:01:16,090 INFO [train_asr.py:1147] (2/4) Epoch 9, validation: loss=0.06566, simple_loss=0.05652, pruned_loss=0.006966, audio_tagging_loss=0.03043, over 4681554.00 frames.
2023-11-19 08:01:16,090 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-19 08:01:20,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=641240.0, ans=0.125
2023-11-19 08:01:28,796 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.783e+01 9.637e+01 1.099e+02 1.400e+02, threshold=1.927e+02, percent-clipped=0.0
2023-11-19 08:01:33,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=641306.6666666666, ans=0.125
2023-11-19 08:01:35,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=641306.6666666666, ans=0.0
2023-11-19 08:01:55,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=641440.0, ans=0.1
2023-11-19 08:02:05,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=641506.6666666666, ans=15.0
2023-11-19 08:02:12,369 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 50, loss[loss=0.1068, simple_loss=0.1213, pruned_loss=0.02795, audio_tagging_loss=0.01821, over 14877.00 frames. ], tot_loss[loss=0.09872, simple_loss=0.1074, pruned_loss=0.02417, audio_tagging_loss=0.02083, over 690752.16 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0
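The two validation blocks a few records above (end of epoch 8 at batch 12000, then epoch 9, batch 0) each report a loss over the same 4681554.00 dev frames plus the peak CUDA memory. A hypothetical helper with the same shape; run_validation and compute_loss are invented names, not train_asr.py's API:

    import torch

    def run_validation(model, valid_loader, compute_loss, device):
        model.eval()
        tot_loss, tot_frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = compute_loss(model, batch)
                tot_loss += loss.item() * num_frames
                tot_frames += num_frames
        model.train()
        # Source of the "Maximum memory allocated so far is ...MB" records.
        max_mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        return tot_loss / tot_frames, max_mem_mb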
2023-11-19 08:02:19,057 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:02:24,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=641640.0, ans=0.125
2023-11-19 08:02:30,154 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=641640.0, ans=0.125
2023-11-19 08:02:38,341 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.29 vs. limit=22.5
2023-11-19 08:02:49,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=641773.3333333334, ans=0.0
2023-11-19 08:02:58,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=641840.0, ans=0.0
2023-11-19 08:03:02,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=641840.0, ans=0.0
2023-11-19 08:03:07,990 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 100, loss[loss=0.0884, simple_loss=0.1098, pruned_loss=0.0164, audio_tagging_loss=0.01712, over 15733.00 frames. ], tot_loss[loss=0.09808, simple_loss=0.1074, pruned_loss=0.02443, audio_tagging_loss=0.01997, over 1207865.92 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0
2023-11-19 08:03:15,460 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=641906.6666666666, ans=0.125
2023-11-19 08:03:19,964 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.656e+01 9.404e+01 1.019e+02 1.351e+02, threshold=1.881e+02, percent-clipped=0.0
2023-11-19 08:03:21,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=641973.3333333334, ans=0.1
2023-11-19 08:03:23,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=641973.3333333334, ans=0.125
2023-11-19 08:03:25,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=641973.3333333334, ans=0.1
2023-11-19 08:03:39,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=642040.0, ans=0.2
2023-11-19 08:03:42,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=642106.6666666666, ans=10.0
2023-11-19 08:03:46,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.48 vs. limit=22.5
2023-11-19 08:04:03,508 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 150, loss[loss=0.08107, simple_loss=0.09726, pruned_loss=0.01912, audio_tagging_loss=0.01332, over 14809.00 frames. ], tot_loss[loss=0.09589, simple_loss=0.1066, pruned_loss=0.0246, audio_tagging_loss=0.01797, over 1608261.61 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0
2023-11-19 08:04:11,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.07 vs. limit=22.5
2023-11-19 08:04:14,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=642306.6666666666, ans=10.0
2023-11-19 08:04:38,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=642440.0, ans=0.0
2023-11-19 08:04:51,136 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=642506.6666666666, ans=0.1
2023-11-19 08:04:52,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=642506.6666666666, ans=0.0
2023-11-19 08:04:53,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=642506.6666666666, ans=0.125
2023-11-19 08:04:59,888 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 200, loss[loss=0.08145, simple_loss=0.1082, pruned_loss=0.01765, audio_tagging_loss=0.009706, over 14783.00 frames. ], tot_loss[loss=0.09519, simple_loss=0.1086, pruned_loss=0.02523, audio_tagging_loss=0.01566, over 1926182.53 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0
2023-11-19 08:05:13,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.518e+01 9.330e+01 1.026e+02 1.321e+02, threshold=1.866e+02, percent-clipped=0.0
2023-11-19 08:05:20,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=642706.6666666666, ans=0.125
2023-11-19 08:05:26,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=642706.6666666666, ans=0.125
2023-11-19 08:05:46,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=642840.0, ans=0.0
2023-11-19 08:05:53,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=642840.0, ans=0.0
2023-11-19 08:05:55,838 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 250, loss[loss=0.07701, simple_loss=0.09522, pruned_loss=0.01778, audio_tagging_loss=0.01162, over 15817.00 frames. ], tot_loss[loss=0.09195, simple_loss=0.1066, pruned_loss=0.02455, audio_tagging_loss=0.01409, over 2169462.16 frames. ], batch size: 58, lr: 7.93e-03, grad_scale: 16.0
2023-11-19 08:06:06,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0
2023-11-19 08:06:07,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=642973.3333333334, ans=0.07
2023-11-19 08:06:23,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643040.0, ans=0.125
2023-11-19 08:06:25,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=643040.0, ans=0.1
2023-11-19 08:06:35,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=643106.6666666666, ans=0.0
2023-11-19 08:06:51,156 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 300, loss[loss=0.09572, simple_loss=0.1168, pruned_loss=0.02979, audio_tagging_loss=0.007526, over 14702.00 frames. ], tot_loss[loss=0.09102, simple_loss=0.107, pruned_loss=0.02452, audio_tagging_loss=0.01302, over 2372536.79 frames. ], batch size: 54, lr: 7.93e-03, grad_scale: 16.0
2023-11-19 08:06:54,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=643240.0, ans=0.2
2023-11-19 08:07:04,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=643306.6666666666, ans=0.125
2023-11-19 08:07:05,327 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.625e+01 9.241e+01 1.032e+02 1.343e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 08:07:05,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0
2023-11-19 08:07:06,528 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=643306.6666666666, ans=0.125
2023-11-19 08:07:15,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=643373.3333333334, ans=0.125
2023-11-19 08:07:15,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=643373.3333333334, ans=0.125
2023-11-19 08:07:27,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=643440.0, ans=0.125
2023-11-19 08:07:29,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.96 vs. limit=6.0
2023-11-19 08:07:34,491 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=643506.6666666666, ans=0.0
2023-11-19 08:07:45,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=643506.6666666666, ans=0.125
2023-11-19 08:07:47,537 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 350, loss[loss=0.09278, simple_loss=0.1098, pruned_loss=0.02789, audio_tagging_loss=0.01001, over 15758.00 frames. ], tot_loss[loss=0.09069, simple_loss=0.1074, pruned_loss=0.02468, audio_tagging_loss=0.01231, over 2524652.58 frames. ], batch size: 58, lr: 7.93e-03, grad_scale: 16.0
2023-11-19 08:07:51,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643573.3333333334, ans=0.1
2023-11-19 08:07:57,691 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0
2023-11-19 08:08:08,531 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:08:32,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=643840.0, ans=0.125
2023-11-19 08:08:43,342 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 400, loss[loss=0.07154, simple_loss=0.08286, pruned_loss=0.01722, audio_tagging_loss=0.01289, over 14599.00 frames. ], tot_loss[loss=0.09033, simple_loss=0.1076, pruned_loss=0.02466, audio_tagging_loss=0.01186, over 2634659.95 frames. ], batch size: 58, lr: 7.93e-03, grad_scale: 32.0
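The scaling.py:213 records each print the current value (ans=...) of a ScheduledFloat, a scalar hyper-parameter evaluated against batch_count, which is why the same name can print different values as training advances. A minimal piecewise-linear sketch of that mechanism follows; the breakpoints are invented, only the interpolation is the point.

    from bisect import bisect_right

    class ScheduledFloat:
        """Piecewise-linear schedule over batch_count, constant outside."""
        def __init__(self, *points):  # points: (batch_count, value) pairs
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def __call__(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1, y0, y1 = self.xs[i-1], self.xs[i], self.ys[i-1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical: a skip-rate decaying from 0.1 to 0.0 over 20k batches,
    # then pinned at 0.0, consistent with the many late "ans=0.0" records.
    skip_rate = ScheduledFloat((0.0, 0.1), (20000.0, 0.0))
    assert skip_rate(643506.0) == 0.0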
2023-11-19 08:08:53,486 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.17 vs. limit=15.0
2023-11-19 08:08:55,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.363e+01 9.025e+01 9.871e+01 1.227e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-19 08:09:33,876 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.66 vs. limit=15.0
2023-11-19 08:09:36,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=644173.3333333334, ans=0.0
2023-11-19 08:09:38,846 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 450, loss[loss=0.09227, simple_loss=0.1099, pruned_loss=0.026, audio_tagging_loss=0.01134, over 14517.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1065, pruned_loss=0.02428, audio_tagging_loss=0.01167, over 2724131.09 frames. ], batch size: 57, lr: 7.92e-03, grad_scale: 32.0
2023-11-19 08:09:41,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=644240.0, ans=0.125
2023-11-19 08:09:57,926 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=644306.6666666666, ans=0.2
2023-11-19 08:09:59,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=644306.6666666666, ans=0.1
2023-11-19 08:10:24,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=644506.6666666666, ans=0.125
2023-11-19 08:10:26,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=644506.6666666666, ans=0.0
2023-11-19 08:10:31,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=644506.6666666666, ans=0.0
2023-11-19 08:10:35,243 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 500, loss[loss=0.08247, simple_loss=0.09354, pruned_loss=0.02257, audio_tagging_loss=0.01313, over 13845.00 frames. ], tot_loss[loss=0.08915, simple_loss=0.1065, pruned_loss=0.02449, audio_tagging_loss=0.01138, over 2801375.35 frames. ], batch size: 54, lr: 7.92e-03, grad_scale: 32.0
2023-11-19 08:10:38,393 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.08 vs. limit=22.5
2023-11-19 08:10:48,618 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.533e+01 9.443e+01 1.042e+02 1.372e+02, threshold=1.889e+02, percent-clipped=0.0
2023-11-19 08:11:00,457 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.97 vs. limit=10.0
2023-11-19 08:11:12,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=644773.3333333334, ans=0.125
2023-11-19 08:11:24,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=644840.0, ans=0.035
2023-11-19 08:11:31,022 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 550, loss[loss=0.09925, simple_loss=0.1193, pruned_loss=0.03117, audio_tagging_loss=0.008418, over 14527.00 frames. ], tot_loss[loss=0.08897, simple_loss=0.1067, pruned_loss=0.0244, audio_tagging_loss=0.01123, over 2852963.15 frames. ], batch size: 53, lr: 7.92e-03, grad_scale: 32.0
2023-11-19 08:11:41,380 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:11:43,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=644973.3333333334, ans=0.0
2023-11-19 08:11:58,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=645040.0, ans=0.2
2023-11-19 08:12:04,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=645106.6666666666, ans=0.125
2023-11-19 08:12:15,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=645173.3333333334, ans=0.035
2023-11-19 08:12:26,848 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 600, loss[loss=0.08236, simple_loss=0.09466, pruned_loss=0.02378, audio_tagging_loss=0.01125, over 15060.00 frames. ], tot_loss[loss=0.08874, simple_loss=0.1064, pruned_loss=0.02438, audio_tagging_loss=0.01113, over 2893952.70 frames. ], batch size: 56, lr: 7.92e-03, grad_scale: 32.0
2023-11-19 08:12:37,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=645306.6666666666, ans=0.0
2023-11-19 08:12:37,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.54 vs. limit=10.0
2023-11-19 08:12:40,044 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.132e+01 8.342e+01 9.026e+01 9.768e+01 1.504e+02, threshold=1.805e+02, percent-clipped=0.0
2023-11-19 08:12:40,333 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:12:43,296 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.595e-01
2023-11-19 08:13:08,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=645440.0, ans=0.125
2023-11-19 08:13:22,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=645573.3333333334, ans=0.0
2023-11-19 08:13:22,986 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 650, loss[loss=0.1175, simple_loss=0.1445, pruned_loss=0.03572, audio_tagging_loss=0.009481, over 15173.00 frames. ], tot_loss[loss=0.08878, simple_loss=0.1067, pruned_loss=0.02442, audio_tagging_loss=0.01102, over 2926435.81 frames. ], batch size: 56, lr: 7.92e-03, grad_scale: 32.0
2023-11-19 08:14:15,603 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=645840.0, ans=0.0
2023-11-19 08:14:19,614 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 700, loss[loss=0.06313, simple_loss=0.07407, pruned_loss=0.01534, audio_tagging_loss=0.01076, over 15020.00 frames. ], tot_loss[loss=0.08889, simple_loss=0.1073, pruned_loss=0.02428, audio_tagging_loss=0.01095, over 2964196.53 frames. ], batch size: 61, lr: 7.91e-03, grad_scale: 16.0
2023-11-19 08:14:33,922 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.287e+01 8.978e+01 1.006e+02 1.254e+02, threshold=1.796e+02, percent-clipped=0.0
2023-11-19 08:14:47,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=646040.0, ans=0.125
2023-11-19 08:15:12,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=646173.3333333334, ans=0.125
2023-11-19 08:15:15,502 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 750, loss[loss=0.06961, simple_loss=0.07557, pruned_loss=0.01819, audio_tagging_loss=0.01364, over 14324.00 frames. ], tot_loss[loss=0.08909, simple_loss=0.1073, pruned_loss=0.02443, audio_tagging_loss=0.01101, over 2979076.91 frames. ], batch size: 56, lr: 7.91e-03, grad_scale: 16.0
2023-11-19 08:15:24,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646240.0, ans=0.1
2023-11-19 08:15:38,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=646373.3333333334, ans=0.1
2023-11-19 08:15:51,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.0
2023-11-19 08:16:11,347 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 800, loss[loss=0.08605, simple_loss=0.1025, pruned_loss=0.02149, audio_tagging_loss=0.01329, over 16384.00 frames. ], tot_loss[loss=0.08908, simple_loss=0.107, pruned_loss=0.02447, audio_tagging_loss=0.0111, over 2992042.89 frames. ], batch size: 61, lr: 7.91e-03, grad_scale: 32.0
2023-11-19 08:16:25,538 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.571e+01 9.363e+01 1.050e+02 1.472e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-19 08:16:33,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=646706.6666666666, ans=0.125
2023-11-19 08:16:43,280 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.66 vs. limit=15.0
2023-11-19 08:16:46,608 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.39 vs. limit=22.5
2023-11-19 08:17:03,366 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.58 vs. limit=22.5
2023-11-19 08:17:07,123 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 850, loss[loss=0.1191, simple_loss=0.1496, pruned_loss=0.03572, audio_tagging_loss=0.008613, over 15442.00 frames. ], tot_loss[loss=0.08954, simple_loss=0.1077, pruned_loss=0.02469, audio_tagging_loss=0.01098, over 3002078.31 frames. ], batch size: 56, lr: 7.91e-03, grad_scale: 32.0
2023-11-19 08:17:32,726 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=647040.0, ans=0.125
2023-11-19 08:18:02,488 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 900, loss[loss=0.08919, simple_loss=0.1135, pruned_loss=0.02082, audio_tagging_loss=0.01161, over 15718.00 frames. ], tot_loss[loss=0.09078, simple_loss=0.1092, pruned_loss=0.02513, audio_tagging_loss=0.01103, over 3018363.99 frames. ], batch size: 58, lr: 7.91e-03, grad_scale: 32.0
2023-11-19 08:18:04,892 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:18:09,404 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.17 vs. limit=12.0
2023-11-19 08:18:09,485 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.18 vs. limit=15.0
2023-11-19 08:18:16,859 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.327e+01 8.077e+01 9.345e+01 1.007e+02 1.276e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-19 08:18:24,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=647373.3333333334, ans=0.0
2023-11-19 08:18:24,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=647373.3333333334, ans=0.0
2023-11-19 08:18:28,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5
2023-11-19 08:18:48,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=647506.6666666666, ans=0.0
2023-11-19 08:18:58,290 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 950, loss[loss=0.08236, simple_loss=0.1024, pruned_loss=0.02184, audio_tagging_loss=0.009299, over 15338.00 frames. ], tot_loss[loss=0.08994, simple_loss=0.1083, pruned_loss=0.02487, audio_tagging_loss=0.01093, over 3021498.46 frames. ], batch size: 56, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:19:04,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.54 vs. limit=12.0
2023-11-19 08:19:08,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=647640.0, ans=0.0
2023-11-19 08:19:10,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=647640.0, ans=0.2
2023-11-19 08:19:10,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=647640.0, ans=0.125
2023-11-19 08:19:40,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=647773.3333333334, ans=0.125
2023-11-19 08:19:53,960 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1000, loss[loss=0.0963, simple_loss=0.1114, pruned_loss=0.02776, audio_tagging_loss=0.01286, over 15064.00 frames. ], tot_loss[loss=0.08897, simple_loss=0.1071, pruned_loss=0.02462, audio_tagging_loss=0.01078, over 3025439.33 frames. ], batch size: 60, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:20:03,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=647906.6666666666, ans=0.125
2023-11-19 08:20:08,930 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.007e+01 8.931e+01 9.562e+01 1.265e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-19 08:20:17,760 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:20:25,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=648040.0, ans=0.125
2023-11-19 08:20:47,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=648173.3333333334, ans=0.0
2023-11-19 08:20:50,068 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1050, loss[loss=0.08561, simple_loss=0.1034, pruned_loss=0.02193, audio_tagging_loss=0.01196, over 15705.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1077, pruned_loss=0.02478, audio_tagging_loss=0.01063, over 3039344.61 frames. ], batch size: 59, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:21:05,231 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=648306.6666666666, ans=0.0
2023-11-19 08:21:09,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=648306.6666666666, ans=0.125
2023-11-19 08:21:14,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_na.min_abs, batch_count=648373.3333333334, ans=0.02
2023-11-19 08:21:16,407 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=648373.3333333334, ans=0.0
2023-11-19 08:21:18,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=648373.3333333334, ans=0.0
2023-11-19 08:21:22,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=648440.0, ans=0.125
2023-11-19 08:21:39,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=648506.6666666666, ans=0.0
2023-11-19 08:21:39,471 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648506.6666666666, ans=0.125
2023-11-19 08:21:46,129 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1100, loss[loss=0.08536, simple_loss=0.1072, pruned_loss=0.02255, audio_tagging_loss=0.0092, over 14552.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1068, pruned_loss=0.02443, audio_tagging_loss=0.01061, over 3030752.53 frames. ], batch size: 54, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:21:48,224 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
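The WARNING just above makes the exclusion rule visible in numbers: 100 input frames shrink to 23 under the roughly 4x convolutional subsampling, which is fewer than the cut's 24 BPE tokens, so no valid transducer alignment exists and the cut must be dropped. A sketch of that filter follows; the subsampling formula is an assumption chosen to reproduce the logged 100 -> 23 pair.

    def frames_after_subsampling(num_frames: int) -> int:
        # Assumed ~4x conv subsampling; reproduces before=100 -> after=23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Drop cuts whose subsampled length cannot cover the token count."""
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded cut above: 23 frames < 24 tokens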
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:21:53,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=648573.3333333334, ans=0.125 2023-11-19 08:21:54,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=648573.3333333334, ans=0.0 2023-11-19 08:22:00,250 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.479e+01 9.483e+01 1.065e+02 1.916e+02, threshold=1.897e+02, percent-clipped=1.0 2023-11-19 08:22:12,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=648706.6666666666, ans=0.0 2023-11-19 08:22:30,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.00 vs. limit=22.5 2023-11-19 08:22:31,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2023-11-19 08:22:35,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=648840.0, ans=0.125 2023-11-19 08:22:40,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=648840.0, ans=0.1 2023-11-19 08:22:41,974 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1150, loss[loss=0.05827, simple_loss=0.06531, pruned_loss=0.01295, audio_tagging_loss=0.01267, over 14245.00 frames. ], tot_loss[loss=0.0884, simple_loss=0.1068, pruned_loss=0.02446, audio_tagging_loss=0.01052, over 3032235.74 frames. ], batch size: 55, lr: 7.90e-03, grad_scale: 32.0 2023-11-19 08:22:59,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=648973.3333333334, ans=0.125 2023-11-19 08:23:01,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=648973.3333333334, ans=0.0 2023-11-19 08:23:08,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=649040.0, ans=0.0 2023-11-19 08:23:30,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=649173.3333333334, ans=0.0 2023-11-19 08:23:37,817 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1200, loss[loss=0.09553, simple_loss=0.1124, pruned_loss=0.02865, audio_tagging_loss=0.01066, over 15009.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.106, pruned_loss=0.02453, audio_tagging_loss=0.01055, over 3030493.47 frames. 
], batch size: 55, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:23:42,442 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=649240.0, ans=0.0 2023-11-19 08:23:51,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=649306.6666666666, ans=0.125 2023-11-19 08:23:52,033 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.405e+01 8.142e+01 8.954e+01 1.003e+02 1.503e+02, threshold=1.791e+02, percent-clipped=0.0 2023-11-19 08:23:52,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=649306.6666666666, ans=0.2 2023-11-19 08:24:22,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=649506.6666666666, ans=0.125 2023-11-19 08:24:33,448 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1250, loss[loss=0.1129, simple_loss=0.1414, pruned_loss=0.03188, audio_tagging_loss=0.01031, over 15168.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1071, pruned_loss=0.02458, audio_tagging_loss=0.01048, over 3039703.81 frames. ], batch size: 56, lr: 7.89e-03, grad_scale: 32.0 2023-11-19 08:24:40,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=649573.3333333334, ans=0.1 2023-11-19 08:24:48,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=649640.0, ans=0.125 2023-11-19 08:25:03,252 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-19 08:25:19,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=649840.0, ans=0.5 2023-11-19 08:25:29,553 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1300, loss[loss=0.09736, simple_loss=0.1253, pruned_loss=0.02417, audio_tagging_loss=0.01055, over 15779.00 frames. ], tot_loss[loss=0.08925, simple_loss=0.1079, pruned_loss=0.02486, audio_tagging_loss=0.01041, over 3036469.37 frames. 
], batch size: 59, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:25:39,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=649973.3333333334, ans=0.0 2023-11-19 08:25:44,818 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.314e+01 8.988e+01 1.010e+02 1.259e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 08:25:47,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=649973.3333333334, ans=0.125 2023-11-19 08:25:54,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=650040.0, ans=0.95 2023-11-19 08:26:10,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=650106.6666666666, ans=0.125 2023-11-19 08:26:17,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=650173.3333333334, ans=0.0 2023-11-19 08:26:25,335 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1350, loss[loss=0.08243, simple_loss=0.1073, pruned_loss=0.01938, audio_tagging_loss=0.009384, over 14488.00 frames. ], tot_loss[loss=0.08858, simple_loss=0.1071, pruned_loss=0.02467, audio_tagging_loss=0.01037, over 3036553.20 frames. ], batch size: 53, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:26:28,023 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-19 08:26:47,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=650373.3333333334, ans=0.0 2023-11-19 08:26:58,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-19 08:27:00,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=650440.0, ans=0.1 2023-11-19 08:27:05,207 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:27:11,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650506.6666666666, ans=0.1 2023-11-19 08:27:16,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=650506.6666666666, ans=0.05 2023-11-19 08:27:20,256 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1400, loss[loss=0.09275, simple_loss=0.1161, pruned_loss=0.02332, audio_tagging_loss=0.01136, over 14964.00 frames. ], tot_loss[loss=0.08859, simple_loss=0.1072, pruned_loss=0.02451, audio_tagging_loss=0.01046, over 3046462.59 frames. ], batch size: 55, lr: 7.89e-03, grad_scale: 16.0 2023-11-19 08:27:26,914 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.86 vs. limit=10.0
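The recurring "Exclude cut" warnings, like the one for unbalanced/XdmbboqRBmQ above, are the trainer's length check: a cut is dropped when the encoder's subsampled output would be shorter than its token sequence, which a transducer alignment cannot accommodate (here 23 frames vs. 24 tokens). Below is a minimal sketch of such a filter; the helper name, the sentencepiece processor `sp`, and the exact front-end arithmetic are assumptions rather than the verbatim train_asr.py code, although ((100 - 7) // 2 + 1) // 2 = 23 does reproduce the numbers printed in the warning.

import logging
import sentencepiece as spm

def keep_cut(cut, sp: spm.SentencePieceProcessor) -> bool:
    # Frames surviving the convolutional front-end; this formula is an
    # assumption, chosen because it maps the logged 100 frames to 23.
    T = cut.num_frames
    T_sub = ((T - 7) // 2 + 1) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T_sub < len(tokens):
        # A transducer needs at least one output frame per emitted token.
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {T}. "
            f"Number of frames (after subsampling): {T_sub}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True

The one-second AudioSet clips carry the dummy placeholder transcript seen in the warning, so they fail this check by a single token and are skipped.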
2023-11-19 08:27:29,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=650573.3333333334, ans=0.2 2023-11-19 08:27:36,580 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.323e+01 8.984e+01 9.924e+01 1.651e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 08:27:36,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=650640.0, ans=0.0 2023-11-19 08:27:48,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=650706.6666666666, ans=0.0 2023-11-19 08:28:10,632 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-19 08:28:17,049 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1450, loss[loss=0.07832, simple_loss=0.08088, pruned_loss=0.0257, audio_tagging_loss=0.01218, over 16816.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1062, pruned_loss=0.02411, audio_tagging_loss=0.01055, over 3045662.78 frames. ], batch size: 67, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:28:33,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=650973.3333333334, ans=0.2 2023-11-19 08:28:57,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=651106.6666666666, ans=0.125 2023-11-19 08:29:06,619 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-19 08:29:11,927 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-19 08:29:12,422 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1500, loss[loss=0.11, simple_loss=0.1369, pruned_loss=0.03194, audio_tagging_loss=0.009629, over 15279.00 frames. ], tot_loss[loss=0.08832, simple_loss=0.1068, pruned_loss=0.02434, audio_tagging_loss=0.01058, over 3037982.84 frames.
], batch size: 56, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:29:12,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=651240.0, ans=10.0 2023-11-19 08:29:26,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=651306.6666666666, ans=0.95 2023-11-19 08:29:27,754 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.592e+01 9.437e+01 1.054e+02 1.547e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 08:29:30,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=651306.6666666666, ans=0.125 2023-11-19 08:29:35,888 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=651373.3333333334, ans=0.2 2023-11-19 08:29:55,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=651440.0, ans=0.025 2023-11-19 08:29:56,710 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:30:01,554 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0 2023-11-19 08:30:08,175 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1550, loss[loss=0.0832, simple_loss=0.1011, pruned_loss=0.01885, audio_tagging_loss=0.01381, over 14688.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1062, pruned_loss=0.02413, audio_tagging_loss=0.01075, over 3035271.45 frames. ], batch size: 55, lr: 7.88e-03, grad_scale: 16.0 2023-11-19 08:30:10,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0 2023-11-19 08:30:11,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=651573.3333333334, ans=0.1 2023-11-19 08:30:19,795 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.73 vs. limit=22.5 2023-11-19 08:30:26,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=651640.0, ans=0.125 2023-11-19 08:30:43,157 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=651773.3333333334, ans=0.0 2023-11-19 08:31:04,998 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1600, loss[loss=0.1098, simple_loss=0.133, pruned_loss=0.03274, audio_tagging_loss=0.0106, over 15339.00 frames. ], tot_loss[loss=0.08858, simple_loss=0.107, pruned_loss=0.0243, audio_tagging_loss=0.0108, over 3040783.36 frames. 
], batch size: 56, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:31:20,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.197e+01 8.883e+01 9.741e+01 1.380e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 08:31:24,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=651973.3333333334, ans=0.125 2023-11-19 08:31:41,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=652106.6666666666, ans=0.0 2023-11-19 08:31:49,222 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652173.3333333334, ans=0.125 2023-11-19 08:32:00,806 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1650, loss[loss=0.09783, simple_loss=0.1142, pruned_loss=0.03199, audio_tagging_loss=0.008758, over 14539.00 frames. ], tot_loss[loss=0.08877, simple_loss=0.1073, pruned_loss=0.02431, audio_tagging_loss=0.01083, over 3044064.31 frames. ], batch size: 57, lr: 7.88e-03, grad_scale: 32.0 2023-11-19 08:32:29,320 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. limit=6.0 2023-11-19 08:32:43,330 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=652440.0, ans=0.2 2023-11-19 08:32:49,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=652506.6666666666, ans=0.07 2023-11-19 08:32:56,437 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1700, loss[loss=0.07831, simple_loss=0.09993, pruned_loss=0.02019, audio_tagging_loss=0.008154, over 16530.00 frames. ], tot_loss[loss=0.08871, simple_loss=0.1071, pruned_loss=0.02429, audio_tagging_loss=0.01086, over 3043313.91 frames. 
], batch size: 61, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:32:59,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=652573.3333333334, ans=0.0 2023-11-19 08:33:05,624 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=652573.3333333334, ans=0.125 2023-11-19 08:33:13,285 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.601e+01 9.234e+01 1.036e+02 1.381e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:33:20,553 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=652706.6666666666, ans=0.0 2023-11-19 08:33:29,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=652773.3333333334, ans=0.2 2023-11-19 08:33:30,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=652773.3333333334, ans=0.1 2023-11-19 08:33:31,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=652773.3333333334, ans=10.0 2023-11-19 08:33:36,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652773.3333333334, ans=0.125 2023-11-19 08:33:36,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=652773.3333333334, ans=0.2 2023-11-19 08:33:37,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=652773.3333333334, ans=0.125 2023-11-19 08:33:52,661 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1750, loss[loss=0.08981, simple_loss=0.1022, pruned_loss=0.02594, audio_tagging_loss=0.01278, over 15020.00 frames. ], tot_loss[loss=0.08824, simple_loss=0.1067, pruned_loss=0.02418, audio_tagging_loss=0.0107, over 3047462.41 frames. ], batch size: 58, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:33:58,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=652906.6666666666, ans=0.125 2023-11-19 08:34:08,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=652973.3333333334, ans=0.125 2023-11-19 08:34:16,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=653040.0, ans=0.0 2023-11-19 08:34:35,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=653106.6666666666, ans=0.0 2023-11-19 08:34:38,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=653173.3333333334, ans=0.025 2023-11-19 08:34:43,555 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=653173.3333333334, ans=0.125 2023-11-19 08:34:48,524 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1800, loss[loss=0.05785, simple_loss=0.06465, pruned_loss=0.01344, audio_tagging_loss=0.01208, over 15025.00 frames. ], tot_loss[loss=0.08836, simple_loss=0.107, pruned_loss=0.02422, audio_tagging_loss=0.01064, over 3049811.39 frames. 
], batch size: 57, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:34:49,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=653240.0, ans=0.0 2023-11-19 08:35:05,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.385e+01 9.235e+01 9.785e+01 1.398e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:35:20,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=653373.3333333334, ans=0.5 2023-11-19 08:35:24,668 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=653440.0, ans=0.125 2023-11-19 08:35:36,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=653506.6666666666, ans=0.125 2023-11-19 08:35:44,910 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1850, loss[loss=0.0713, simple_loss=0.0849, pruned_loss=0.01727, audio_tagging_loss=0.01158, over 15269.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1069, pruned_loss=0.02428, audio_tagging_loss=0.01053, over 3054550.72 frames. ], batch size: 58, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:35:56,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=653640.0, ans=0.07 2023-11-19 08:35:58,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=653640.0, ans=0.2 2023-11-19 08:36:16,049 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0 2023-11-19 08:36:25,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=653773.3333333334, ans=0.125 2023-11-19 08:36:34,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=653840.0, ans=0.2 2023-11-19 08:36:40,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=653906.6666666666, ans=0.125 2023-11-19 08:36:40,922 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1900, loss[loss=0.0987, simple_loss=0.1239, pruned_loss=0.02913, audio_tagging_loss=0.007612, over 14153.00 frames. ], tot_loss[loss=0.08808, simple_loss=0.1068, pruned_loss=0.02425, audio_tagging_loss=0.01044, over 3061326.51 frames. ], batch size: 52, lr: 7.87e-03, grad_scale: 16.0 2023-11-19 08:36:43,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=653906.6666666666, ans=0.125 2023-11-19 08:36:53,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=653973.3333333334, ans=0.125 2023-11-19 08:36:57,970 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.285e+01 9.091e+01 1.015e+02 1.507e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 08:37:13,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=15.0
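The periodic optim.py:476 lines summarize the distribution of recent gradient norms as five quantiles (min, 25%, median, 75%, max) plus the active clipping threshold; in every such line in this log the threshold equals Clipping_scale times the logged median, e.g. 2.0 x 9.091e+01 = 1.818e+02 just above. A small monitor in that spirit is sketched below; it is illustrative, not the verbatim ScaledAdam clipping logic, and the window size is an assumption.

from collections import deque
import torch

class GradNormMonitor:
    """Tracks recent gradient norms; logs quantiles and a median-based threshold."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms: deque = deque(maxlen=window)
        self.clipped = 0
        self.seen = 0

    def clip_(self, params: list) -> None:
        grads = [p.grad.reshape(-1) for p in params if p.grad is not None]
        norm = torch.cat(grads).norm().item()
        self.norms.append(norm)
        self.seen += 1
        q = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * q[2].item()  # 2.0 x median, as in the log
        if norm > threshold:
            self.clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        print(
            f"Clipping_scale={self.clipping_scale}, grad-norm quartiles "
            + " ".join(f"{v:.3e}" for v in q.tolist())
            + f", threshold={threshold:.3e}, "
            + f"percent-clipped={100.0 * self.clipped / self.seen:.1f}"
        )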
2023-11-19 08:37:15,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=654106.6666666666, ans=0.0 2023-11-19 08:37:18,794 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=654106.6666666666, ans=0.125 2023-11-19 08:37:23,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=654106.6666666666, ans=0.0 2023-11-19 08:37:35,861 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=654240.0, ans=0.0 2023-11-19 08:37:36,685 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 1950, loss[loss=0.1056, simple_loss=0.1345, pruned_loss=0.02936, audio_tagging_loss=0.009001, over 15758.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.106, pruned_loss=0.02386, audio_tagging_loss=0.01043, over 3056039.75 frames. ], batch size: 58, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:37:49,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=654306.6666666666, ans=0.125 2023-11-19 08:38:12,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=654440.0, ans=0.0 2023-11-19 08:38:20,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=654506.6666666666, ans=0.0 2023-11-19 08:38:26,625 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=654506.6666666666, ans=0.1 2023-11-19 08:38:32,851 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2000, loss[loss=0.05999, simple_loss=0.06312, pruned_loss=0.01285, audio_tagging_loss=0.01557, over 15700.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1061, pruned_loss=0.02416, audio_tagging_loss=0.0105, over 3057731.61 frames.
], batch size: 62, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:38:34,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=654573.3333333334, ans=0.07 2023-11-19 08:38:37,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=654573.3333333334, ans=0.125 2023-11-19 08:38:51,181 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.516e+01 9.233e+01 1.009e+02 1.531e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 08:38:57,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654706.6666666666, ans=0.1 2023-11-19 08:39:05,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654773.3333333334, ans=0.1 2023-11-19 08:39:15,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=654773.3333333334, ans=0.125 2023-11-19 08:39:16,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=654840.0, ans=0.0 2023-11-19 08:39:22,254 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=654840.0, ans=0.1 2023-11-19 08:39:28,857 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2050, loss[loss=0.09792, simple_loss=0.1214, pruned_loss=0.02727, audio_tagging_loss=0.009957, over 15005.00 frames. ], tot_loss[loss=0.08759, simple_loss=0.106, pruned_loss=0.0241, audio_tagging_loss=0.01049, over 3050119.30 frames. ], batch size: 56, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:39:56,731 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-19 08:39:59,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=655040.0, ans=0.0 2023-11-19 08:40:06,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=655106.6666666666, ans=0.125 2023-11-19 08:40:25,126 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2100, loss[loss=0.08447, simple_loss=0.1082, pruned_loss=0.0218, audio_tagging_loss=0.008582, over 15097.00 frames. ], tot_loss[loss=0.08817, simple_loss=0.1069, pruned_loss=0.02427, audio_tagging_loss=0.01047, over 3055442.14 frames. ], batch size: 54, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:40:42,514 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.322e+01 9.112e+01 9.913e+01 1.952e+02, threshold=1.822e+02, percent-clipped=1.0 2023-11-19 08:41:20,754 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2150, loss[loss=0.08506, simple_loss=0.1062, pruned_loss=0.02349, audio_tagging_loss=0.008497, over 14811.00 frames. ], tot_loss[loss=0.08878, simple_loss=0.1075, pruned_loss=0.02455, audio_tagging_loss=0.01047, over 3053681.79 frames. ], batch size: 57, lr: 7.86e-03, grad_scale: 16.0 2023-11-19 08:41:29,821 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. 
limit=6.0 2023-11-19 08:41:41,215 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.307e-01 2023-11-19 08:41:42,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=655706.6666666666, ans=0.125 2023-11-19 08:41:44,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=655706.6666666666, ans=0.0 2023-11-19 08:41:46,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=655706.6666666666, ans=0.125 2023-11-19 08:41:53,799 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:41:57,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=655773.3333333334, ans=0.125 2023-11-19 08:41:57,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_na.min_abs, batch_count=655773.3333333334, ans=0.02 2023-11-19 08:42:10,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2023-11-19 08:42:16,756 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2200, loss[loss=0.08654, simple_loss=0.09859, pruned_loss=0.02573, audio_tagging_loss=0.01152, over 14883.00 frames. ], tot_loss[loss=0.08907, simple_loss=0.1078, pruned_loss=0.02467, audio_tagging_loss=0.01049, over 3056699.72 frames. ], batch size: 56, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:42:16,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655906.6666666666, ans=0.1 2023-11-19 08:42:28,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=655973.3333333334, ans=0.125 2023-11-19 08:42:30,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.43 vs. 
limit=15.0 2023-11-19 08:42:31,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=655973.3333333334, ans=0.0 2023-11-19 08:42:34,502 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.336e+01 8.931e+01 9.747e+01 1.930e+02, threshold=1.786e+02, percent-clipped=1.0 2023-11-19 08:42:34,790 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=655973.3333333334, ans=0.125 2023-11-19 08:42:35,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655973.3333333334, ans=0.125 2023-11-19 08:42:43,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=656040.0, ans=0.0 2023-11-19 08:42:45,143 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.82 vs. limit=6.0 2023-11-19 08:42:51,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.11 vs. limit=12.0 2023-11-19 08:43:12,630 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2250, loss[loss=0.08719, simple_loss=0.1062, pruned_loss=0.02235, audio_tagging_loss=0.01172, over 15681.00 frames. ], tot_loss[loss=0.08926, simple_loss=0.108, pruned_loss=0.0248, audio_tagging_loss=0.01046, over 3064653.41 frames. ], batch size: 62, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:43:22,545 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=656306.6666666666, ans=0.0 2023-11-19 08:43:31,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=656306.6666666666, ans=0.125 2023-11-19 08:44:08,703 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2300, loss[loss=0.07705, simple_loss=0.0949, pruned_loss=0.02149, audio_tagging_loss=0.008113, over 14863.00 frames. ], tot_loss[loss=0.08932, simple_loss=0.1081, pruned_loss=0.02468, audio_tagging_loss=0.01057, over 3061079.96 frames. ], batch size: 56, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:44:26,703 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.387e+01 9.079e+01 9.954e+01 1.397e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 08:44:57,979 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:45:04,357 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2350, loss[loss=0.1142, simple_loss=0.1411, pruned_loss=0.03497, audio_tagging_loss=0.008747, over 16329.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.1088, pruned_loss=0.02488, audio_tagging_loss=0.01057, over 3057320.00 frames. 
], batch size: 59, lr: 7.85e-03, grad_scale: 16.0 2023-11-19 08:45:04,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=656906.6666666666, ans=0.125 2023-11-19 08:45:07,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=656906.6666666666, ans=0.125 2023-11-19 08:45:54,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=657173.3333333334, ans=0.0 2023-11-19 08:46:00,325 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2400, loss[loss=0.0813, simple_loss=0.1002, pruned_loss=0.0212, audio_tagging_loss=0.009988, over 15512.00 frames. ], tot_loss[loss=0.08947, simple_loss=0.1082, pruned_loss=0.02466, audio_tagging_loss=0.0107, over 3051937.80 frames. ], batch size: 59, lr: 7.85e-03, grad_scale: 32.0 2023-11-19 08:46:03,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=657240.0, ans=0.2 2023-11-19 08:46:12,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=657306.6666666666, ans=0.0 2023-11-19 08:46:15,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=657306.6666666666, ans=0.125 2023-11-19 08:46:17,892 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.346e+01 9.299e+01 9.977e+01 1.391e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 08:46:22,351 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=657373.3333333334, ans=0.125 2023-11-19 08:46:54,236 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.21 vs. limit=15.0 2023-11-19 08:46:56,231 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2450, loss[loss=0.09867, simple_loss=0.121, pruned_loss=0.02988, audio_tagging_loss=0.008268, over 14799.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1074, pruned_loss=0.02443, audio_tagging_loss=0.01074, over 3050287.80 frames. ], batch size: 55, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:47:34,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=657773.3333333334, ans=0.125 2023-11-19 08:47:40,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=657840.0, ans=0.0 2023-11-19 08:47:46,447 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0 2023-11-19 08:47:48,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=657840.0, ans=0.0 2023-11-19 08:47:51,715 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2500, loss[loss=0.07862, simple_loss=0.08619, pruned_loss=0.02268, audio_tagging_loss=0.01284, over 14308.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.1063, pruned_loss=0.02417, audio_tagging_loss=0.01091, over 3048780.04 frames. 
], batch size: 57, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:47:54,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=657906.6666666666, ans=0.125 2023-11-19 08:47:56,662 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=657906.6666666666, ans=0.125 2023-11-19 08:48:01,004 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=657906.6666666666, ans=0.125 2023-11-19 08:48:09,790 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.238e+01 9.130e+01 9.958e+01 1.355e+02, threshold=1.826e+02, percent-clipped=0.0 2023-11-19 08:48:27,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-11-19 08:48:48,242 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2550, loss[loss=0.1049, simple_loss=0.1335, pruned_loss=0.02719, audio_tagging_loss=0.01093, over 15269.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.106, pruned_loss=0.0243, audio_tagging_loss=0.0109, over 3040993.86 frames. ], batch size: 56, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:48:53,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=658240.0, ans=0.125 2023-11-19 08:49:02,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=658306.6666666666, ans=0.2 2023-11-19 08:49:17,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=658373.3333333334, ans=0.125 2023-11-19 08:49:25,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=658440.0, ans=0.125 2023-11-19 08:49:35,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=658506.6666666666, ans=0.0 2023-11-19 08:49:43,987 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2600, loss[loss=0.07386, simple_loss=0.08421, pruned_loss=0.01855, audio_tagging_loss=0.01321, over 13936.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.1049, pruned_loss=0.0241, audio_tagging_loss=0.01082, over 3044393.19 frames. ], batch size: 53, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:49:48,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=658573.3333333334, ans=0.125 2023-11-19 08:49:52,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=658573.3333333334, ans=0.125 2023-11-19 08:50:01,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.704e+01 9.558e+01 1.082e+02 2.151e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-19 08:50:20,723 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=658773.3333333334, ans=0.05 2023-11-19 08:50:21,933 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0
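The scaling.py:213 lines track module hyper-parameters (balancer probabilities, skip rates, bypass scales) whose values are functions of the global batch count rather than constants; the skip-rate entries above, for instance, report ans=0.0 by batch_count around 658506 while balancer probabilities sit at ans=0.125. A piecewise-linear schedule keyed on batch count reproduces that behaviour; the breakpoints in the sketch below are made up for illustration, not the recipe's actual schedules.

class PiecewiseLinear:
    """Float-valued schedule, linearly interpolated between (batch_count, value) knots."""

    def __init__(self, *knots: tuple):
        self.knots = sorted(knots)

    def __call__(self, batch_count: float) -> float:
        k = self.knots
        if batch_count <= k[0][0]:
            return k[0][1]
        if batch_count >= k[-1][0]:
            return k[-1][1]
        for (x0, y0), (x1, y1) in zip(k, k[1:]):
            if x0 <= batch_count <= x1:
                return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)

# Illustrative: a skip rate that decays from 0.5 to 0.0 over the first 20k
# batches; queried at batch_count=658573 it reports 0.0, like the entries above.
conv_skip_rate = PiecewiseLinear((0.0, 0.5), (20000.0, 0.0))
assert conv_skip_rate(658573.3) == 0.0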
2023-11-19 08:50:33,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=658840.0, ans=0.125 2023-11-19 08:50:38,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=658840.0, ans=0.125 2023-11-19 08:50:40,081 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2650, loss[loss=0.09815, simple_loss=0.124, pruned_loss=0.02777, audio_tagging_loss=0.008405, over 15566.00 frames. ], tot_loss[loss=0.08734, simple_loss=0.1051, pruned_loss=0.02402, audio_tagging_loss=0.01075, over 3043389.29 frames. ], batch size: 57, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:51:13,698 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-19 08:51:36,941 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2700, loss[loss=0.1205, simple_loss=0.1446, pruned_loss=0.0381, audio_tagging_loss=0.01012, over 15362.00 frames. ], tot_loss[loss=0.0876, simple_loss=0.1059, pruned_loss=0.02415, audio_tagging_loss=0.01051, over 3052733.60 frames. ], batch size: 59, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:51:38,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=659240.0, ans=0.0 2023-11-19 08:51:47,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=659306.6666666666, ans=0.125 2023-11-19 08:51:54,344 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.620e+01 9.365e+01 1.021e+02 1.535e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:51:56,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=659306.6666666666, ans=0.05 2023-11-19 08:51:57,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=659373.3333333334, ans=0.0 2023-11-19 08:52:09,896 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=659440.0, ans=0.125 2023-11-19 08:52:18,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=659440.0, ans=0.5 2023-11-19 08:52:21,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.27 vs. limit=15.0 2023-11-19 08:52:24,660 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=659506.6666666666, ans=0.125 2023-11-19 08:52:31,919 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2750, loss[loss=0.1047, simple_loss=0.1209, pruned_loss=0.03194, audio_tagging_loss=0.01235, over 15900.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.1056, pruned_loss=0.02398, audio_tagging_loss=0.01052, over 3052536.51 frames.
], batch size: 60, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:52:53,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=659706.6666666666, ans=0.125 2023-11-19 08:52:54,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=659706.6666666666, ans=0.1 2023-11-19 08:53:00,137 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=659706.6666666666, ans=0.125 2023-11-19 08:53:05,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=659773.3333333334, ans=0.125 2023-11-19 08:53:18,437 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:53:19,644 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659840.0, ans=0.1 2023-11-19 08:53:23,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=659840.0, ans=0.0 2023-11-19 08:53:26,881 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2800, loss[loss=0.07666, simple_loss=0.09389, pruned_loss=0.02102, audio_tagging_loss=0.008694, over 15883.00 frames. ], tot_loss[loss=0.08617, simple_loss=0.1041, pruned_loss=0.02358, audio_tagging_loss=0.01054, over 3050419.64 frames. ], batch size: 63, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:53:32,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=659906.6666666666, ans=0.125 2023-11-19 08:53:45,305 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.203e+01 8.899e+01 9.500e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 08:53:49,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=660040.0, ans=0.125 2023-11-19 08:53:50,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=660040.0, ans=0.125 2023-11-19 08:53:54,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660040.0, ans=0.1 2023-11-19 08:54:08,260 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660106.6666666666, ans=0.1 2023-11-19 08:54:22,320 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2850, loss[loss=0.08311, simple_loss=0.09621, pruned_loss=0.02801, audio_tagging_loss=0.006995, over 14616.00 frames. ], tot_loss[loss=0.08603, simple_loss=0.1041, pruned_loss=0.02352, audio_tagging_loss=0.01047, over 3039666.88 frames. 
], batch size: 58, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:54:34,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=660306.6666666666, ans=0.0 2023-11-19 08:54:39,619 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=660306.6666666666, ans=0.0 2023-11-19 08:54:56,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660440.0, ans=0.1 2023-11-19 08:55:18,650 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2900, loss[loss=0.08497, simple_loss=0.09822, pruned_loss=0.02505, audio_tagging_loss=0.01081, over 15135.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1044, pruned_loss=0.02355, audio_tagging_loss=0.01053, over 3036775.46 frames. ], batch size: 59, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:55:19,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0 2023-11-19 08:55:31,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660640.0, ans=0.1 2023-11-19 08:55:36,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.534e+01 9.333e+01 1.020e+02 1.874e+02, threshold=1.867e+02, percent-clipped=1.0 2023-11-19 08:55:58,387 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=660773.3333333334, ans=0.0 2023-11-19 08:56:06,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=660840.0, ans=0.09899494936611666 2023-11-19 08:56:14,631 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 2950, loss[loss=0.05471, simple_loss=0.06068, pruned_loss=0.01173, audio_tagging_loss=0.01264, over 16674.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.105, pruned_loss=0.02373, audio_tagging_loss=0.01051, over 3027067.79 frames. ], batch size: 64, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:56:18,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=660906.6666666666, ans=0.05 2023-11-19 08:56:20,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=660906.6666666666, ans=0.0 2023-11-19 08:56:27,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=660973.3333333334, ans=0.125 2023-11-19 08:57:01,958 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=661173.3333333334, ans=0.0 2023-11-19 08:57:06,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=661173.3333333334, ans=0.05 2023-11-19 08:57:08,098 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=22.5 2023-11-19 08:57:10,811 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3000, loss[loss=0.08238, simple_loss=0.1046, pruned_loss=0.02164, audio_tagging_loss=0.008438, over 14697.00 frames. ], tot_loss[loss=0.08764, simple_loss=0.1061, pruned_loss=0.0241, audio_tagging_loss=0.0105, over 3031948.41 frames. 
], batch size: 54, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:57:10,812 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 08:57:42,042 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5731, 3.7694, 4.3396, 3.4277], device='cuda:2') 2023-11-19 08:57:42,105 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.5420, 2.5614, 3.7959, 3.0813], device='cuda:2') 2023-11-19 08:57:44,073 INFO [train_asr.py:1147] (2/4) Epoch 9, validation: loss=0.06604, simple_loss=0.05618, pruned_loss=0.006775, audio_tagging_loss=0.03117, over 4681554.00 frames. 2023-11-19 08:57:44,074 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 08:58:01,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.862e+01 9.645e+01 1.062e+02 1.575e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 08:58:13,029 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661373.3333333334, ans=0.1 2023-11-19 08:58:21,463 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=661440.0, ans=15.0 2023-11-19 08:58:26,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661440.0, ans=0.1 2023-11-19 08:58:31,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=661506.6666666666, ans=0.125 2023-11-19 08:58:31,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=661506.6666666666, ans=0.125 2023-11-19 08:58:31,496 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.81 vs. limit=15.0 2023-11-19 08:58:32,237 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=6.327e-02 2023-11-19 08:58:39,499 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3050, loss[loss=0.1113, simple_loss=0.1513, pruned_loss=0.02905, audio_tagging_loss=0.006629, over 16654.00 frames. ], tot_loss[loss=0.08919, simple_loss=0.1081, pruned_loss=0.02464, audio_tagging_loss=0.01048, over 3035644.21 frames. ], batch size: 60, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:59:00,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=661640.0, ans=0.125 2023-11-19 08:59:05,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=661706.6666666666, ans=0.125 2023-11-19 08:59:06,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=661706.6666666666, ans=0.125 2023-11-19 08:59:12,969 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
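During each validation pass the trainer also dumps one attention-entropy value per head, as in the zipformer.py:1873 tensors above: low entropy means a head concentrates on few keys, high entropy means it spreads mass broadly over the sequence. A sketch of that diagnostic follows; the shapes and the random input are chosen for illustration and are not tied to the model's actual dimensions.

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys), each row a distribution over keys.
    # Returns one entropy value per head, averaged over query positions.
    p = attn.clamp(min=1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)

attn = torch.softmax(torch.randn(4, 100, 100), dim=-1)
print(attn_weights_entropy(attn))  # cf. tensor([4.5731, 3.7694, 4.3396, 3.4277]) above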
2023-11-19 08:59:21,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=661773.3333333334, ans=0.125 2023-11-19 08:59:23,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661773.3333333334, ans=0.1 2023-11-19 08:59:27,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=661840.0, ans=0.125 2023-11-19 08:59:28,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=661840.0, ans=0.125 2023-11-19 08:59:30,244 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-11-19 08:59:36,848 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3100, loss[loss=0.1008, simple_loss=0.129, pruned_loss=0.02877, audio_tagging_loss=0.007566, over 15633.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1091, pruned_loss=0.02472, audio_tagging_loss=0.01051, over 3040344.44 frames. ], batch size: 57, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 08:59:51,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=661973.3333333334, ans=0.125 2023-11-19 08:59:54,977 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.514e+01 9.258e+01 1.059e+02 1.772e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:00:24,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=662173.3333333334, ans=0.0 2023-11-19 09:00:32,264 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3150, loss[loss=0.1079, simple_loss=0.1298, pruned_loss=0.03306, audio_tagging_loss=0.009932, over 14760.00 frames. ], tot_loss[loss=0.08976, simple_loss=0.1087, pruned_loss=0.02478, audio_tagging_loss=0.01062, over 3031405.81 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 09:00:49,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.86 vs. limit=15.0 2023-11-19 09:00:57,757 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-19 09:01:28,537 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3200, loss[loss=0.07725, simple_loss=0.09336, pruned_loss=0.01982, audio_tagging_loss=0.01075, over 15947.00 frames. ], tot_loss[loss=0.08975, simple_loss=0.1086, pruned_loss=0.02475, audio_tagging_loss=0.01068, over 3033619.35 frames. ], batch size: 60, lr: 7.82e-03, grad_scale: 32.0 2023-11-19 09:01:33,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=662573.3333333334, ans=0.125 2023-11-19 09:01:34,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=662573.3333333334, ans=0.125 2023-11-19 09:01:34,392 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.85 vs. limit=15.0
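The scaling.py:1022 lines fire only when a module's whitening metric approaches or exceeds its limit ("metric=X vs. limit=Y"). One plausible toy version of such a metric is sketched below, assuming it measures how far the per-group channel covariance is from a multiple of the identity (1.0 for perfectly white features, growing as the spectrum becomes anisotropic); the actual scaling.py definition may differ.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); returns >= 1.0, with 1.0 = fully white.
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    x = x - x.mean(dim=0, keepdim=True)
    cov = torch.einsum("tgc,tgd->gcd", x, x) / num_frames
    eigs = torch.linalg.eigvalsh(cov)  # per-group covariance spectrum
    # mean squared eigenvalue over squared mean eigenvalue, averaged over groups
    return ((eigs**2).mean(dim=-1) / eigs.mean(dim=-1) ** 2).mean().item()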
2023-11-19 09:01:46,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.354e+01 9.192e+01 1.025e+02 1.348e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 09:01:49,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=662640.0, ans=0.1 2023-11-19 09:01:58,675 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=662706.6666666666, ans=0.125 2023-11-19 09:02:09,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=662773.3333333334, ans=0.125 2023-11-19 09:02:15,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=662840.0, ans=0.04949747468305833 2023-11-19 09:02:24,476 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3250, loss[loss=0.08776, simple_loss=0.1048, pruned_loss=0.02592, audio_tagging_loss=0.009419, over 14756.00 frames. ], tot_loss[loss=0.08935, simple_loss=0.1077, pruned_loss=0.0246, audio_tagging_loss=0.01088, over 3035481.89 frames. ], batch size: 56, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:03:20,475 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3300, loss[loss=0.06312, simple_loss=0.07721, pruned_loss=0.01283, audio_tagging_loss=0.01169, over 14257.00 frames. ], tot_loss[loss=0.08933, simple_loss=0.1077, pruned_loss=0.02451, audio_tagging_loss=0.01095, over 3044803.64 frames. ], batch size: 54, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:03:23,911 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:03:38,458 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.893e+01 9.711e+01 1.106e+02 1.510e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-19 09:03:52,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=663373.3333333334, ans=0.0 2023-11-19 09:03:53,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=663440.0, ans=0.0 2023-11-19 09:03:56,766 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=663440.0, ans=0.125 2023-11-19 09:04:03,184 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=663440.0, ans=0.125 2023-11-19 09:04:05,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=663506.6666666666, ans=0.0 2023-11-19 09:04:16,814 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3350, loss[loss=0.0836, simple_loss=0.1021, pruned_loss=0.0222, audio_tagging_loss=0.01036, over 15174.00 frames. ], tot_loss[loss=0.08875, simple_loss=0.1072, pruned_loss=0.02433, audio_tagging_loss=0.0108, over 3050913.63 frames. ], batch size: 57, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:04:23,712 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.76 vs. limit=15.0
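The bracketed per-batch numbers decompose the displayed loss, and throughout this log they are consistent with the simple (linear) transducer term entering at half weight while the pruned and audio-tagging terms enter at full weight. Checking batch 3250 just above confirms the arithmetic; the 0.5 weight is inferred from the logged numbers, not read out of train_asr.py.

simple_loss, pruned_loss, audio_tagging_loss = 0.1048, 0.02592, 0.009419
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(round(loss, 5))  # 0.08774, vs. the logged loss=0.08776 (display rounding)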
limit=15.0 2023-11-19 09:04:27,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=663640.0, ans=0.125 2023-11-19 09:04:42,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=663706.6666666666, ans=0.125 2023-11-19 09:04:58,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=663773.3333333334, ans=0.125 2023-11-19 09:05:10,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=663840.0, ans=15.0 2023-11-19 09:05:12,839 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3400, loss[loss=0.08374, simple_loss=0.1108, pruned_loss=0.01757, audio_tagging_loss=0.01077, over 16104.00 frames. ], tot_loss[loss=0.08892, simple_loss=0.1075, pruned_loss=0.02443, audio_tagging_loss=0.01073, over 3058000.36 frames. ], batch size: 59, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:05:23,599 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. limit=6.0 2023-11-19 09:05:24,237 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:05:26,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=663973.3333333334, ans=0.125 2023-11-19 09:05:26,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=663973.3333333334, ans=0.2 2023-11-19 09:05:30,637 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.652e+01 9.509e+01 1.042e+02 1.757e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 09:05:37,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664040.0, ans=0.125 2023-11-19 09:05:51,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.15 vs. limit=15.0 2023-11-19 09:05:55,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=664106.6666666666, ans=0.125 2023-11-19 09:05:57,955 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664173.3333333334, ans=0.125 2023-11-19 09:06:08,677 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3450, loss[loss=0.07334, simple_loss=0.09215, pruned_loss=0.01734, audio_tagging_loss=0.009924, over 14302.00 frames. ], tot_loss[loss=0.08876, simple_loss=0.1076, pruned_loss=0.02442, audio_tagging_loss=0.01056, over 3050059.74 frames. 
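
The per-batch loss[...] figures above are single-batch values, while tot_loss[...] is an average over the recent window of batches, which is why the two differ. Either way, the logged total is consistent with a weighted sum of the three components, using this run's configured simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0. A minimal sketch of that combination (the function name is illustrative, and this assumes the post-warm-up weighting):

    # Sketch: total loss = 0.5 * simple_loss + pruned_loss
    #                    + 1.0 * audio_tagging_loss, with the two scales
    #                    taken from this run's configuration.
    SIMPLE_LOSS_SCALE = 0.5
    AUDIO_TAGGING_LOSS_SCALE = 1.0

    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float) -> float:
        return (SIMPLE_LOSS_SCALE * simple_loss
                + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # Epoch 9, batch 3450 tot_loss figures from the entry above:
    assert abs(combined_loss(0.1076, 0.02442, 0.01056) - 0.08876) < 1e-4
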
], batch size: 53, lr: 7.81e-03, grad_scale: 32.0 2023-11-19 09:06:22,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=664306.6666666666, ans=0.2 2023-11-19 09:06:26,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=664306.6666666666, ans=0.125 2023-11-19 09:06:30,398 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=664373.3333333334, ans=0.125 2023-11-19 09:06:46,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=664440.0, ans=0.125 2023-11-19 09:06:58,719 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-19 09:07:04,548 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3500, loss[loss=0.06904, simple_loss=0.089, pruned_loss=0.01568, audio_tagging_loss=0.008858, over 15186.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1064, pruned_loss=0.02405, audio_tagging_loss=0.01054, over 3055310.98 frames. ], batch size: 57, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:07:05,990 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=22.5 2023-11-19 09:07:17,945 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.73 vs. limit=15.0 2023-11-19 09:07:18,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=664640.0, ans=0.0 2023-11-19 09:07:22,595 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.336e+01 8.969e+01 9.703e+01 1.247e+02, threshold=1.794e+02, percent-clipped=0.0 2023-11-19 09:07:32,797 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:07:42,998 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=664773.3333333334, ans=0.125 2023-11-19 09:07:45,519 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.83 vs. 
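
The optim.py:476 entries list five quantiles of recently observed gradient norms (min, 25%, median, 75%, max) followed by the clipping threshold, and in every entry here the threshold is Clipping_scale times the median, e.g. 2.0 * 8.969e+01 = 1.794e+02 in the entry just above; percent-clipped=0.0 then means no batch in the window exceeded it. A sketch of that relationship (the helper name is an assumption, not the trainer's API):

    # Sketch: adaptive clipping threshold = clipping_scale * median of
    # recent gradient norms, rather than a fixed max-norm.
    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2].item()  # 2.0 * median
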
limit=15.0 2023-11-19 09:07:46,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=664773.3333333334, ans=0.0 2023-11-19 09:07:53,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:07:53,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=664840.0, ans=0.125 2023-11-19 09:07:53,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=664840.0, ans=0.07 2023-11-19 09:07:55,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=664840.0, ans=0.1 2023-11-19 09:08:00,887 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3550, loss[loss=0.1181, simple_loss=0.1445, pruned_loss=0.03711, audio_tagging_loss=0.008756, over 14770.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1059, pruned_loss=0.02396, audio_tagging_loss=0.01044, over 3053782.95 frames. ], batch size: 56, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:08:24,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665040.0, ans=0.125 2023-11-19 09:08:45,649 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=665173.3333333334, ans=0.125 2023-11-19 09:08:56,456 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3600, loss[loss=0.113, simple_loss=0.1426, pruned_loss=0.03223, audio_tagging_loss=0.009439, over 15906.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1074, pruned_loss=0.02446, audio_tagging_loss=0.01038, over 3059047.91 frames. ], batch size: 54, lr: 7.80e-03, grad_scale: 32.0 2023-11-19 09:09:13,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=665306.6666666666, ans=0.1 2023-11-19 09:09:15,147 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.230e+01 8.845e+01 9.597e+01 1.384e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:09:23,137 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.13 vs. limit=12.0 2023-11-19 09:09:31,670 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.28 vs. limit=15.0 2023-11-19 09:09:34,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=665440.0, ans=0.0 2023-11-19 09:09:36,517 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.536e-01 2023-11-19 09:09:39,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=665440.0, ans=0.0 2023-11-19 09:09:41,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=665506.6666666666, ans=0.0 2023-11-19 09:09:52,793 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3650, loss[loss=0.08893, simple_loss=0.09944, pruned_loss=0.02519, audio_tagging_loss=0.01401, over 15058.00 frames. ], tot_loss[loss=0.08927, simple_loss=0.1083, pruned_loss=0.02476, audio_tagging_loss=0.01034, over 3057852.89 frames. 
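
grad_scale halves from 32.0 (batch 3600) to 16.0 (batch 3650, continued below), is back at 32.0 by batch 4000, and dips again afterwards. With fp16 training this is the usual dynamic loss-scaling pattern: the scale is halved when an inf/nan overflow is detected in the scaled gradients and grown back after a run of clean steps. A minimal sketch with torch.cuda.amp.GradScaler semantics (parameter values are assumptions chosen to match the logged scales; the trainer may manage the scale with its own wrapper and growth schedule):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=32.0,     # matches the grad_scale logged before the dip
        backoff_factor=0.5,  # 32.0 -> 16.0 on an overflow
        growth_factor=2.0,   # 16.0 -> 32.0 after enough clean steps
    )

    # Typical step; the halving/doubling happens inside scaler.update():
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)
    #   scaler.update()
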
], batch size: 57, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:10:23,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=665706.6666666666, ans=0.125 2023-11-19 09:10:38,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=665840.0, ans=0.0 2023-11-19 09:10:39,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=665840.0, ans=0.09899494936611666 2023-11-19 09:10:42,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=665840.0, ans=0.0 2023-11-19 09:10:45,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=665840.0, ans=0.0 2023-11-19 09:10:48,458 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3700, loss[loss=0.1001, simple_loss=0.1167, pruned_loss=0.03043, audio_tagging_loss=0.01129, over 16415.00 frames. ], tot_loss[loss=0.08951, simple_loss=0.1088, pruned_loss=0.02483, audio_tagging_loss=0.01027, over 3055702.53 frames. ], batch size: 60, lr: 7.80e-03, grad_scale: 16.0 2023-11-19 09:11:03,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=665973.3333333334, ans=0.0 2023-11-19 09:11:06,729 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.367e+01 9.183e+01 1.016e+02 1.508e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 09:11:18,497 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.69 vs. limit=15.0 2023-11-19 09:11:20,646 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.74 vs. limit=15.0 2023-11-19 09:11:31,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=666173.3333333334, ans=0.125 2023-11-19 09:11:31,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=666173.3333333334, ans=0.0 2023-11-19 09:11:35,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2023-11-19 09:11:43,398 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3750, loss[loss=0.07433, simple_loss=0.09091, pruned_loss=0.02092, audio_tagging_loss=0.007957, over 14046.00 frames. ], tot_loss[loss=0.08928, simple_loss=0.1083, pruned_loss=0.02472, audio_tagging_loss=0.01038, over 3056931.78 frames. ], batch size: 56, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:11:54,837 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.72 vs. limit=15.0 2023-11-19 09:12:18,708 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=666440.0, ans=0.125 2023-11-19 09:12:20,550 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:12:20,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=666440.0, ans=0.0 2023-11-19 09:12:37,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=666506.6666666666, ans=0.0 2023-11-19 09:12:39,254 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3800, loss[loss=0.09007, simple_loss=0.1005, pruned_loss=0.02747, audio_tagging_loss=0.01234, over 15439.00 frames. ], tot_loss[loss=0.08995, simple_loss=0.1092, pruned_loss=0.02492, audio_tagging_loss=0.01041, over 3060336.81 frames. ], batch size: 59, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:13:01,079 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.849e+01 9.395e+01 1.021e+02 1.360e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 09:13:23,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=666773.3333333334, ans=0.125 2023-11-19 09:13:31,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666840.0, ans=0.1 2023-11-19 09:13:32,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.12 vs. limit=22.5 2023-11-19 09:13:37,598 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3850, loss[loss=0.08998, simple_loss=0.115, pruned_loss=0.02396, audio_tagging_loss=0.008532, over 15809.00 frames. ], tot_loss[loss=0.08907, simple_loss=0.1078, pruned_loss=0.02452, audio_tagging_loss=0.01065, over 3053754.99 frames. ], batch size: 59, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:13:39,209 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.96 vs. limit=15.0 2023-11-19 09:13:40,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=666906.6666666666, ans=0.125 2023-11-19 09:13:40,513 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.53 vs. limit=12.0 2023-11-19 09:13:41,603 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:13:43,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=666906.6666666666, ans=0.07 2023-11-19 09:13:59,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=667040.0, ans=0.0 2023-11-19 09:14:09,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=667040.0, ans=0.125 2023-11-19 09:14:13,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=667106.6666666666, ans=0.125 2023-11-19 09:14:17,067 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. 
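
The WARNING just above follows the same pattern as every exclusion in this stretch: a 1-second AudioSet clip (IDs of the form unbalanced/<id>_0.000_1.000.wav) carries the dummy placeholder transcript, its 100 feature frames come out at 23 frames after subsampling, and 23 frames cannot carry a 24-token label sequence, because the pruned transducer loss needs at least one frame per token. A sketch of the check (the exact subsampling arithmetic is an assumption reverse-engineered from the logged 100 -> 23 mapping):

    # Sketch: drop a cut whose subsampled frame count is shorter than its
    # token sequence; such cuts would make the pruned RNN-T loss infinite.
    def keep_cut(num_feature_frames: int, num_tokens: int) -> bool:
        # ((100 - 7) // 2) // 2 == 23, matching "after subsampling: 23";
        # the constants model the conv front-end, the overall factor-4
        # subsampling is what matters.
        frames_after_subsampling = ((num_feature_frames - 7) // 2) // 2
        return frames_after_subsampling >= num_tokens

    assert keep_cut(100, 24) is False  # the dummy-text AudioSet cuts
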
limit=15.0 2023-11-19 09:14:26,639 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=12.0 2023-11-19 09:14:27,770 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-19 09:14:29,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=667173.3333333334, ans=0.0 2023-11-19 09:14:30,155 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=667173.3333333334, ans=0.1 2023-11-19 09:14:33,710 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.32 vs. limit=22.5 2023-11-19 09:14:34,206 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3900, loss[loss=0.08213, simple_loss=0.09228, pruned_loss=0.01986, audio_tagging_loss=0.01613, over 14789.00 frames. ], tot_loss[loss=0.08987, simple_loss=0.1087, pruned_loss=0.0249, audio_tagging_loss=0.01061, over 3043971.57 frames. ], batch size: 57, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:14:34,388 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=667240.0, ans=0.2 2023-11-19 09:14:38,659 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=667240.0, ans=0.1 2023-11-19 09:14:42,859 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:14:52,705 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.271e+01 8.958e+01 9.768e+01 1.292e+02, threshold=1.792e+02, percent-clipped=0.0 2023-11-19 09:15:10,226 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=667440.0, ans=0.125 2023-11-19 09:15:18,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=667506.6666666666, ans=0.125 2023-11-19 09:15:23,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=667506.6666666666, ans=0.125 2023-11-19 09:15:29,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=667573.3333333334, ans=0.0 2023-11-19 09:15:30,405 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 3950, loss[loss=0.08613, simple_loss=0.1037, pruned_loss=0.02615, audio_tagging_loss=0.008127, over 14636.00 frames. ], tot_loss[loss=0.08889, simple_loss=0.1073, pruned_loss=0.02456, audio_tagging_loss=0.0107, over 3045924.62 frames. ], batch size: 54, lr: 7.79e-03, grad_scale: 16.0 2023-11-19 09:15:40,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=667640.0, ans=0.125 2023-11-19 09:15:46,656 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.61 vs. limit=8.0 2023-11-19 09:15:57,609 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.09 vs. 
limit=15.0 2023-11-19 09:16:00,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=667706.6666666666, ans=0.125 2023-11-19 09:16:02,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=667773.3333333334, ans=0.0 2023-11-19 09:16:11,998 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.52 vs. limit=22.5 2023-11-19 09:16:23,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=667840.0, ans=0.2 2023-11-19 09:16:25,232 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4000, loss[loss=0.08819, simple_loss=0.1086, pruned_loss=0.022, audio_tagging_loss=0.01187, over 14726.00 frames. ], tot_loss[loss=0.08898, simple_loss=0.1073, pruned_loss=0.02452, audio_tagging_loss=0.01083, over 3044116.49 frames. ], batch size: 56, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:16:28,244 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=667906.6666666666, ans=0.125 2023-11-19 09:16:30,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=667906.6666666666, ans=15.0 2023-11-19 09:16:32,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2023-11-19 09:16:33,512 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=667906.6666666666, ans=0.125 2023-11-19 09:16:45,244 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.595e+01 9.425e+01 1.030e+02 1.465e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-19 09:16:49,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=668040.0, ans=0.0 2023-11-19 09:16:57,642 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=668040.0, ans=0.09899494936611666 2023-11-19 09:17:10,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.90 vs. limit=12.0 2023-11-19 09:17:15,072 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=668173.3333333334, ans=0.125 2023-11-19 09:17:15,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=668173.3333333334, ans=15.0 2023-11-19 09:17:22,408 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4050, loss[loss=0.06924, simple_loss=0.07464, pruned_loss=0.01865, audio_tagging_loss=0.01327, over 16017.00 frames. ], tot_loss[loss=0.08928, simple_loss=0.1079, pruned_loss=0.02454, audio_tagging_loss=0.0108, over 3051707.24 frames. ], batch size: 61, lr: 7.78e-03, grad_scale: 32.0 2023-11-19 09:17:23,541 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:17:53,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668373.3333333334, ans=0.1 2023-11-19 09:18:17,768 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4100, loss[loss=0.05405, simple_loss=0.06895, pruned_loss=0.00889, audio_tagging_loss=0.01069, over 15827.00 frames. ], tot_loss[loss=0.08933, simple_loss=0.1081, pruned_loss=0.02451, audio_tagging_loss=0.01077, over 3044077.36 frames. ], batch size: 61, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:18:18,078 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=668573.3333333334, ans=0.125 2023-11-19 09:18:24,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-11-19 09:18:27,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-19 09:18:30,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=668640.0, ans=0.1 2023-11-19 09:18:33,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=668640.0, ans=0.125 2023-11-19 09:18:37,942 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.527e+01 9.124e+01 9.950e+01 1.525e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 09:19:04,282 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=668840.0, ans=0.125 2023-11-19 09:19:13,572 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4150, loss[loss=0.08631, simple_loss=0.106, pruned_loss=0.0229, audio_tagging_loss=0.01042, over 15408.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1073, pruned_loss=0.02424, audio_tagging_loss=0.01061, over 3042553.62 frames. ], batch size: 55, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:19:15,401 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.85 vs. limit=15.0 2023-11-19 09:19:28,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=668973.3333333334, ans=0.125 2023-11-19 09:19:28,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=668973.3333333334, ans=0.04949747468305833 2023-11-19 09:19:28,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=668973.3333333334, ans=0.1 2023-11-19 09:19:50,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=669106.6666666666, ans=0.0 2023-11-19 09:19:52,693 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-19 09:19:53,155 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:19:56,616 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=669106.6666666666, ans=0.2 2023-11-19 09:19:58,010 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-11-19 09:20:07,737 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=669173.3333333334, ans=10.0 2023-11-19 09:20:10,264 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4200, loss[loss=0.07795, simple_loss=0.09886, pruned_loss=0.02006, audio_tagging_loss=0.008467, over 15246.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.107, pruned_loss=0.02425, audio_tagging_loss=0.01059, over 3042681.83 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 16.0 2023-11-19 09:20:22,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669306.6666666666, ans=0.1 2023-11-19 09:20:30,746 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.637e+01 8.694e+01 9.811e+01 1.116e+02 1.412e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-19 09:20:49,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=669440.0, ans=0.125 2023-11-19 09:20:49,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=669440.0, ans=0.0 2023-11-19 09:21:03,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=669506.6666666666, ans=0.0 2023-11-19 09:21:06,478 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4250, loss[loss=0.09266, simple_loss=0.117, pruned_loss=0.02441, audio_tagging_loss=0.009728, over 16305.00 frames. ], tot_loss[loss=0.0878, simple_loss=0.1065, pruned_loss=0.02399, audio_tagging_loss=0.01053, over 3044382.66 frames. ], batch size: 62, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:21:12,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=669573.3333333334, ans=0.2 2023-11-19 09:21:14,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=669573.3333333334, ans=0.125 2023-11-19 09:21:28,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=669706.6666666666, ans=0.125 2023-11-19 09:21:31,702 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.57 vs. 
limit=12.0 2023-11-19 09:21:45,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=669773.3333333334, ans=0.07 2023-11-19 09:21:50,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=669840.0, ans=0.1 2023-11-19 09:21:52,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=669840.0, ans=0.09899494936611666 2023-11-19 09:22:02,670 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4300, loss[loss=0.1099, simple_loss=0.1387, pruned_loss=0.03125, audio_tagging_loss=0.009326, over 14908.00 frames. ], tot_loss[loss=0.0893, simple_loss=0.1086, pruned_loss=0.02457, audio_tagging_loss=0.01041, over 3046460.64 frames. ], batch size: 53, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:22:22,813 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.641e+01 9.552e+01 1.070e+02 1.517e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 09:22:58,317 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4350, loss[loss=0.05726, simple_loss=0.06117, pruned_loss=0.01567, audio_tagging_loss=0.011, over 15962.00 frames. ], tot_loss[loss=0.08858, simple_loss=0.1074, pruned_loss=0.02441, audio_tagging_loss=0.01045, over 3044230.32 frames. ], batch size: 63, lr: 7.77e-03, grad_scale: 16.0 2023-11-19 09:23:15,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=670306.6666666666, ans=0.0 2023-11-19 09:23:47,979 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:23:54,176 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4400, loss[loss=0.06997, simple_loss=0.07704, pruned_loss=0.01709, audio_tagging_loss=0.01437, over 15802.00 frames. ], tot_loss[loss=0.08824, simple_loss=0.107, pruned_loss=0.02423, audio_tagging_loss=0.01052, over 3045211.91 frames. 
], batch size: 61, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:23:54,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=670573.3333333334, ans=0.09899494936611666 2023-11-19 09:23:57,482 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670573.3333333334, ans=0.1 2023-11-19 09:24:03,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670573.3333333334, ans=0.1 2023-11-19 09:24:14,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.076e+01 8.724e+01 9.307e+01 1.083e+02, threshold=1.745e+02, percent-clipped=0.0 2023-11-19 09:24:27,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:32,543 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:34,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=670773.3333333334, ans=0.125 2023-11-19 09:24:35,655 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:24:42,330 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-19 09:24:49,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2023-11-19 09:24:49,634 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4450, loss[loss=0.1144, simple_loss=0.1408, pruned_loss=0.03462, audio_tagging_loss=0.009394, over 15276.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.1069, pruned_loss=0.02419, audio_tagging_loss=0.01056, over 3049142.14 frames. ], batch size: 59, lr: 7.77e-03, grad_scale: 32.0 2023-11-19 09:25:05,109 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:25:10,920 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=671040.0, ans=0.0 2023-11-19 09:25:16,441 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.02 vs. limit=6.0 2023-11-19 09:25:32,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.08 vs. limit=15.0 2023-11-19 09:25:45,406 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4500, loss[loss=0.07431, simple_loss=0.08989, pruned_loss=0.01353, audio_tagging_loss=0.01583, over 15445.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1077, pruned_loss=0.02418, audio_tagging_loss=0.01048, over 3047216.16 frames. 
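
Across this stretch the learning rate decays smoothly from 7.82e-03 toward 7.73e-03 rather than in steps, i.e. it depends continuously on the batch index and the (fractional) epoch. That is the shape of icefall's Eden schedule with this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; a sketch of the formula (the exact batch/epoch inputs behind any one logged value are not recoverable from the log):

    # Sketch of the Eden learning-rate schedule assumed to produce the
    # slowly decaying lr values in these entries.
    def eden_lr(base_lr: float, batch: float, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor
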
], batch size: 58, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:25:52,636 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=671240.0, ans=0.0 2023-11-19 09:26:01,103 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671306.6666666666, ans=0.1 2023-11-19 09:26:06,066 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 9.155e+01 9.901e+01 1.565e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:26:22,739 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=671440.0, ans=0.0 2023-11-19 09:26:25,265 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.36 vs. limit=15.0 2023-11-19 09:26:27,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=671440.0, ans=0.2 2023-11-19 09:26:32,782 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=671506.6666666666, ans=0.0 2023-11-19 09:26:36,310 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.81 vs. limit=15.0 2023-11-19 09:26:39,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=671506.6666666666, ans=0.125 2023-11-19 09:26:41,071 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4550, loss[loss=0.09457, simple_loss=0.1186, pruned_loss=0.02542, audio_tagging_loss=0.009863, over 15011.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1068, pruned_loss=0.02387, audio_tagging_loss=0.01041, over 3047531.64 frames. ], batch size: 56, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:26:44,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=671573.3333333334, ans=22.5 2023-11-19 09:27:22,356 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:27:36,508 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4600, loss[loss=0.08817, simple_loss=0.1078, pruned_loss=0.02241, audio_tagging_loss=0.01187, over 14835.00 frames. ], tot_loss[loss=0.08781, simple_loss=0.1068, pruned_loss=0.02392, audio_tagging_loss=0.0105, over 3048334.24 frames. ], batch size: 54, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:27:39,529 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.34 vs. 
limit=10.0 2023-11-19 09:27:46,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=671906.6666666666, ans=0.0 2023-11-19 09:27:48,269 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=671973.3333333334, ans=0.1 2023-11-19 09:27:51,412 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=671973.3333333334, ans=0.0 2023-11-19 09:27:52,714 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=671973.3333333334, ans=0.2 2023-11-19 09:27:58,267 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.562e+01 9.559e+01 1.086e+02 1.814e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-19 09:28:04,037 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.94 vs. limit=15.0 2023-11-19 09:28:05,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=672040.0, ans=0.025 2023-11-19 09:28:05,886 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=672040.0, ans=0.09899494936611666 2023-11-19 09:28:05,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=672040.0, ans=0.125 2023-11-19 09:28:12,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=672106.6666666666, ans=0.125 2023-11-19 09:28:23,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672173.3333333334, ans=0.1 2023-11-19 09:28:29,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=672173.3333333334, ans=0.125 2023-11-19 09:28:32,581 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4650, loss[loss=0.125, simple_loss=0.1645, pruned_loss=0.03546, audio_tagging_loss=0.007334, over 15846.00 frames. ], tot_loss[loss=0.08723, simple_loss=0.1058, pruned_loss=0.02376, audio_tagging_loss=0.01058, over 3038608.07 frames. ], batch size: 56, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:28:32,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672240.0, ans=0.1 2023-11-19 09:28:48,239 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:28:48,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=672306.6666666666, ans=0.125 2023-11-19 09:28:49,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=672306.6666666666, ans=0.125 2023-11-19 09:28:55,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.13 vs. 
limit=15.0 2023-11-19 09:29:22,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=672506.6666666666, ans=0.2 2023-11-19 09:29:28,493 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4700, loss[loss=0.08562, simple_loss=0.1007, pruned_loss=0.02359, audio_tagging_loss=0.01166, over 14695.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.106, pruned_loss=0.02398, audio_tagging_loss=0.0107, over 3044931.03 frames. ], batch size: 57, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:29:28,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=672573.3333333334, ans=0.125 2023-11-19 09:29:29,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.51 vs. limit=15.0 2023-11-19 09:29:30,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=672573.3333333334, ans=0.125 2023-11-19 09:29:39,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=672640.0, ans=0.0 2023-11-19 09:29:45,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=672640.0, ans=0.125 2023-11-19 09:29:47,341 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=672640.0, ans=0.0 2023-11-19 09:29:47,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=672640.0, ans=0.0 2023-11-19 09:29:49,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.341e+01 9.226e+01 1.015e+02 1.641e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 09:29:54,273 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2023-11-19 09:29:55,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=672706.6666666666, ans=0.2 2023-11-19 09:30:07,169 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=672773.3333333334, ans=0.125 2023-11-19 09:30:11,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=672773.3333333334, ans=0.125 2023-11-19 09:30:16,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=672840.0, ans=0.0 2023-11-19 09:30:18,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=672840.0, ans=0.125 2023-11-19 09:30:21,338 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=672840.0, ans=0.04949747468305833 2023-11-19 09:30:24,325 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4750, loss[loss=0.09317, simple_loss=0.113, pruned_loss=0.02554, audio_tagging_loss=0.01112, over 15039.00 frames. ], tot_loss[loss=0.08756, simple_loss=0.1056, pruned_loss=0.02399, audio_tagging_loss=0.01078, over 3047166.16 frames. 
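
Each scaling.py:213 entry reports the current value (ans=...) of a ScheduledFloat: a regularization hyperparameter such as a dropout probability, skip rate or balancer probability whose value is a piecewise-linear function of the global batch count, so most of them have annealed from aggressive early-training settings down to the small constants seen here (0.125, 0.1, 0.0, ...). A minimal sketch of that behavior (the breakpoint API is an assumption, not the exact icefall class):

    # Sketch: a scheduled value that is piecewise-linear in the global
    # batch count and clamped at the first/last breakpoints.
    import bisect

    class ScheduledFloatSketch:
        def __init__(self, *points):
            # e.g. ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count) - 1
            frac = (batch_count - self.xs[i]) / (self.xs[i + 1] - self.xs[i])
            return self.ys[i] + frac * (self.ys[i + 1] - self.ys[i])
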
], batch size: 57, lr: 7.76e-03, grad_scale: 16.0 2023-11-19 09:30:28,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=672906.6666666666, ans=0.125 2023-11-19 09:30:36,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=672973.3333333334, ans=0.2 2023-11-19 09:30:47,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=673040.0, ans=0.0 2023-11-19 09:30:47,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=673040.0, ans=0.0 2023-11-19 09:30:57,234 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. limit=10.0 2023-11-19 09:30:59,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=673106.6666666666, ans=0.125 2023-11-19 09:31:02,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=673106.6666666666, ans=0.125 2023-11-19 09:31:18,638 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=673173.3333333334, ans=0.0 2023-11-19 09:31:19,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=673240.0, ans=0.04949747468305833 2023-11-19 09:31:20,473 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4800, loss[loss=0.1067, simple_loss=0.1154, pruned_loss=0.03327, audio_tagging_loss=0.0157, over 15901.00 frames. ], tot_loss[loss=0.08829, simple_loss=0.1066, pruned_loss=0.0242, audio_tagging_loss=0.0108, over 3045424.30 frames. ], batch size: 59, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:31:31,195 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.36 vs. limit=12.0 2023-11-19 09:31:41,636 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.288e+01 8.950e+01 9.768e+01 1.286e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 09:31:46,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=673373.3333333334, ans=0.125 2023-11-19 09:32:06,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673506.6666666666, ans=0.1 2023-11-19 09:32:07,812 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=673506.6666666666, ans=0.1 2023-11-19 09:32:16,666 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4850, loss[loss=0.07302, simple_loss=0.09041, pruned_loss=0.01706, audio_tagging_loss=0.01076, over 15775.00 frames. ], tot_loss[loss=0.08819, simple_loss=0.1063, pruned_loss=0.02414, audio_tagging_loss=0.01092, over 3043846.87 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:32:18,176 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.38 vs. 
limit=15.0 2023-11-19 09:32:32,308 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=673640.0, ans=0.0 2023-11-19 09:32:51,535 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=673773.3333333334, ans=0.125 2023-11-19 09:33:12,514 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4900, loss[loss=0.1011, simple_loss=0.1191, pruned_loss=0.03103, audio_tagging_loss=0.01052, over 15232.00 frames. ], tot_loss[loss=0.08819, simple_loss=0.1062, pruned_loss=0.02417, audio_tagging_loss=0.01091, over 3043976.06 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:33:18,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=673906.6666666666, ans=0.125 2023-11-19 09:33:19,293 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-19 09:33:27,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=673973.3333333334, ans=0.125 2023-11-19 09:33:33,593 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.373e+01 9.037e+01 1.012e+02 1.386e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 09:33:45,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.22 vs. limit=15.0 2023-11-19 09:34:07,882 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 4950, loss[loss=0.09339, simple_loss=0.127, pruned_loss=0.02397, audio_tagging_loss=0.005951, over 15559.00 frames. ], tot_loss[loss=0.08874, simple_loss=0.1073, pruned_loss=0.02434, audio_tagging_loss=0.01075, over 3044302.78 frames. 
], batch size: 54, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:34:08,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=674240.0, ans=0.5 2023-11-19 09:34:09,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=674240.0, ans=0.125 2023-11-19 09:34:30,402 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=674373.3333333334, ans=0.125 2023-11-19 09:34:34,618 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=674373.3333333334, ans=0.5 2023-11-19 09:34:47,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=674440.0, ans=0.0 2023-11-19 09:34:52,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=674506.6666666666, ans=0.0 2023-11-19 09:34:54,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=674506.6666666666, ans=0.0 2023-11-19 09:34:57,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=674506.6666666666, ans=0.2 2023-11-19 09:34:57,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=674506.6666666666, ans=0.125 2023-11-19 09:34:59,470 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=674506.6666666666, ans=0.0 2023-11-19 09:35:04,108 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5000, loss[loss=0.07144, simple_loss=0.08978, pruned_loss=0.01606, audio_tagging_loss=0.01048, over 15731.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.107, pruned_loss=0.02411, audio_tagging_loss=0.01072, over 3035853.65 frames. ], batch size: 60, lr: 7.75e-03, grad_scale: 32.0 2023-11-19 09:35:07,502 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=674573.3333333334, ans=0.1 2023-11-19 09:35:25,265 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.355e+01 9.053e+01 1.007e+02 1.287e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 09:35:37,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=674773.3333333334, ans=0.125 2023-11-19 09:35:59,630 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5050, loss[loss=0.07667, simple_loss=0.08563, pruned_loss=0.02087, audio_tagging_loss=0.01298, over 15355.00 frames. ], tot_loss[loss=0.08896, simple_loss=0.1079, pruned_loss=0.02443, audio_tagging_loss=0.01059, over 3034036.43 frames. 
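
The scaling.py:1022 Whitening entries compare a measured statistic of a module's output against a limit. A dispersion-of-eigenvalues statistic fits the logged numbers: it equals 1.0 when the channel covariance is isotropic ("white") and rises toward num_channels as one direction dominates, and every entry in this stretch sits below its limit, so the associated gradient penalty, which is designed to bite only above the limit, stays dormant here. A sketch of that statistic for the num_groups=1 case (an assumed reconstruction, not the icefall source):

    # Sketch: whitening metric = C * sum(eig^2) / sum(eig)^2 over the
    # channel covariance eigenvalues; 1.0 for white features, up to C
    # (num_channels) when a single direction dominates.
    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels)
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        num_channels = cov.shape[0]
        mean_diag = cov.diagonal().mean()
        return (cov ** 2).sum() / (mean_diag ** 2 * num_channels)
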
], batch size: 57, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:36:14,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=674973.3333333334, ans=0.04949747468305833 2023-11-19 09:36:21,008 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=675040.0, ans=0.125 2023-11-19 09:36:22,008 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:36:23,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=675040.0, ans=0.0 2023-11-19 09:36:26,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=675040.0, ans=0.0 2023-11-19 09:36:30,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=675040.0, ans=0.0 2023-11-19 09:36:35,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=675106.6666666666, ans=0.2 2023-11-19 09:36:40,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=675106.6666666666, ans=0.0 2023-11-19 09:36:55,047 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5100, loss[loss=0.0961, simple_loss=0.1211, pruned_loss=0.02637, audio_tagging_loss=0.009201, over 16162.00 frames. ], tot_loss[loss=0.08821, simple_loss=0.107, pruned_loss=0.02409, audio_tagging_loss=0.01061, over 3034248.95 frames. ], batch size: 59, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:37:04,410 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.27 vs. limit=15.0 2023-11-19 09:37:11,123 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=675306.6666666666, ans=0.0 2023-11-19 09:37:16,151 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.361e+01 9.263e+01 1.052e+02 1.984e+02, threshold=1.853e+02, percent-clipped=1.0 2023-11-19 09:37:18,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=675373.3333333334, ans=0.07 2023-11-19 09:37:36,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=675440.0, ans=0.0 2023-11-19 09:37:40,699 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=675506.6666666666, ans=0.0 2023-11-19 09:37:43,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=675506.6666666666, ans=0.125 2023-11-19 09:37:51,133 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5150, loss[loss=0.09376, simple_loss=0.1159, pruned_loss=0.02488, audio_tagging_loss=0.01093, over 17161.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.1072, pruned_loss=0.02408, audio_tagging_loss=0.01049, over 3036912.52 frames. 
], batch size: 65, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:37:52,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=675573.3333333334, ans=0.125 2023-11-19 09:38:03,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=675640.0, ans=0.125 2023-11-19 09:38:06,183 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675640.0, ans=0.1 2023-11-19 09:38:18,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=675706.6666666666, ans=0.0 2023-11-19 09:38:23,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=675773.3333333334, ans=0.05 2023-11-19 09:38:23,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675773.3333333334, ans=0.1 2023-11-19 09:38:23,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=675773.3333333334, ans=0.125 2023-11-19 09:38:25,146 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=675773.3333333334, ans=0.0 2023-11-19 09:38:46,052 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5200, loss[loss=0.07676, simple_loss=0.0897, pruned_loss=0.0202, audio_tagging_loss=0.01171, over 15983.00 frames. ], tot_loss[loss=0.08852, simple_loss=0.1077, pruned_loss=0.02424, audio_tagging_loss=0.0104, over 3030239.19 frames. ], batch size: 60, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:38:46,467 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2023-11-19 09:39:01,746 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=675973.3333333334, ans=0.0 2023-11-19 09:39:04,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-19 09:39:07,236 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 8.521e+01 9.161e+01 1.002e+02 1.521e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 09:39:36,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=676173.3333333334, ans=0.05 2023-11-19 09:39:41,616 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5250, loss[loss=0.05929, simple_loss=0.06687, pruned_loss=0.014, audio_tagging_loss=0.01186, over 15060.00 frames. ], tot_loss[loss=0.08786, simple_loss=0.1068, pruned_loss=0.02406, audio_tagging_loss=0.0104, over 3034616.03 frames. 
], batch size: 57, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:41,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=676240.0, ans=0.125 2023-11-19 09:39:51,911 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=676306.6666666666, ans=0.1 2023-11-19 09:39:54,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=676306.6666666666, ans=0.2 2023-11-19 09:39:58,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=676306.6666666666, ans=0.1 2023-11-19 09:40:08,671 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=676373.3333333334, ans=0.125 2023-11-19 09:40:14,962 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.68 vs. limit=15.0 2023-11-19 09:40:22,108 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-19 09:40:37,335 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5300, loss[loss=0.08584, simple_loss=0.1016, pruned_loss=0.02179, audio_tagging_loss=0.01325, over 14679.00 frames. ], tot_loss[loss=0.08875, simple_loss=0.1078, pruned_loss=0.02442, audio_tagging_loss=0.01044, over 3032281.40 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:40:38,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676573.3333333334, ans=0.1 2023-11-19 09:40:41,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.41 vs. 
limit=22.5 2023-11-19 09:40:45,549 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=676573.3333333334, ans=0.1 2023-11-19 09:40:47,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=676640.0, ans=0.1 2023-11-19 09:40:48,608 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676640.0, ans=0.1 2023-11-19 09:40:55,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=676640.0, ans=0.2 2023-11-19 09:40:56,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=676640.0, ans=0.125 2023-11-19 09:40:58,530 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.435e+01 9.072e+01 1.015e+02 1.516e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 09:40:59,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=676706.6666666666, ans=0.1 2023-11-19 09:41:00,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=676706.6666666666, ans=0.125 2023-11-19 09:41:10,313 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-19 09:41:21,091 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=676840.0, ans=0.125 2023-11-19 09:41:29,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=676840.0, ans=0.0 2023-11-19 09:41:32,752 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5350, loss[loss=0.1051, simple_loss=0.1343, pruned_loss=0.03005, audio_tagging_loss=0.007872, over 16221.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.1085, pruned_loss=0.02464, audio_tagging_loss=0.01035, over 3032933.44 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:41:36,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=676906.6666666666, ans=0.125 2023-11-19 09:41:37,230 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=676906.6666666666, ans=0.2 2023-11-19 09:41:54,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=677040.0, ans=0.0 2023-11-19 09:41:56,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=677040.0, ans=0.0 2023-11-19 09:42:00,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=677040.0, ans=0.2 2023-11-19 09:42:06,283 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=677106.6666666666, ans=0.2 2023-11-19 09:42:14,607 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.32 vs. 
limit=15.0 2023-11-19 09:42:17,336 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=677173.3333333334, ans=0.125 2023-11-19 09:42:17,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=677173.3333333334, ans=0.125 2023-11-19 09:42:28,292 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5400, loss[loss=0.08959, simple_loss=0.1182, pruned_loss=0.02272, audio_tagging_loss=0.007762, over 16568.00 frames. ], tot_loss[loss=0.08873, simple_loss=0.1077, pruned_loss=0.02441, audio_tagging_loss=0.01049, over 3033974.76 frames. ], batch size: 62, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:42:35,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=677240.0, ans=0.125 2023-11-19 09:42:37,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=677240.0, ans=0.2 2023-11-19 09:42:49,670 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.356e+01 9.040e+01 1.006e+02 1.272e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 09:43:02,149 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=677440.0, ans=0.05 2023-11-19 09:43:03,584 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677440.0, ans=0.1 2023-11-19 09:43:04,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=677440.0, ans=0.125 2023-11-19 09:43:07,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=677440.0, ans=0.2 2023-11-19 09:43:20,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=677506.6666666666, ans=0.2 2023-11-19 09:43:24,107 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5450, loss[loss=0.0802, simple_loss=0.09633, pruned_loss=0.02024, audio_tagging_loss=0.0118, over 14469.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1079, pruned_loss=0.02458, audio_tagging_loss=0.01066, over 3033227.10 frames. ], batch size: 55, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:43:30,285 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:43:42,450 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=677640.0, ans=0.0 2023-11-19 09:44:03,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=677773.3333333334, ans=0.125 2023-11-19 09:44:13,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=677840.0, ans=0.0 2023-11-19 09:44:20,025 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5500, loss[loss=0.08664, simple_loss=0.1032, pruned_loss=0.02338, audio_tagging_loss=0.01168, over 15692.00 frames. ], tot_loss[loss=0.08896, simple_loss=0.1075, pruned_loss=0.02456, audio_tagging_loss=0.01067, over 3035856.47 frames. 
], batch size: 59, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:44:37,404 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=677973.3333333334, ans=0.2 2023-11-19 09:44:41,360 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.594e+01 9.259e+01 1.001e+02 1.326e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:45:12,970 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-19 09:45:13,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=678173.3333333334, ans=0.125 2023-11-19 09:45:16,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-19 09:45:16,525 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5550, loss[loss=0.08282, simple_loss=0.1028, pruned_loss=0.01952, audio_tagging_loss=0.0119, over 14757.00 frames. ], tot_loss[loss=0.08992, simple_loss=0.1086, pruned_loss=0.02489, audio_tagging_loss=0.01071, over 3040104.04 frames. ], batch size: 55, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:45:32,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=678306.6666666666, ans=0.1 2023-11-19 09:45:38,587 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.22 vs. limit=15.0 2023-11-19 09:45:54,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678440.0, ans=0.125 2023-11-19 09:45:59,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=678440.0, ans=0.125 2023-11-19 09:46:12,186 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5600, loss[loss=0.1117, simple_loss=0.1327, pruned_loss=0.03624, audio_tagging_loss=0.009124, over 15566.00 frames. ], tot_loss[loss=0.08994, simple_loss=0.1087, pruned_loss=0.02481, audio_tagging_loss=0.01075, over 3039816.08 frames. ], batch size: 57, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:46:34,638 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.355e+01 9.090e+01 1.021e+02 1.619e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 09:46:41,667 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=678706.6666666666, ans=0.07 2023-11-19 09:46:44,872 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=678773.3333333334, ans=0.0 2023-11-19 09:46:52,089 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.37 vs. limit=15.0 2023-11-19 09:46:52,567 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 09:46:57,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=678840.0, ans=0.125 2023-11-19 09:47:07,976 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5650, loss[loss=0.09516, simple_loss=0.1135, pruned_loss=0.02598, audio_tagging_loss=0.01241, over 14368.00 frames. ], tot_loss[loss=0.08913, simple_loss=0.1076, pruned_loss=0.02449, audio_tagging_loss=0.01087, over 3042482.14 frames. ], batch size: 56, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:47:16,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=678906.6666666666, ans=0.0 2023-11-19 09:47:26,655 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0 2023-11-19 09:47:53,048 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=679173.3333333334, ans=0.125 2023-11-19 09:48:04,375 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5700, loss[loss=0.1025, simple_loss=0.1174, pruned_loss=0.03279, audio_tagging_loss=0.01101, over 13622.00 frames. ], tot_loss[loss=0.08903, simple_loss=0.1077, pruned_loss=0.02438, audio_tagging_loss=0.01082, over 3041739.52 frames. ], batch size: 53, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:48:05,014 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.13 vs. limit=6.0 2023-11-19 09:48:19,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=679306.6666666666, ans=0.125 2023-11-19 09:48:26,559 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.673e+01 9.391e+01 1.015e+02 1.366e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 09:48:34,212 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=679373.3333333334, ans=0.0 2023-11-19 09:48:37,648 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.30 vs. limit=12.0 2023-11-19 09:48:39,395 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=679440.0, ans=0.0 2023-11-19 09:48:51,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=679506.6666666666, ans=0.125 2023-11-19 09:48:59,025 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=679573.3333333334, ans=0.125 2023-11-19 09:48:59,868 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5750, loss[loss=0.07279, simple_loss=0.07891, pruned_loss=0.02218, audio_tagging_loss=0.01116, over 15890.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1071, pruned_loss=0.0243, audio_tagging_loss=0.01077, over 3044144.54 frames. ], batch size: 61, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:18,630 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=679640.0, ans=0.0 2023-11-19 09:49:40,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. 
limit=15.0 2023-11-19 09:49:55,318 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5800, loss[loss=0.06398, simple_loss=0.08069, pruned_loss=0.01416, audio_tagging_loss=0.009474, over 15446.00 frames. ], tot_loss[loss=0.08879, simple_loss=0.1076, pruned_loss=0.02441, audio_tagging_loss=0.0106, over 3050201.03 frames. ], batch size: 57, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:58,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=679906.6666666666, ans=0.125 2023-11-19 09:50:17,693 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.685e+01 8.360e+01 9.012e+01 9.906e+01 1.267e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 09:50:35,971 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:50:43,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=680173.3333333334, ans=0.0 2023-11-19 09:50:49,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=680173.3333333334, ans=0.125 2023-11-19 09:50:50,891 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5850, loss[loss=0.09105, simple_loss=0.1065, pruned_loss=0.02923, audio_tagging_loss=0.00858, over 14508.00 frames. ], tot_loss[loss=0.08805, simple_loss=0.1069, pruned_loss=0.02408, audio_tagging_loss=0.01054, over 3045823.72 frames. ], batch size: 53, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:11,207 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=680306.6666666666, ans=0.0 2023-11-19 09:51:11,328 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=680306.6666666666, ans=0.125 2023-11-19 09:51:12,942 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=680373.3333333334, ans=0.07 2023-11-19 09:51:18,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=680373.3333333334, ans=0.125 2023-11-19 09:51:47,067 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5900, loss[loss=0.07223, simple_loss=0.08935, pruned_loss=0.01785, audio_tagging_loss=0.009698, over 15000.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1061, pruned_loss=0.02387, audio_tagging_loss=0.01053, over 3052040.76 frames. ], batch size: 57, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:52:08,763 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.198e+01 8.843e+01 9.810e+01 1.400e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:52:14,762 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.364e-01 2023-11-19 09:52:26,906 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=680773.3333333334, ans=0.125 2023-11-19 09:52:30,589 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=680840.0, ans=0.2 2023-11-19 09:52:42,555 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 5950, loss[loss=0.06733, simple_loss=0.07247, pruned_loss=0.0187, audio_tagging_loss=0.01239, over 14459.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1063, pruned_loss=0.02383, audio_tagging_loss=0.01047, over 3061801.51 frames. 
], batch size: 56, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:52:46,362 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.16 vs. limit=10.0 2023-11-19 09:52:50,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=680906.6666666666, ans=0.125 2023-11-19 09:53:14,006 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:53:24,276 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2023-11-19 09:53:38,039 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6000, loss[loss=0.0837, simple_loss=0.09488, pruned_loss=0.02522, audio_tagging_loss=0.01104, over 14772.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1058, pruned_loss=0.02364, audio_tagging_loss=0.01043, over 3065717.31 frames. ], batch size: 58, lr: 7.71e-03, grad_scale: 32.0 2023-11-19 09:53:38,040 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 09:54:10,903 INFO [train_asr.py:1147] (2/4) Epoch 9, validation: loss=0.06636, simple_loss=0.05607, pruned_loss=0.006778, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 09:54:10,904 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 09:54:15,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=681240.0, ans=0.125 2023-11-19 09:54:33,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=681373.3333333334, ans=0.125 2023-11-19 09:54:33,948 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.283e+01 9.118e+01 1.003e+02 1.340e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 09:54:44,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=681440.0, ans=0.125 2023-11-19 09:54:51,004 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:55:06,818 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6050, loss[loss=0.05391, simple_loss=0.06196, pruned_loss=0.01351, audio_tagging_loss=0.00942, over 15123.00 frames. ], tot_loss[loss=0.08715, simple_loss=0.106, pruned_loss=0.02377, audio_tagging_loss=0.01035, over 3070270.03 frames. ], batch size: 59, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:55:24,850 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.17 vs. limit=22.5 2023-11-19 09:55:25,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=681640.0, ans=0.0 2023-11-19 09:55:32,597 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.39 vs. 
limit=22.5 2023-11-19 09:55:48,662 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.48 vs. limit=15.0 2023-11-19 09:55:58,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=681840.0, ans=0.0 2023-11-19 09:56:02,350 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6100, loss[loss=0.06874, simple_loss=0.07772, pruned_loss=0.01663, audio_tagging_loss=0.01326, over 15799.00 frames. ], tot_loss[loss=0.08722, simple_loss=0.1059, pruned_loss=0.02389, audio_tagging_loss=0.01037, over 3064358.00 frames. ], batch size: 63, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:56:02,893 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-11-19 09:56:03,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=681906.6666666666, ans=0.125 2023-11-19 09:56:06,650 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=681906.6666666666, ans=0.125 2023-11-19 09:56:08,009 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.45 vs. limit=22.5 2023-11-19 09:56:20,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=681973.3333333334, ans=0.125 2023-11-19 09:56:26,131 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.498e+01 9.519e+01 1.052e+02 1.737e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 09:56:35,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682106.6666666666, ans=0.125 2023-11-19 09:56:38,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=682106.6666666666, ans=0.0 2023-11-19 09:56:41,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=682106.6666666666, ans=0.2 2023-11-19 09:56:44,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=682106.6666666666, ans=0.125 2023-11-19 09:56:57,824 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6150, loss[loss=0.1065, simple_loss=0.1346, pruned_loss=0.0318, audio_tagging_loss=0.007405, over 15523.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.106, pruned_loss=0.02399, audio_tagging_loss=0.01047, over 3056359.82 frames. 
], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:23,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682373.3333333334, ans=0.1 2023-11-19 09:57:24,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=682373.3333333334, ans=0.125 2023-11-19 09:57:27,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=682373.3333333334, ans=0.0 2023-11-19 09:57:52,400 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=682573.3333333334, ans=0.1 2023-11-19 09:57:52,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=682573.3333333334, ans=0.125 2023-11-19 09:57:53,365 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6200, loss[loss=0.08841, simple_loss=0.1011, pruned_loss=0.02633, audio_tagging_loss=0.01152, over 14378.00 frames. ], tot_loss[loss=0.088, simple_loss=0.1064, pruned_loss=0.02427, audio_tagging_loss=0.01051, over 3051009.74 frames. ], batch size: 54, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:58,213 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-19 09:58:03,606 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.67 vs. limit=12.0 2023-11-19 09:58:04,654 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.89 vs. limit=15.0 2023-11-19 09:58:08,191 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.17 vs. limit=15.0 2023-11-19 09:58:13,305 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=682640.0, ans=0.125 2023-11-19 09:58:16,315 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.555e+01 9.157e+01 9.904e+01 1.201e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:58:37,395 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.67 vs. limit=15.0 2023-11-19 09:58:39,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.13 vs. limit=15.0 2023-11-19 09:58:49,116 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6250, loss[loss=0.07841, simple_loss=0.09541, pruned_loss=0.02197, audio_tagging_loss=0.008739, over 14789.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1068, pruned_loss=0.02443, audio_tagging_loss=0.01059, over 3043941.46 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:21,972 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.68 vs. 
limit=12.0 2023-11-19 09:59:28,050 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=683106.6666666666, ans=0.125 2023-11-19 09:59:36,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=683173.3333333334, ans=0.125 2023-11-19 09:59:41,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=683173.3333333334, ans=0.2 2023-11-19 09:59:44,719 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6300, loss[loss=0.1027, simple_loss=0.1339, pruned_loss=0.02714, audio_tagging_loss=0.008658, over 14856.00 frames. ], tot_loss[loss=0.08903, simple_loss=0.1077, pruned_loss=0.02456, audio_tagging_loss=0.01062, over 3041974.94 frames. ], batch size: 54, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:48,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=683240.0, ans=0.025 2023-11-19 09:59:54,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=683306.6666666666, ans=0.125 2023-11-19 10:00:01,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=683306.6666666666, ans=0.125 2023-11-19 10:00:07,712 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.511e+01 9.206e+01 1.011e+02 2.353e+02, threshold=1.841e+02, percent-clipped=1.0 2023-11-19 10:00:11,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=683373.3333333334, ans=0.0 2023-11-19 10:00:22,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=683440.0, ans=0.125 2023-11-19 10:00:27,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=683506.6666666666, ans=0.0 2023-11-19 10:00:29,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=683506.6666666666, ans=0.125 2023-11-19 10:00:40,554 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6350, loss[loss=0.08643, simple_loss=0.1122, pruned_loss=0.02051, audio_tagging_loss=0.009809, over 14943.00 frames. ], tot_loss[loss=0.08872, simple_loss=0.1074, pruned_loss=0.02443, audio_tagging_loss=0.01058, over 3037135.20 frames. ], batch size: 56, lr: 7.69e-03, grad_scale: 16.0 2023-11-19 10:00:44,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=683573.3333333334, ans=0.125 2023-11-19 10:01:12,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.40 vs. limit=22.5 2023-11-19 10:01:16,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=683773.3333333334, ans=0.125 2023-11-19 10:01:31,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=683840.0, ans=0.04949747468305833 2023-11-19 10:01:35,458 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6400, loss[loss=0.09582, simple_loss=0.1221, pruned_loss=0.02477, audio_tagging_loss=0.01001, over 15353.00 frames. 
], tot_loss[loss=0.08945, simple_loss=0.1088, pruned_loss=0.02448, audio_tagging_loss=0.01058, over 3036188.82 frames. ], batch size: 56, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:01:43,959 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=683906.6666666666, ans=15.0 2023-11-19 10:01:50,938 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0 2023-11-19 10:01:58,531 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=684040.0, ans=0.125 2023-11-19 10:01:59,221 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 8.378e+01 8.903e+01 9.717e+01 1.251e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 10:02:06,086 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.19 vs. limit=15.0 2023-11-19 10:02:29,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=684173.3333333334, ans=0.1 2023-11-19 10:02:30,940 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6450, loss[loss=0.0674, simple_loss=0.07489, pruned_loss=0.02003, audio_tagging_loss=0.009932, over 14734.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.1083, pruned_loss=0.02453, audio_tagging_loss=0.01054, over 3038718.53 frames. ], batch size: 56, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:02:37,360 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.12 vs. limit=12.0 2023-11-19 10:02:51,870 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:03:08,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=684440.0, ans=0.04949747468305833 2023-11-19 10:03:10,382 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=684440.0, ans=0.1 2023-11-19 10:03:27,107 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6500, loss[loss=0.1152, simple_loss=0.1412, pruned_loss=0.03633, audio_tagging_loss=0.008242, over 15257.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.1078, pruned_loss=0.02436, audio_tagging_loss=0.01056, over 3039907.90 frames. 
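The optim.py lines report an adaptive gradient-clipping scheme: the five logged values read as (min, 25%, median, 75%, max) of recent gradient norms, and the threshold tracks clipping_scale times the median (2.0 * 8.903e+01 ~= 1.781e+02 in the entry above). A hedged reconstruction from the logged numbers only, not the library source:

    import torch

    # Hedged reconstruction of the "grad-norm quartiles ... threshold" lines:
    # quantiles of recent gradient norms, with the clipping threshold set to
    # clipping_scale * median.
    def clip_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0):
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        return clipping_scale * q[2], q

    norms = 90.0 + 10.0 * torch.randn(1000)       # stand-in for recent grad norms
    threshold, quartiles = clip_threshold(norms)  # threshold ~ 2 * median ~ 180

percent-clipped then counts how often a batch's gradient norm exceeded that threshold; it stays near zero here, so clipping is rarely active at this point in training.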
], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:03:29,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=684573.3333333334, ans=0.0 2023-11-19 10:03:29,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=684573.3333333334, ans=0.2 2023-11-19 10:03:50,207 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.426e+01 9.031e+01 9.982e+01 1.610e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 10:04:04,246 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=684773.3333333334, ans=0.0 2023-11-19 10:04:09,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=684773.3333333334, ans=0.07 2023-11-19 10:04:22,321 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6550, loss[loss=0.06943, simple_loss=0.07454, pruned_loss=0.01662, audio_tagging_loss=0.01554, over 13698.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.1082, pruned_loss=0.02431, audio_tagging_loss=0.01044, over 3042730.94 frames. ], batch size: 54, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:04:23,193 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.93 vs. limit=5.0 2023-11-19 10:04:23,956 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.67 vs. limit=15.0 2023-11-19 10:04:49,935 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2023-11-19 10:04:53,127 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.72 vs. limit=12.0 2023-11-19 10:05:02,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-11-19 10:05:18,101 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6600, loss[loss=0.08853, simple_loss=0.1201, pruned_loss=0.02165, audio_tagging_loss=0.006826, over 15923.00 frames. ], tot_loss[loss=0.08785, simple_loss=0.1069, pruned_loss=0.02403, audio_tagging_loss=0.01035, over 3042472.88 frames. ], batch size: 56, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:05:20,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685240.0, ans=0.1 2023-11-19 10:05:41,605 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.207e+01 8.810e+01 9.589e+01 1.176e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 10:05:41,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=685373.3333333334, ans=0.125 2023-11-19 10:06:06,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=685506.6666666666, ans=0.035 2023-11-19 10:06:14,471 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6650, loss[loss=0.07855, simple_loss=0.09131, pruned_loss=0.01992, audio_tagging_loss=0.01298, over 15232.00 frames. ], tot_loss[loss=0.08742, simple_loss=0.1062, pruned_loss=0.02389, audio_tagging_loss=0.01042, over 3042981.85 frames. 
], batch size: 58, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:06:40,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=685706.6666666666, ans=15.0 2023-11-19 10:06:50,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=685773.3333333334, ans=0.1 2023-11-19 10:07:04,688 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.11 vs. limit=15.0 2023-11-19 10:07:09,322 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6700, loss[loss=0.07112, simple_loss=0.08289, pruned_loss=0.01851, audio_tagging_loss=0.01117, over 14126.00 frames. ], tot_loss[loss=0.08803, simple_loss=0.107, pruned_loss=0.02412, audio_tagging_loss=0.01042, over 3048079.13 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:07:15,853 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=685906.6666666666, ans=0.0 2023-11-19 10:07:25,569 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=685973.3333333334, ans=0.125 2023-11-19 10:07:27,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=685973.3333333334, ans=0.2 2023-11-19 10:07:29,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=685973.3333333334, ans=0.0 2023-11-19 10:07:33,112 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.482e+01 9.410e+01 1.023e+02 1.409e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 10:07:34,934 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.98 vs. limit=6.0 2023-11-19 10:07:41,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.66 vs. limit=15.0 2023-11-19 10:07:54,249 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=686173.3333333334, ans=0.0 2023-11-19 10:07:59,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.89 vs. limit=10.0 2023-11-19 10:07:59,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.57 vs. limit=15.0 2023-11-19 10:08:05,668 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6750, loss[loss=0.07657, simple_loss=0.08352, pruned_loss=0.02274, audio_tagging_loss=0.01207, over 14561.00 frames. ], tot_loss[loss=0.08869, simple_loss=0.1079, pruned_loss=0.0244, audio_tagging_loss=0.01033, over 3041015.58 frames. ], batch size: 57, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:08:13,135 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.85 vs. 
limit=10.0 2023-11-19 10:08:15,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=686240.0, ans=0.0 2023-11-19 10:08:29,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686373.3333333334, ans=0.1 2023-11-19 10:08:33,325 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=686373.3333333334, ans=0.2 2023-11-19 10:08:37,633 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:08:45,178 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-19 10:08:51,379 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.85 vs. limit=12.0 2023-11-19 10:08:54,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=686506.6666666666, ans=15.0 2023-11-19 10:09:01,563 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6800, loss[loss=0.1079, simple_loss=0.1349, pruned_loss=0.03045, audio_tagging_loss=0.01006, over 14927.00 frames. ], tot_loss[loss=0.08803, simple_loss=0.1071, pruned_loss=0.02414, audio_tagging_loss=0.01033, over 3049341.53 frames. ], batch size: 54, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:06,954 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=686573.3333333334, ans=0.0 2023-11-19 10:09:23,727 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0 2023-11-19 10:09:24,301 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.110e+01 8.200e+01 8.866e+01 9.843e+01 1.346e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 10:09:29,458 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.95 vs. limit=22.5 2023-11-19 10:09:56,519 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6850, loss[loss=0.06717, simple_loss=0.0724, pruned_loss=0.01752, audio_tagging_loss=0.01345, over 15735.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1068, pruned_loss=0.02411, audio_tagging_loss=0.01028, over 3041058.84 frames. ], batch size: 62, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:56,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=686906.6666666666, ans=0.125 2023-11-19 10:09:58,957 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=686906.6666666666, ans=0.0 2023-11-19 10:10:27,523 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=687040.0, ans=0.125 2023-11-19 10:10:31,833 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=687106.6666666666, ans=0.125 2023-11-19 10:10:35,275 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. 
limit=6.0 2023-11-19 10:10:52,109 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6900, loss[loss=0.06974, simple_loss=0.07969, pruned_loss=0.01676, audio_tagging_loss=0.01314, over 15865.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.1063, pruned_loss=0.02383, audio_tagging_loss=0.01036, over 3045904.82 frames. ], batch size: 60, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:11:05,975 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=687306.6666666666, ans=0.1 2023-11-19 10:11:11,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=687306.6666666666, ans=0.1 2023-11-19 10:11:15,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.720e+01 9.438e+01 1.043e+02 1.545e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 10:11:15,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=687373.3333333334, ans=0.1 2023-11-19 10:11:34,262 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:11:34,441 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=687440.0, ans=0.125 2023-11-19 10:11:47,997 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 6950, loss[loss=0.09972, simple_loss=0.1168, pruned_loss=0.03041, audio_tagging_loss=0.01089, over 14061.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.106, pruned_loss=0.0237, audio_tagging_loss=0.01041, over 3041244.91 frames. ], batch size: 53, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:12:15,042 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=12.0 2023-11-19 10:12:16,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687706.6666666666, ans=0.125 2023-11-19 10:12:20,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=687773.3333333334, ans=22.5 2023-11-19 10:12:39,753 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=15.0 2023-11-19 10:12:43,296 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7000, loss[loss=0.07198, simple_loss=0.08855, pruned_loss=0.01682, audio_tagging_loss=0.01089, over 14464.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.106, pruned_loss=0.02359, audio_tagging_loss=0.01046, over 3037685.99 frames. ], batch size: 57, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:08,460 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.438e+01 9.310e+01 1.017e+02 1.231e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 10:13:39,126 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7050, loss[loss=0.08691, simple_loss=0.1081, pruned_loss=0.02312, audio_tagging_loss=0.009767, over 14452.00 frames. 
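The WARNING above (and the identical ones earlier in this section) follows from the transducer's length constraint: after the roughly 4x convolutional subsampling, a 1-second dummy AudioSet cut keeps only 23 frames, fewer than its 24 placeholder tokens, so it cannot be aligned and is dropped. A sketch of that check; the exact subsampling arithmetic is inferred from the 100 -> 23 frame counts in the log:

    # Sketch of the cut-exclusion check behind the WARNING entries. The
    # frame formula ((t - 7) // 2 + 1) // 2 is inferred from the logged
    # before/after counts; the keep rule mirrors "frames after subsampling
    # must cover the token count".
    def frames_after_subsampling(t: int) -> int:
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the dummy cuts excluded in this log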
], tot_loss[loss=0.08727, simple_loss=0.106, pruned_loss=0.02367, audio_tagging_loss=0.01062, over 3030198.64 frames. ], batch size: 52, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:43,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=688240.0, ans=0.0 2023-11-19 10:13:45,729 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=688240.0, ans=0.125 2023-11-19 10:13:48,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=688240.0, ans=0.125 2023-11-19 10:13:48,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=688240.0, ans=0.0 2023-11-19 10:13:51,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=688306.6666666666, ans=0.125 2023-11-19 10:14:14,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=688440.0, ans=0.0 2023-11-19 10:14:30,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=688506.6666666666, ans=0.05 2023-11-19 10:14:35,312 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7100, loss[loss=0.06587, simple_loss=0.07606, pruned_loss=0.01781, audio_tagging_loss=0.01002, over 14899.00 frames. ], tot_loss[loss=0.08797, simple_loss=0.107, pruned_loss=0.02387, audio_tagging_loss=0.01061, over 3039927.69 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:14:36,910 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.25 vs. limit=15.0 2023-11-19 10:14:45,947 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.65 vs. limit=22.5 2023-11-19 10:14:50,784 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=688640.0, ans=0.1 2023-11-19 10:14:59,030 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.454e+01 9.144e+01 1.007e+02 1.381e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 10:15:23,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=688840.0, ans=0.07 2023-11-19 10:15:30,844 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7150, loss[loss=0.09075, simple_loss=0.1203, pruned_loss=0.02139, audio_tagging_loss=0.009201, over 15757.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.1062, pruned_loss=0.02391, audio_tagging_loss=0.01067, over 3040573.79 frames. ], batch size: 56, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:15:40,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.06 vs. limit=15.0 2023-11-19 10:15:47,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=688973.3333333334, ans=0.2 2023-11-19 10:15:50,649 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.23 vs. 
limit=15.0 2023-11-19 10:16:26,359 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7200, loss[loss=0.1032, simple_loss=0.1228, pruned_loss=0.03116, audio_tagging_loss=0.01066, over 15734.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1063, pruned_loss=0.02392, audio_tagging_loss=0.01072, over 3040981.65 frames. ], batch size: 57, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:16:35,126 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=689240.0, ans=0.0 2023-11-19 10:16:48,382 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-19 10:16:51,000 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.352e+01 9.080e+01 1.000e+02 1.342e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 10:16:51,236 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=689373.3333333334, ans=0.125 2023-11-19 10:17:16,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=689506.6666666666, ans=0.125 2023-11-19 10:17:19,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=689506.6666666666, ans=0.0 2023-11-19 10:17:21,596 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7250, loss[loss=0.1037, simple_loss=0.1394, pruned_loss=0.02625, audio_tagging_loss=0.007684, over 14910.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1076, pruned_loss=0.02431, audio_tagging_loss=0.01075, over 3040622.35 frames. ], batch size: 55, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:17:33,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=689640.0, ans=0.0 2023-11-19 10:17:37,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=689640.0, ans=0.025 2023-11-19 10:17:44,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=689706.6666666666, ans=0.125 2023-11-19 10:17:50,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=689706.6666666666, ans=0.0 2023-11-19 10:17:52,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=689706.6666666666, ans=0.07 2023-11-19 10:18:17,662 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7300, loss[loss=0.08679, simple_loss=0.103, pruned_loss=0.02393, audio_tagging_loss=0.01135, over 15067.00 frames. ], tot_loss[loss=0.08859, simple_loss=0.1073, pruned_loss=0.02418, audio_tagging_loss=0.01074, over 3042426.16 frames. ], batch size: 56, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:18:42,971 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.177e+01 8.479e+01 9.252e+01 1.452e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-19 10:18:44,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-19 10:19:12,531 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7350, loss[loss=0.08974, simple_loss=0.1184, pruned_loss=0.0219, audio_tagging_loss=0.008616, over 15743.00 frames. 
], tot_loss[loss=0.08804, simple_loss=0.1068, pruned_loss=0.02406, audio_tagging_loss=0.01059, over 3041918.35 frames. ], batch size: 57, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:19:16,357 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=690240.0, ans=0.125 2023-11-19 10:19:36,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=690373.3333333334, ans=0.2 2023-11-19 10:20:08,407 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7400, loss[loss=0.07046, simple_loss=0.07416, pruned_loss=0.02171, audio_tagging_loss=0.01167, over 13818.00 frames. ], tot_loss[loss=0.08803, simple_loss=0.1075, pruned_loss=0.02395, audio_tagging_loss=0.01034, over 3038774.14 frames. ], batch size: 54, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:20:30,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=690706.6666666666, ans=0.125 2023-11-19 10:20:34,036 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.284e+01 9.235e+01 1.033e+02 1.364e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 10:20:35,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=690706.6666666666, ans=0.125 2023-11-19 10:20:37,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=690706.6666666666, ans=0.0 2023-11-19 10:20:54,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=690840.0, ans=0.0 2023-11-19 10:21:04,105 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7450, loss[loss=0.09214, simple_loss=0.1159, pruned_loss=0.02603, audio_tagging_loss=0.008171, over 15020.00 frames. ], tot_loss[loss=0.08778, simple_loss=0.1072, pruned_loss=0.02392, audio_tagging_loss=0.01028, over 3036692.70 frames. 
], batch size: 58, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:21:11,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=690906.6666666666, ans=0.125 2023-11-19 10:21:25,320 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:26,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=691040.0, ans=0.2 2023-11-19 10:21:27,390 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:27,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:28,513 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:31,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=691040.0, ans=0.125 2023-11-19 10:21:48,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=691173.3333333334, ans=0.125 2023-11-19 10:21:56,459 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=691173.3333333334, ans=0.0 2023-11-19 10:21:59,373 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7500, loss[loss=0.07759, simple_loss=0.08684, pruned_loss=0.02435, audio_tagging_loss=0.009824, over 16061.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1064, pruned_loss=0.02396, audio_tagging_loss=0.01024, over 3050655.70 frames. ], batch size: 61, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:22:01,705 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=691240.0, ans=0.0 2023-11-19 10:22:03,187 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=691240.0, ans=0.0 2023-11-19 10:22:06,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691240.0, ans=0.1 2023-11-19 10:22:21,506 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-11-19 10:22:25,172 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.537e+01 9.196e+01 9.974e+01 1.563e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 10:22:51,908 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=691506.6666666666, ans=0.2 2023-11-19 10:22:52,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=691506.6666666666, ans=0.125 2023-11-19 10:22:52,863 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=691506.6666666666, ans=0.0 2023-11-19 10:22:54,768 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7550, loss[loss=0.1096, simple_loss=0.1406, pruned_loss=0.03055, audio_tagging_loss=0.008753, over 15541.00 frames. ], tot_loss[loss=0.08821, simple_loss=0.1075, pruned_loss=0.02426, audio_tagging_loss=0.01023, over 3053877.57 frames. 
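The tot_loss[...] figures are not single-batch values but a running average weighted by frame count (hence "over ~3.04 million frames" staying roughly constant from batch to batch). A minimal frame-weighted tracker of that kind is sketched below; the exponential decay constant is an assumption, and the real tracker in icefall may window or decay differently:

    class RunningLoss:
        """Frame-weighted running average of a loss; decay is hypothetical."""
        def __init__(self, decay: float = 0.999):
            self.decay = decay
            self.weighted_sum = 0.0  # decayed sum of loss * frames
            self.num_frames = 0.0    # decayed sum of frames
        def update(self, loss: float, frames: int) -> None:
            self.weighted_sum = self.decay * self.weighted_sum + loss * frames
            self.num_frames = self.decay * self.num_frames + frames
        @property
        def value(self) -> float:
            return self.weighted_sum / max(self.num_frames, 1.0)
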
], batch size: 55, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:23:03,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=691573.3333333334, ans=0.125 2023-11-19 10:23:26,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=691706.6666666666, ans=0.0 2023-11-19 10:23:30,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=691773.3333333334, ans=0.95 2023-11-19 10:23:42,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=691840.0, ans=0.125 2023-11-19 10:23:50,749 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7600, loss[loss=0.07745, simple_loss=0.0916, pruned_loss=0.02261, audio_tagging_loss=0.009048, over 16064.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1061, pruned_loss=0.02397, audio_tagging_loss=0.01036, over 3051949.48 frames. ], batch size: 61, lr: 7.65e-03, grad_scale: 32.0 2023-11-19 10:23:59,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=691906.6666666666, ans=0.2 2023-11-19 10:24:12,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=692040.0, ans=0.125 2023-11-19 10:24:16,202 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.367e+01 9.110e+01 1.007e+02 1.295e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 10:24:18,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=692040.0, ans=0.125 2023-11-19 10:24:22,298 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=692040.0, ans=0.125 2023-11-19 10:24:28,559 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-19 10:24:30,259 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=692106.6666666666, ans=0.0 2023-11-19 10:24:46,400 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7650, loss[loss=0.06679, simple_loss=0.08942, pruned_loss=0.01593, audio_tagging_loss=0.006148, over 15032.00 frames. ], tot_loss[loss=0.08645, simple_loss=0.1048, pruned_loss=0.02358, audio_tagging_loss=0.01045, over 3049241.31 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:24:48,028 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.56 vs. limit=10.0 2023-11-19 10:24:50,881 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:25:01,550 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.63 vs. limit=6.0 2023-11-19 10:25:25,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=692440.0, ans=0.0 2023-11-19 10:25:42,036 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7700, loss[loss=0.09884, simple_loss=0.1218, pruned_loss=0.02456, audio_tagging_loss=0.01338, over 13995.00 frames. 
], tot_loss[loss=0.08616, simple_loss=0.1046, pruned_loss=0.02336, audio_tagging_loss=0.01048, over 3048877.87 frames. ], batch size: 52, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:26:00,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=692640.0, ans=0.125 2023-11-19 10:26:06,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=692706.6666666666, ans=0.1 2023-11-19 10:26:08,253 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.466e+01 9.076e+01 9.722e+01 1.155e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 10:26:33,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=692840.0, ans=0.09899494936611666 2023-11-19 10:26:33,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.10 vs. limit=15.0 2023-11-19 10:26:36,406 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=692840.0, ans=0.125 2023-11-19 10:26:37,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=692906.6666666666, ans=0.0 2023-11-19 10:26:38,247 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7750, loss[loss=0.08412, simple_loss=0.1073, pruned_loss=0.01894, audio_tagging_loss=0.01151, over 14850.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1045, pruned_loss=0.02333, audio_tagging_loss=0.01054, over 3044224.37 frames. ], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:26:39,012 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.18 vs. limit=15.0 2023-11-19 10:27:16,011 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=693106.6666666666, ans=0.125 2023-11-19 10:27:18,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=693106.6666666666, ans=10.0 2023-11-19 10:27:26,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=693173.3333333334, ans=0.0 2023-11-19 10:27:33,223 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7800, loss[loss=0.0902, simple_loss=0.09676, pruned_loss=0.02854, audio_tagging_loss=0.01328, over 16069.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1047, pruned_loss=0.02364, audio_tagging_loss=0.01057, over 3042694.91 frames. ], batch size: 60, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:27:55,152 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=693306.6666666666, ans=0.125 2023-11-19 10:28:02,856 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.622e+01 9.449e+01 1.060e+02 1.939e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 10:28:31,430 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7850, loss[loss=0.1202, simple_loss=0.1397, pruned_loss=0.03809, audio_tagging_loss=0.01228, over 16101.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1047, pruned_loss=0.02375, audio_tagging_loss=0.01065, over 3051446.61 frames. 
], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:28:42,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=693640.0, ans=0.0 2023-11-19 10:29:09,016 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.51 vs. limit=15.0 2023-11-19 10:29:27,564 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7900, loss[loss=0.1129, simple_loss=0.13, pruned_loss=0.03622, audio_tagging_loss=0.01171, over 14494.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1057, pruned_loss=0.02431, audio_tagging_loss=0.01073, over 3048757.65 frames. ], batch size: 53, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:29:32,939 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=693906.6666666666, ans=0.125 2023-11-19 10:29:42,848 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=693973.3333333334, ans=0.125 2023-11-19 10:29:53,843 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.460e+01 9.050e+01 1.000e+02 1.219e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 10:29:58,439 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=694040.0, ans=0.05 2023-11-19 10:30:06,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=694106.6666666666, ans=0.125 2023-11-19 10:30:20,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694173.3333333334, ans=0.1 2023-11-19 10:30:22,403 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 7950, loss[loss=0.09279, simple_loss=0.1218, pruned_loss=0.02495, audio_tagging_loss=0.006963, over 14083.00 frames. ], tot_loss[loss=0.08792, simple_loss=0.1055, pruned_loss=0.02439, audio_tagging_loss=0.01079, over 3049566.62 frames. ], batch size: 53, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:30:36,304 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:30:48,860 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=22.5 2023-11-19 10:31:13,546 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=694506.6666666666, ans=0.2 2023-11-19 10:31:13,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=694506.6666666666, ans=0.125 2023-11-19 10:31:18,660 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8000, loss[loss=0.1159, simple_loss=0.1362, pruned_loss=0.03645, audio_tagging_loss=0.01138, over 16114.00 frames. ], tot_loss[loss=0.08761, simple_loss=0.1048, pruned_loss=0.02425, audio_tagging_loss=0.01095, over 3044843.21 frames. 
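The WARNING above shows why certain AudioSet cuts are dropped from training: they carry a short dummy transcript, and after the roughly 4x front-end subsampling only 23 encoder frames remain for 24 BPE tokens. Since the pruned transducer loss cannot align more symbols than encoder frames (my reading of the warning), such cuts are excluded. A sketch of the check, where the exact subsampling arithmetic is an assumption chosen to reproduce the logged 100 -> 23 reduction:

    def should_exclude(num_frames: int, num_tokens: int) -> bool:
        # (num_frames - 7) // 4 reproduces 100 -> 23 as logged; the exact
        # formula depends on the conv front-end and is assumed here.
        frames_after_subsampling = (num_frames - 7) // 4
        return frames_after_subsampling < num_tokens

    assert should_exclude(100, 24)  # matches the excluded cut above
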
], batch size: 58, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:31:32,349 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.56 vs. limit=15.0 2023-11-19 10:31:39,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=694640.0, ans=0.125 2023-11-19 10:31:43,940 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-19 10:31:45,483 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.170e+01 9.028e+01 9.822e+01 2.160e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-19 10:31:50,199 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-11-19 10:32:14,648 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8050, loss[loss=0.06348, simple_loss=0.07344, pruned_loss=0.01292, audio_tagging_loss=0.01383, over 14851.00 frames. ], tot_loss[loss=0.08768, simple_loss=0.105, pruned_loss=0.02418, audio_tagging_loss=0.011, over 3046369.40 frames. ], batch size: 55, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:32:32,798 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.43 vs. limit=15.0 2023-11-19 10:32:40,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=695040.0, ans=0.125 2023-11-19 10:32:41,157 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.095e-01 2023-11-19 10:32:42,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=695040.0, ans=0.2 2023-11-19 10:32:52,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=695106.6666666666, ans=0.0 2023-11-19 10:32:54,057 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.06 vs. limit=6.0 2023-11-19 10:33:10,037 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8100, loss[loss=0.09343, simple_loss=0.1194, pruned_loss=0.02556, audio_tagging_loss=0.008153, over 14370.00 frames. ], tot_loss[loss=0.08836, simple_loss=0.1064, pruned_loss=0.02439, audio_tagging_loss=0.01075, over 3043308.77 frames. ], batch size: 53, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:33:16,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=695240.0, ans=0.125 2023-11-19 10:33:17,926 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.37 vs. limit=22.5 2023-11-19 10:33:19,000 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.88 vs. 
limit=12.0 2023-11-19 10:33:27,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=695306.6666666666, ans=0.125 2023-11-19 10:33:27,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=695306.6666666666, ans=0.0 2023-11-19 10:33:31,562 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=695373.3333333334, ans=0.125 2023-11-19 10:33:35,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=695373.3333333334, ans=0.125 2023-11-19 10:33:37,037 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.319e+01 9.007e+01 9.983e+01 1.168e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 10:33:47,545 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.09 vs. limit=22.5 2023-11-19 10:33:57,735 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=695506.6666666666, ans=0.09899494936611666 2023-11-19 10:34:05,409 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8150, loss[loss=0.09104, simple_loss=0.1028, pruned_loss=0.02678, audio_tagging_loss=0.01285, over 15301.00 frames. ], tot_loss[loss=0.0883, simple_loss=0.1068, pruned_loss=0.02427, audio_tagging_loss=0.01064, over 3050935.26 frames. ], batch size: 56, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:34:23,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.60 vs. limit=15.0 2023-11-19 10:34:27,291 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=695706.6666666666, ans=0.125 2023-11-19 10:34:37,818 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=695773.3333333334, ans=0.2 2023-11-19 10:34:39,865 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=695773.3333333334, ans=0.2 2023-11-19 10:34:49,744 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1 2023-11-19 10:34:52,755 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1 2023-11-19 10:34:59,359 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=695840.0, ans=0.125 2023-11-19 10:35:01,180 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8200, loss[loss=0.07643, simple_loss=0.09785, pruned_loss=0.01973, audio_tagging_loss=0.007768, over 14151.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.1061, pruned_loss=0.02401, audio_tagging_loss=0.01048, over 3038751.77 frames. ], batch size: 52, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:35:02,265 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:35:19,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695973.3333333334, ans=0.1 2023-11-19 10:35:27,077 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.400e+01 8.844e+01 9.876e+01 1.152e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 10:35:41,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=696106.6666666666, ans=0.125 2023-11-19 10:35:56,649 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8250, loss[loss=0.07739, simple_loss=0.08666, pruned_loss=0.02, audio_tagging_loss=0.01407, over 15804.00 frames. ], tot_loss[loss=0.08774, simple_loss=0.1065, pruned_loss=0.02413, audio_tagging_loss=0.01035, over 3040497.99 frames. ], batch size: 60, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:15,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=696306.6666666666, ans=0.0 2023-11-19 10:36:32,347 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.81 vs. limit=15.0 2023-11-19 10:36:50,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=696573.3333333334, ans=0.07 2023-11-19 10:36:52,155 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8300, loss[loss=0.08417, simple_loss=0.1108, pruned_loss=0.01957, audio_tagging_loss=0.009179, over 15766.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1051, pruned_loss=0.02361, audio_tagging_loss=0.01043, over 3041921.31 frames. ], batch size: 59, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:59,870 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=696573.3333333334, ans=0.125 2023-11-19 10:37:06,103 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.77 vs. limit=15.0 2023-11-19 10:37:06,704 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=696640.0, ans=0.0 2023-11-19 10:37:18,735 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.396e+01 9.218e+01 1.018e+02 1.275e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:37:18,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=696706.6666666666, ans=0.0 2023-11-19 10:37:24,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=696773.3333333334, ans=0.0 2023-11-19 10:37:29,803 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.58 vs. 
limit=15.0 2023-11-19 10:37:31,663 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696773.3333333334, ans=0.125 2023-11-19 10:37:38,437 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=696840.0, ans=0.0 2023-11-19 10:37:42,614 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=696840.0, ans=0.035 2023-11-19 10:37:43,117 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-11-19 10:37:47,185 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8350, loss[loss=0.06807, simple_loss=0.07724, pruned_loss=0.01614, audio_tagging_loss=0.01331, over 15976.00 frames. ], tot_loss[loss=0.08649, simple_loss=0.1051, pruned_loss=0.02345, audio_tagging_loss=0.01049, over 3049866.49 frames. ], batch size: 61, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:37:57,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=696973.3333333334, ans=0.125 2023-11-19 10:38:10,573 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=697040.0, ans=0.0 2023-11-19 10:38:25,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=697106.6666666666, ans=0.125 2023-11-19 10:38:43,137 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8400, loss[loss=0.07335, simple_loss=0.08273, pruned_loss=0.01985, audio_tagging_loss=0.01214, over 14367.00 frames. ], tot_loss[loss=0.08639, simple_loss=0.1051, pruned_loss=0.02336, audio_tagging_loss=0.01048, over 3054843.33 frames. ], batch size: 55, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:00,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-19 10:39:09,204 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.184e+01 9.115e+01 9.863e+01 1.459e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 10:39:37,755 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8450, loss[loss=0.0929, simple_loss=0.1133, pruned_loss=0.02428, audio_tagging_loss=0.01196, over 15179.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1054, pruned_loss=0.0234, audio_tagging_loss=0.01046, over 3053068.51 frames. 
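The optim.py lines summarize the recent distribution of gradient norms as five quantiles (presumably min/25%/median/75%/max) together with the clipping threshold and the share of recently clipped batches. With Clipping_scale=2.0 the threshold tracks twice the median, e.g. 2.0 * 9.218e+01 = 1.844e+02 in the line above. A simplified version of that bookkeeping follows; the window length is a guess, and the real implementation lives inside the optimizer in icefall's optim.py:

    from collections import deque
    import numpy as np

    class GradNormClipper:
        """Median-relative gradient clipping; window size is hypothetical."""
        def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
            self.scale = clipping_scale
            self.norms = deque(maxlen=window)
            self.num_clipped = 0
            self.num_seen = 0
        def observe(self, grad_norm: float) -> float:
            """Return the factor (<= 1.0) to scale this batch's grads by."""
            self.norms.append(grad_norm)
            self.num_seen += 1
            threshold = self.scale * float(np.median(self.norms))
            if grad_norm > threshold:
                self.num_clipped += 1
                return threshold / grad_norm
            return 1.0
        def summary(self):
            # The five numbers logged as "grad-norm quartiles":
            qs = np.quantile(np.asarray(self.norms), [0.0, 0.25, 0.5, 0.75, 1.0])
            return qs, self.scale * qs[2], 100.0 * self.num_clipped / self.num_seen
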
], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:43,791 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=697573.3333333334, ans=0.125 2023-11-19 10:39:49,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=697640.0, ans=0.0 2023-11-19 10:40:05,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=697706.6666666666, ans=0.125 2023-11-19 10:40:19,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697773.3333333334, ans=0.1 2023-11-19 10:40:23,854 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=697840.0, ans=0.125 2023-11-19 10:40:25,093 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.19 vs. limit=15.0 2023-11-19 10:40:33,449 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8500, loss[loss=0.07228, simple_loss=0.08037, pruned_loss=0.01654, audio_tagging_loss=0.01555, over 14078.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.1058, pruned_loss=0.02325, audio_tagging_loss=0.01049, over 3053374.42 frames. ], batch size: 54, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:40:42,051 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0 2023-11-19 10:40:47,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=697973.3333333334, ans=0.1 2023-11-19 10:40:48,500 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=697973.3333333334, ans=0.2 2023-11-19 10:40:51,670 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=697973.3333333334, ans=0.125 2023-11-19 10:40:59,992 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.736e+01 1.015e+02 1.178e+02 2.396e+02, threshold=2.030e+02, percent-clipped=2.0 2023-11-19 10:41:12,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=698106.6666666666, ans=0.07 2023-11-19 10:41:29,462 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8550, loss[loss=0.05662, simple_loss=0.06496, pruned_loss=0.01205, audio_tagging_loss=0.01209, over 14446.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1058, pruned_loss=0.02349, audio_tagging_loss=0.01055, over 3045970.06 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:41:29,719 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=698240.0, ans=0.125 2023-11-19 10:41:36,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.40 vs. 
limit=15.0 2023-11-19 10:41:38,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=698240.0, ans=0.125 2023-11-19 10:41:44,424 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=698306.6666666666, ans=0.05 2023-11-19 10:41:48,248 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.31 vs. limit=15.0 2023-11-19 10:41:48,331 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=15.0 2023-11-19 10:42:13,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=698506.6666666666, ans=0.0 2023-11-19 10:42:23,431 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.02 vs. limit=15.0 2023-11-19 10:42:23,881 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8600, loss[loss=0.12, simple_loss=0.1447, pruned_loss=0.03945, audio_tagging_loss=0.00822, over 14921.00 frames. ], tot_loss[loss=0.08702, simple_loss=0.1059, pruned_loss=0.02353, audio_tagging_loss=0.01054, over 3046624.53 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:42:29,994 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=698573.3333333334, ans=0.1 2023-11-19 10:42:41,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=698640.0, ans=0.0 2023-11-19 10:42:41,490 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-19 10:42:50,808 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.414e+01 9.085e+01 1.004e+02 1.428e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 10:42:52,099 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=698706.6666666666, ans=0.125 2023-11-19 10:43:02,188 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=698773.3333333334, ans=0.2 2023-11-19 10:43:05,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=698773.3333333334, ans=0.125 2023-11-19 10:43:06,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=698773.3333333334, ans=0.0 2023-11-19 10:43:07,637 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=698840.0, ans=0.125 2023-11-19 10:43:08,609 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:43:19,388 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8650, loss[loss=0.08245, simple_loss=0.1061, pruned_loss=0.0205, audio_tagging_loss=0.008909, over 16117.00 frames. ], tot_loss[loss=0.08722, simple_loss=0.1062, pruned_loss=0.02356, audio_tagging_loss=0.01057, over 3048128.28 frames. 
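The ScheduledFloat lines (conv_skip_rate, bypass.skip_rate, balancer prob values, dropout_p, ...) print module hyper-parameters that are not constants but functions of the global batch count, interpolated linearly between breakpoints; the ans=... field is the current value at the logged batch_count. A minimal stand-in is below, with the breakpoints invented purely for illustration:

    class PiecewiseLinear:
        """Minimal stand-in for icefall's ScheduledFloat: a value that is
        piecewise-linear in the global batch count."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs, e.g. (0, 0.5), (4000, 0.0)
            self.points = sorted(points)
        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Illustrative only: a skip rate decaying from 0.5 to 0.0 over 4000 batches,
    # then pinned at 0.0 for the rest of training (as the ans=0.0 entries here).
    conv_skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.0))
    assert conv_skip_rate(689506.0) == 0.0
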
], batch size: 60, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:43:31,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=698973.3333333334, ans=0.0 2023-11-19 10:43:32,171 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=698973.3333333334, ans=0.05 2023-11-19 10:43:38,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=698973.3333333334, ans=0.125 2023-11-19 10:43:41,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=699040.0, ans=0.125 2023-11-19 10:44:15,097 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8700, loss[loss=0.09083, simple_loss=0.1054, pruned_loss=0.02365, audio_tagging_loss=0.0145, over 14896.00 frames. ], tot_loss[loss=0.08718, simple_loss=0.1061, pruned_loss=0.02341, audio_tagging_loss=0.01074, over 3049238.82 frames. ], batch size: 55, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:44:35,490 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:44:39,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=699373.3333333334, ans=0.125 2023-11-19 10:44:41,447 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.409e+01 9.264e+01 1.013e+02 1.808e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 10:44:43,777 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=699373.3333333334, ans=0.125 2023-11-19 10:45:07,946 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.01 vs. limit=15.0 2023-11-19 10:45:10,501 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8750, loss[loss=0.06886, simple_loss=0.07173, pruned_loss=0.02011, audio_tagging_loss=0.01289, over 15646.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1069, pruned_loss=0.02363, audio_tagging_loss=0.01063, over 3053463.59 frames. ], batch size: 60, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:45:11,021 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=699573.3333333334, ans=22.5 2023-11-19 10:45:12,727 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699573.3333333334, ans=0.1 2023-11-19 10:45:14,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.61 vs. limit=15.0 2023-11-19 10:45:20,653 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=699640.0, ans=0.125 2023-11-19 10:45:24,927 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=699640.0, ans=0.1 2023-11-19 10:45:44,186 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. 
limit=6.0 2023-11-19 10:45:45,932 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=699773.3333333334, ans=0.125 2023-11-19 10:46:05,643 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8800, loss[loss=0.06476, simple_loss=0.07313, pruned_loss=0.01432, audio_tagging_loss=0.01387, over 16451.00 frames. ], tot_loss[loss=0.08863, simple_loss=0.1076, pruned_loss=0.02408, audio_tagging_loss=0.01075, over 3052890.70 frames. ], batch size: 62, lr: 7.60e-03, grad_scale: 32.0 2023-11-19 10:46:19,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=699973.3333333334, ans=0.0 2023-11-19 10:46:20,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.81 vs. limit=10.0 2023-11-19 10:46:33,878 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.476e+01 9.103e+01 9.978e+01 1.212e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-19 10:46:56,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=700173.3333333334, ans=0.2 2023-11-19 10:47:01,761 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8850, loss[loss=0.07592, simple_loss=0.09074, pruned_loss=0.01762, audio_tagging_loss=0.01293, over 15308.00 frames. ], tot_loss[loss=0.08893, simple_loss=0.1083, pruned_loss=0.02416, audio_tagging_loss=0.01063, over 3053525.36 frames. ], batch size: 57, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:47:04,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=700240.0, ans=0.0 2023-11-19 10:47:12,257 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:47:12,467 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=700306.6666666666, ans=0.2 2023-11-19 10:47:17,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=700306.6666666666, ans=0.125 2023-11-19 10:47:54,449 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.12 vs. limit=15.0 2023-11-19 10:47:55,358 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:47:56,186 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8900, loss[loss=0.07722, simple_loss=0.1025, pruned_loss=0.01749, audio_tagging_loss=0.008491, over 15651.00 frames. ], tot_loss[loss=0.08828, simple_loss=0.1078, pruned_loss=0.02392, audio_tagging_loss=0.01044, over 3055412.32 frames. ], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:24,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.75 vs. 
limit=15.0 2023-11-19 10:48:25,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=700706.6666666666, ans=0.05 2023-11-19 10:48:25,771 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.384e+01 9.220e+01 1.025e+02 1.340e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:48:30,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=700773.3333333334, ans=0.125 2023-11-19 10:48:52,100 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 8950, loss[loss=0.07304, simple_loss=0.08364, pruned_loss=0.02066, audio_tagging_loss=0.01056, over 15442.00 frames. ], tot_loss[loss=0.0883, simple_loss=0.1079, pruned_loss=0.02402, audio_tagging_loss=0.01032, over 3054255.98 frames. ], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:59,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=700906.6666666666, ans=0.0 2023-11-19 10:49:29,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=701106.6666666666, ans=0.125 2023-11-19 10:49:31,521 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:49:47,835 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9000, loss[loss=0.08156, simple_loss=0.09803, pruned_loss=0.02131, audio_tagging_loss=0.01124, over 14549.00 frames. ], tot_loss[loss=0.08777, simple_loss=0.1069, pruned_loss=0.02396, audio_tagging_loss=0.01035, over 3052831.86 frames. ], batch size: 56, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:47,836 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 10:50:20,517 INFO [train_asr.py:1147] (2/4) Epoch 9, validation: loss=0.06655, simple_loss=0.05588, pruned_loss=0.006694, audio_tagging_loss=0.03192, over 4681554.00 frames. 2023-11-19 10:50:20,518 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 10:50:49,518 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.74 vs. limit=6.0 2023-11-19 10:50:50,123 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.217e+01 9.187e+01 1.011e+02 1.342e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 10:50:57,092 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.05 vs. limit=15.0 2023-11-19 10:51:14,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=701506.6666666666, ans=0.125 2023-11-19 10:51:16,394 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9050, loss[loss=0.09651, simple_loss=0.1171, pruned_loss=0.0259, audio_tagging_loss=0.01206, over 16631.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1072, pruned_loss=0.02385, audio_tagging_loss=0.01035, over 3049759.40 frames. ], batch size: 64, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:51:25,806 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.85 vs. 
limit=22.5 2023-11-19 10:51:39,190 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.51 vs. limit=15.0 2023-11-19 10:51:46,563 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=701706.6666666666, ans=0.0 2023-11-19 10:51:57,758 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=701773.3333333334, ans=0.125 2023-11-19 10:52:05,639 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=701840.0, ans=0.125 2023-11-19 10:52:12,353 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9100, loss[loss=0.0905, simple_loss=0.102, pruned_loss=0.02931, audio_tagging_loss=0.0102, over 14439.00 frames. ], tot_loss[loss=0.08792, simple_loss=0.1073, pruned_loss=0.02395, audio_tagging_loss=0.01033, over 3051348.23 frames. ], batch size: 53, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:52:14,006 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-19 10:52:17,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=701906.6666666666, ans=0.0 2023-11-19 10:52:41,734 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.498e+01 9.044e+01 9.975e+01 1.337e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 10:52:48,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702106.6666666666, ans=0.1 2023-11-19 10:53:01,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=702173.3333333334, ans=0.95 2023-11-19 10:53:06,366 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=702240.0, ans=0.125 2023-11-19 10:53:07,270 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9150, loss[loss=0.09898, simple_loss=0.1265, pruned_loss=0.02961, audio_tagging_loss=0.006112, over 15103.00 frames. ], tot_loss[loss=0.08765, simple_loss=0.1069, pruned_loss=0.02386, audio_tagging_loss=0.01032, over 3047215.10 frames. ], batch size: 55, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:53:08,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=702240.0, ans=0.125 2023-11-19 10:53:47,672 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:53:59,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=702506.6666666666, ans=0.125 2023-11-19 10:54:02,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=702573.3333333334, ans=0.125 2023-11-19 10:54:02,852 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9200, loss[loss=0.07327, simple_loss=0.0896, pruned_loss=0.01954, audio_tagging_loss=0.008927, over 16286.00 frames. ], tot_loss[loss=0.0871, simple_loss=0.1065, pruned_loss=0.02356, audio_tagging_loss=0.01027, over 3047537.75 frames. 
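The Whitening lines compare a per-module statistic against a limit. As I read scaling.py, the metric measures how far the covariance of a module's output channels is from a multiple of the identity: it is 1.0 for perfectly "white" activations and grows as variance concentrates in a few directions, and modules whose metric exceeds the limit are nudged back toward whiteness. Roughly, ignoring grouping, batching, and numerical-stability terms:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns 1.0 when the channel
        # covariance is a multiple of the identity, larger otherwise.
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]
        d = cov.shape[0]
        # mean squared eigenvalue / squared mean eigenvalue, via traces
        return d * (cov * cov).sum() / (cov.diag().sum() ** 2)
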
], batch size: 64, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:54:16,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=702640.0, ans=0.1 2023-11-19 10:54:29,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=702706.6666666666, ans=0.125 2023-11-19 10:54:32,965 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5 2023-11-19 10:54:33,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.309e+01 9.173e+01 1.001e+02 1.258e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 10:54:36,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=702773.3333333334, ans=0.1 2023-11-19 10:54:44,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.60 vs. limit=12.0 2023-11-19 10:54:44,873 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=702773.3333333334, ans=0.125 2023-11-19 10:54:46,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.92 vs. limit=15.0 2023-11-19 10:54:59,956 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9250, loss[loss=0.09008, simple_loss=0.1089, pruned_loss=0.02178, audio_tagging_loss=0.01385, over 14738.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1059, pruned_loss=0.02361, audio_tagging_loss=0.01031, over 3050096.07 frames. ], batch size: 54, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:55:40,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=703106.6666666666, ans=0.125 2023-11-19 10:55:50,322 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=703173.3333333334, ans=0.125 2023-11-19 10:55:51,332 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=703173.3333333334, ans=10.0 2023-11-19 10:55:53,392 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=703240.0, ans=0.125 2023-11-19 10:55:54,233 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9300, loss[loss=0.133, simple_loss=0.1592, pruned_loss=0.04548, audio_tagging_loss=0.007877, over 14761.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.1051, pruned_loss=0.02355, audio_tagging_loss=0.01042, over 3057426.97 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:05,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=703306.6666666666, ans=0.0 2023-11-19 10:56:17,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.34 vs. 
limit=8.0 2023-11-19 10:56:25,082 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.507e+01 9.203e+01 9.999e+01 1.405e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 10:56:28,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=703440.0, ans=0.125 2023-11-19 10:56:34,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=703440.0, ans=0.125 2023-11-19 10:56:41,987 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=703506.6666666666, ans=0.04949747468305833 2023-11-19 10:56:50,086 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9350, loss[loss=0.08147, simple_loss=0.09787, pruned_loss=0.02025, audio_tagging_loss=0.01229, over 15353.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1053, pruned_loss=0.02352, audio_tagging_loss=0.01045, over 3055255.75 frames. ], batch size: 58, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:50,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=703573.3333333334, ans=0.0 2023-11-19 10:57:41,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.59 vs. limit=22.5 2023-11-19 10:57:46,473 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9400, loss[loss=0.1015, simple_loss=0.1243, pruned_loss=0.03126, audio_tagging_loss=0.008099, over 15283.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.106, pruned_loss=0.02381, audio_tagging_loss=0.01051, over 3059357.84 frames. ], batch size: 57, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:58:02,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=703973.3333333334, ans=0.125 2023-11-19 10:58:06,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703973.3333333334, ans=0.1 2023-11-19 10:58:12,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704040.0, ans=0.1 2023-11-19 10:58:15,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.579e+01 9.388e+01 1.029e+02 1.331e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 10:58:15,940 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=704040.0, ans=0.0 2023-11-19 10:58:18,455 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.74 vs. limit=15.0 2023-11-19 10:58:22,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=704106.6666666666, ans=0.125 2023-11-19 10:58:29,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=704106.6666666666, ans=0.0 2023-11-19 10:58:32,593 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=704173.3333333334, ans=0.2 2023-11-19 10:58:39,708 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:58:40,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=704240.0, ans=0.05 2023-11-19 10:58:41,781 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9450, loss[loss=0.07953, simple_loss=0.1013, pruned_loss=0.01948, audio_tagging_loss=0.009384, over 14732.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1052, pruned_loss=0.02352, audio_tagging_loss=0.01071, over 3064578.46 frames. ], batch size: 55, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:58:54,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=704306.6666666666, ans=0.125 2023-11-19 10:58:55,630 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.80 vs. limit=15.0 2023-11-19 10:59:00,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=704306.6666666666, ans=0.0 2023-11-19 10:59:23,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=704440.0, ans=0.2 2023-11-19 10:59:26,830 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=704506.6666666666, ans=0.0 2023-11-19 10:59:30,872 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.62 vs. limit=6.0 2023-11-19 10:59:36,678 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9500, loss[loss=0.109, simple_loss=0.1221, pruned_loss=0.03673, audio_tagging_loss=0.01119, over 15963.00 frames. ], tot_loss[loss=0.08724, simple_loss=0.1056, pruned_loss=0.02376, audio_tagging_loss=0.01068, over 3061643.88 frames. ], batch size: 59, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:59:36,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=704573.3333333334, ans=0.2 2023-11-19 10:59:50,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=704640.0, ans=0.125 2023-11-19 10:59:58,963 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.53 vs. 
limit=10.0 2023-11-19 11:00:06,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.471e+01 8.371e+01 9.001e+01 9.881e+01 1.664e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 11:00:15,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=704773.3333333334, ans=0.0 2023-11-19 11:00:17,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=704773.3333333334, ans=0.125 2023-11-19 11:00:29,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=704840.0, ans=0.125 2023-11-19 11:00:30,558 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-11-19 11:00:32,115 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9550, loss[loss=0.08276, simple_loss=0.1053, pruned_loss=0.02092, audio_tagging_loss=0.009206, over 14526.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1049, pruned_loss=0.0235, audio_tagging_loss=0.01091, over 3055510.34 frames. ], batch size: 53, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 11:00:33,346 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=704906.6666666666, ans=0.1 2023-11-19 11:00:37,007 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=704906.6666666666, ans=0.125 2023-11-19 11:00:45,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=704973.3333333334, ans=10.0 2023-11-19 11:00:47,086 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=704973.3333333334, ans=0.125 2023-11-19 11:00:49,352 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=704973.3333333334, ans=0.125 2023-11-19 11:00:54,291 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-19 11:01:28,006 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9600, loss[loss=0.09822, simple_loss=0.1236, pruned_loss=0.02671, audio_tagging_loss=0.009716, over 15828.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1053, pruned_loss=0.02345, audio_tagging_loss=0.01083, over 3064668.63 frames. 
], batch size: 62, lr: 7.58e-03, grad_scale: 32.0 2023-11-19 11:01:49,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=705373.3333333334, ans=0.0 2023-11-19 11:01:58,228 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.373e+01 9.061e+01 9.893e+01 1.304e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 11:02:11,526 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705506.6666666666, ans=0.1 2023-11-19 11:02:16,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=705506.6666666666, ans=0.125 2023-11-19 11:02:22,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=705573.3333333334, ans=0.0 2023-11-19 11:02:22,826 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. limit=15.0 2023-11-19 11:02:23,462 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9650, loss[loss=0.1008, simple_loss=0.1264, pruned_loss=0.02682, audio_tagging_loss=0.01074, over 15675.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1034, pruned_loss=0.02308, audio_tagging_loss=0.01089, over 3056040.92 frames. ], batch size: 57, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:02:41,992 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.43 vs. limit=15.0 2023-11-19 11:02:48,340 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705706.6666666666, ans=0.125 2023-11-19 11:02:57,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=705773.3333333334, ans=0.1 2023-11-19 11:03:01,393 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=705773.3333333334, ans=0.125 2023-11-19 11:03:05,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=705773.3333333334, ans=0.125 2023-11-19 11:03:18,569 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9700, loss[loss=0.07237, simple_loss=0.0785, pruned_loss=0.0203, audio_tagging_loss=0.01282, over 14906.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1035, pruned_loss=0.02325, audio_tagging_loss=0.01064, over 3049823.70 frames. 
], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:03:48,685 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.495e+01 9.250e+01 1.013e+02 1.601e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 11:03:50,094 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=706040.0, ans=0.125 2023-11-19 11:03:56,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=706106.6666666666, ans=0.1 2023-11-19 11:04:09,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=706173.3333333334, ans=0.125 2023-11-19 11:04:13,978 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9750, loss[loss=0.07374, simple_loss=0.08683, pruned_loss=0.02089, audio_tagging_loss=0.009434, over 15048.00 frames. ], tot_loss[loss=0.08593, simple_loss=0.1042, pruned_loss=0.0234, audio_tagging_loss=0.01046, over 3048310.32 frames. ], batch size: 55, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:04:29,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=706306.6666666666, ans=0.125 2023-11-19 11:04:30,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=706306.6666666666, ans=0.09899494936611666 2023-11-19 11:04:32,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=706306.6666666666, ans=0.2 2023-11-19 11:04:37,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.44 vs. limit=10.0 2023-11-19 11:04:45,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706373.3333333334, ans=0.1 2023-11-19 11:04:51,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=706440.0, ans=0.0 2023-11-19 11:04:59,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=706506.6666666666, ans=0.125 2023-11-19 11:05:00,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706506.6666666666, ans=0.1 2023-11-19 11:05:02,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=706506.6666666666, ans=0.125 2023-11-19 11:05:04,234 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706506.6666666666, ans=0.1 2023-11-19 11:05:09,868 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9800, loss[loss=0.1257, simple_loss=0.1601, pruned_loss=0.03611, audio_tagging_loss=0.009539, over 15761.00 frames. ], tot_loss[loss=0.08635, simple_loss=0.1046, pruned_loss=0.02359, audio_tagging_loss=0.01045, over 3039725.38 frames. 
], batch size: 55, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:05:12,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=706573.3333333334, ans=0.035 2023-11-19 11:05:18,458 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=706573.3333333334, ans=0.125 2023-11-19 11:05:37,749 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=706706.6666666666, ans=0.125 2023-11-19 11:05:39,679 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.318e+01 9.223e+01 1.040e+02 1.482e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:05:54,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=706840.0, ans=0.0 2023-11-19 11:05:55,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=706840.0, ans=0.125 2023-11-19 11:05:59,821 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:06:03,763 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=706840.0, ans=0.125 2023-11-19 11:06:05,689 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9850, loss[loss=0.07905, simple_loss=0.1008, pruned_loss=0.01774, audio_tagging_loss=0.01093, over 14657.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1057, pruned_loss=0.0237, audio_tagging_loss=0.01036, over 3040153.70 frames. ], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:06:09,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=706906.6666666666, ans=0.1 2023-11-19 11:06:20,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=706973.3333333334, ans=0.0 2023-11-19 11:06:24,920 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2023-11-19 11:06:24,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.70 vs. limit=5.0 2023-11-19 11:06:38,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.08 vs. limit=10.0 2023-11-19 11:07:01,182 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9900, loss[loss=0.05949, simple_loss=0.07099, pruned_loss=0.01561, audio_tagging_loss=0.008384, over 15460.00 frames. ], tot_loss[loss=0.08798, simple_loss=0.1074, pruned_loss=0.02401, audio_tagging_loss=0.01029, over 3036313.86 frames. 
], batch size: 60, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:07:06,690 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=707240.0, ans=0.0 2023-11-19 11:07:19,507 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=707306.6666666666, ans=0.125 2023-11-19 11:07:28,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=707373.3333333334, ans=15.0 2023-11-19 11:07:31,311 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 8.507e+01 9.296e+01 1.006e+02 1.418e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 11:07:38,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=707440.0, ans=0.1 2023-11-19 11:07:42,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=707440.0, ans=0.07 2023-11-19 11:07:56,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=707573.3333333334, ans=0.125 2023-11-19 11:07:57,329 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 9950, loss[loss=0.09206, simple_loss=0.1025, pruned_loss=0.02723, audio_tagging_loss=0.01359, over 15385.00 frames. ], tot_loss[loss=0.08783, simple_loss=0.1074, pruned_loss=0.0239, audio_tagging_loss=0.01024, over 3037082.66 frames. ], batch size: 57, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:08:00,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2023-11-19 11:08:15,966 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=707640.0, ans=0.0 2023-11-19 11:08:16,289 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.96 vs. limit=22.5 2023-11-19 11:08:26,166 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=707706.6666666666, ans=0.0 2023-11-19 11:08:27,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=707706.6666666666, ans=15.0 2023-11-19 11:08:35,116 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.21 vs. limit=22.5 2023-11-19 11:08:41,333 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.51 vs. limit=15.0 2023-11-19 11:08:52,384 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10000, loss[loss=0.08934, simple_loss=0.1109, pruned_loss=0.02256, audio_tagging_loss=0.01135, over 15145.00 frames. ], tot_loss[loss=0.08771, simple_loss=0.1075, pruned_loss=0.02376, audio_tagging_loss=0.01021, over 3039135.53 frames. 
], batch size: 57, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:07,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=707973.3333333334, ans=0.125 2023-11-19 11:09:23,368 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.309e+01 8.896e+01 1.009e+02 1.315e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 11:09:24,866 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=12.0 2023-11-19 11:09:49,104 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10050, loss[loss=0.09801, simple_loss=0.1212, pruned_loss=0.02665, audio_tagging_loss=0.01077, over 15321.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1076, pruned_loss=0.02383, audio_tagging_loss=0.01021, over 3042695.34 frames. ], batch size: 57, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:51,454 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=708240.0, ans=0.125 2023-11-19 11:09:59,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=708306.6666666666, ans=0.125 2023-11-19 11:09:59,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=708306.6666666666, ans=0.0 2023-11-19 11:10:21,756 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708440.0, ans=0.1 2023-11-19 11:10:33,907 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708506.6666666666, ans=0.1 2023-11-19 11:10:44,168 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10100, loss[loss=0.116, simple_loss=0.1457, pruned_loss=0.03169, audio_tagging_loss=0.01145, over 14810.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.108, pruned_loss=0.02399, audio_tagging_loss=0.01026, over 3046365.71 frames. ], batch size: 56, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:11:02,824 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=708640.0, ans=0.125 2023-11-19 11:11:15,749 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.397e+01 8.991e+01 1.000e+02 1.217e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 11:11:29,052 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:11:31,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=708840.0, ans=0.1 2023-11-19 11:11:34,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=708840.0, ans=0.125 2023-11-19 11:11:40,079 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10150, loss[loss=0.1229, simple_loss=0.1514, pruned_loss=0.03831, audio_tagging_loss=0.008906, over 15109.00 frames. 
], tot_loss[loss=0.08889, simple_loss=0.1086, pruned_loss=0.02428, audio_tagging_loss=0.0103, over 3046384.17 frames. ], batch size: 55, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:11:45,194 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=708906.6666666666, ans=0.1 2023-11-19 11:11:52,047 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=708973.3333333334, ans=0.125 2023-11-19 11:12:05,495 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:12:06,720 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=709040.0, ans=0.125 2023-11-19 11:12:12,530 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=709106.6666666666, ans=0.2 2023-11-19 11:12:12,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.72 vs. limit=10.0 2023-11-19 11:12:15,141 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=15.0 2023-11-19 11:12:26,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=709173.3333333334, ans=0.035 2023-11-19 11:12:26,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=709173.3333333334, ans=0.125 2023-11-19 11:12:34,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=709173.3333333334, ans=0.2 2023-11-19 11:12:36,051 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10200, loss[loss=0.08988, simple_loss=0.1034, pruned_loss=0.02553, audio_tagging_loss=0.01267, over 15274.00 frames. ], tot_loss[loss=0.08845, simple_loss=0.1077, pruned_loss=0.02419, audio_tagging_loss=0.0104, over 3051529.16 frames. ], batch size: 58, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:12:49,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=709306.6666666666, ans=0.0 2023-11-19 11:12:56,608 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
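The loss records decompose into the three tracked components. With this run's simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0, each logged total is reproduced to logging precision by a plain weighted sum; the sketch below illustrates that bookkeeping and is not the verbatim train_asr.py code.

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Weighted combination consistent with the logged records; the real
    # training script computes the pruned-transducer terms internally.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# The batch 10200 record above: loss=0.08988 with simple_loss=0.1034,
# pruned_loss=0.02553, audio_tagging_loss=0.01267.
assert abs(total_loss(0.1034, 0.02553, 0.01267) - 0.08988) < 5e-5

The tot_loss variant appears to aggregate the same components over the frames seen since the last statistics reset, which is why it moves more smoothly than the per-batch loss.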
Number of tokens: 24 2023-11-19 11:12:59,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709373.3333333334, ans=0.0 2023-11-19 11:13:06,566 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.611e+01 9.302e+01 1.032e+02 1.464e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:13:07,672 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2023-11-19 11:13:12,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=709440.0, ans=0.0 2023-11-19 11:13:12,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=709440.0, ans=0.2 2023-11-19 11:13:30,927 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10250, loss[loss=0.08839, simple_loss=0.1168, pruned_loss=0.01812, audio_tagging_loss=0.01187, over 14878.00 frames. ], tot_loss[loss=0.08742, simple_loss=0.1062, pruned_loss=0.02378, audio_tagging_loss=0.01054, over 3049835.03 frames. ], batch size: 56, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:13:41,498 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.65 vs. limit=15.0 2023-11-19 11:13:43,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=709640.0, ans=0.125 2023-11-19 11:13:50,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=709640.0, ans=0.0 2023-11-19 11:14:00,054 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.80 vs. limit=6.0 2023-11-19 11:14:26,457 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10300, loss[loss=0.0991, simple_loss=0.1259, pruned_loss=0.02652, audio_tagging_loss=0.009632, over 16035.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1056, pruned_loss=0.02347, audio_tagging_loss=0.0107, over 3053500.49 frames. ], batch size: 60, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:14:44,217 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=709973.3333333334, ans=0.125 2023-11-19 11:14:50,539 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=710040.0, ans=0.0 2023-11-19 11:14:51,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=710040.0, ans=0.07 2023-11-19 11:14:58,177 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.163e+01 8.880e+01 9.962e+01 1.363e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 11:15:15,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=710173.3333333334, ans=0.125 2023-11-19 11:15:23,131 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10350, loss[loss=0.0643, simple_loss=0.07209, pruned_loss=0.01661, audio_tagging_loss=0.01164, over 13750.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.1069, pruned_loss=0.02405, audio_tagging_loss=0.01065, over 3060872.14 frames. 
], batch size: 54, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:15:37,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=710306.6666666666, ans=0.125 2023-11-19 11:15:37,181 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=710306.6666666666, ans=0.025 2023-11-19 11:15:54,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=710440.0, ans=0.1 2023-11-19 11:16:08,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=710506.6666666666, ans=0.2 2023-11-19 11:16:09,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=710506.6666666666, ans=0.0 2023-11-19 11:16:12,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=710506.6666666666, ans=0.07 2023-11-19 11:16:18,292 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10400, loss[loss=0.09204, simple_loss=0.09767, pruned_loss=0.03045, audio_tagging_loss=0.01275, over 15651.00 frames. ], tot_loss[loss=0.08821, simple_loss=0.1069, pruned_loss=0.02401, audio_tagging_loss=0.01072, over 3064908.13 frames. ], batch size: 59, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:16:25,381 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=710573.3333333334, ans=0.125 2023-11-19 11:16:50,350 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.272e+01 8.278e+01 9.028e+01 1.000e+02 1.286e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 11:16:51,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=710773.3333333334, ans=0.1 2023-11-19 11:16:52,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=710773.3333333334, ans=0.125 2023-11-19 11:16:55,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=710773.3333333334, ans=0.0 2023-11-19 11:17:01,264 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710773.3333333334, ans=0.1 2023-11-19 11:17:14,233 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10450, loss[loss=0.08469, simple_loss=0.1087, pruned_loss=0.02172, audio_tagging_loss=0.008601, over 15666.00 frames. ], tot_loss[loss=0.08749, simple_loss=0.1061, pruned_loss=0.0238, audio_tagging_loss=0.01062, over 3054948.55 frames. ], batch size: 57, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:17:27,268 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=710973.3333333334, ans=0.025 2023-11-19 11:17:28,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.28 vs. 
limit=15.0 2023-11-19 11:17:43,771 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=711040.0, ans=0.0 2023-11-19 11:17:52,869 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=711106.6666666666, ans=0.1 2023-11-19 11:18:02,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=711173.3333333334, ans=0.09899494936611666 2023-11-19 11:18:10,453 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10500, loss[loss=0.0678, simple_loss=0.07571, pruned_loss=0.01719, audio_tagging_loss=0.01276, over 15099.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1056, pruned_loss=0.02376, audio_tagging_loss=0.01055, over 3057307.06 frames. ], batch size: 60, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:18:28,082 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=711306.6666666666, ans=0.1 2023-11-19 11:18:29,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=711306.6666666666, ans=0.125 2023-11-19 11:18:32,845 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:18:34,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711373.3333333334, ans=0.125 2023-11-19 11:18:41,179 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.288e+01 9.011e+01 9.876e+01 1.227e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:18:45,106 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711440.0, ans=0.125 2023-11-19 11:19:00,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=711506.6666666666, ans=0.05 2023-11-19 11:19:01,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=711506.6666666666, ans=0.125 2023-11-19 11:19:02,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=711506.6666666666, ans=0.125 2023-11-19 11:19:06,043 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10550, loss[loss=0.09214, simple_loss=0.1191, pruned_loss=0.02513, audio_tagging_loss=0.007467, over 14641.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1049, pruned_loss=0.02363, audio_tagging_loss=0.01053, over 3053895.46 frames. ], batch size: 52, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:19:32,053 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.08 vs. 
limit=15.0 2023-11-19 11:19:41,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=711773.3333333334, ans=0.125 2023-11-19 11:19:51,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=711840.0, ans=0.125 2023-11-19 11:19:59,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=711840.0, ans=0.0 2023-11-19 11:20:01,628 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10600, loss[loss=0.1147, simple_loss=0.1469, pruned_loss=0.03168, audio_tagging_loss=0.009541, over 15594.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1054, pruned_loss=0.02379, audio_tagging_loss=0.01047, over 3047690.81 frames. ], batch size: 57, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:20:12,899 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711973.3333333334, ans=0.1 2023-11-19 11:20:16,026 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=711973.3333333334, ans=0.0 2023-11-19 11:20:16,032 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=711973.3333333334, ans=0.125 2023-11-19 11:20:26,356 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=712040.0, ans=0.125 2023-11-19 11:20:30,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=712040.0, ans=0.125 2023-11-19 11:20:32,540 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.185e+01 9.113e+01 1.014e+02 1.245e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 11:20:33,094 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.71 vs. limit=15.0 2023-11-19 11:20:50,215 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=712173.3333333334, ans=0.0 2023-11-19 11:20:56,904 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10650, loss[loss=0.09118, simple_loss=0.1111, pruned_loss=0.02536, audio_tagging_loss=0.01026, over 15123.00 frames. ], tot_loss[loss=0.08785, simple_loss=0.1066, pruned_loss=0.02415, audio_tagging_loss=0.01038, over 3041518.08 frames. 
], batch size: 57, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:06,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=712240.0, ans=0.95 2023-11-19 11:21:15,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=712306.6666666666, ans=0.125 2023-11-19 11:21:23,499 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:21:36,786 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:21:37,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=712440.0, ans=0.2 2023-11-19 11:21:45,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712506.6666666666, ans=0.0 2023-11-19 11:21:53,344 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10700, loss[loss=0.08634, simple_loss=0.1008, pruned_loss=0.02348, audio_tagging_loss=0.01245, over 16305.00 frames. ], tot_loss[loss=0.08774, simple_loss=0.1067, pruned_loss=0.02406, audio_tagging_loss=0.01033, over 3046713.59 frames. ], batch size: 61, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:59,132 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.04 vs. limit=15.0 2023-11-19 11:22:04,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=712640.0, ans=0.125 2023-11-19 11:22:24,045 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.779e+01 8.116e+01 8.823e+01 9.508e+01 1.570e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 11:22:24,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=712706.6666666666, ans=0.125 2023-11-19 11:22:40,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=712840.0, ans=0.0 2023-11-19 11:22:49,115 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10750, loss[loss=0.08753, simple_loss=0.1128, pruned_loss=0.02386, audio_tagging_loss=0.007265, over 14461.00 frames. ], tot_loss[loss=0.08728, simple_loss=0.1062, pruned_loss=0.02388, audio_tagging_loss=0.01029, over 3045444.05 frames. ], batch size: 55, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:22:52,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=712906.6666666666, ans=0.2 2023-11-19 11:22:52,596 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=712906.6666666666, ans=0.125 2023-11-19 11:22:53,495 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=712906.6666666666, ans=0.0 2023-11-19 11:23:06,829 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=712973.3333333334, ans=0.0 2023-11-19 11:23:44,947 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10800, loss[loss=0.084, simple_loss=0.1047, pruned_loss=0.01908, audio_tagging_loss=0.0126, over 15430.00 frames. 
], tot_loss[loss=0.08681, simple_loss=0.1057, pruned_loss=0.02363, audio_tagging_loss=0.01031, over 3054733.97 frames. ], batch size: 55, lr: 7.53e-03, grad_scale: 32.0 2023-11-19 11:24:10,165 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=713373.3333333334, ans=0.0 2023-11-19 11:24:16,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=713373.3333333334, ans=0.0 2023-11-19 11:24:17,313 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.285e+01 8.937e+01 9.621e+01 1.564e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 11:24:18,856 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.41 vs. limit=15.0 2023-11-19 11:24:40,666 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10850, loss[loss=0.1105, simple_loss=0.1386, pruned_loss=0.03199, audio_tagging_loss=0.009201, over 16748.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.1059, pruned_loss=0.02365, audio_tagging_loss=0.01031, over 3052067.99 frames. ], batch size: 59, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:24:48,930 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:24:56,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=713640.0, ans=0.0 2023-11-19 11:25:02,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=713706.6666666666, ans=0.0 2023-11-19 11:25:32,926 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:25:36,647 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10900, loss[loss=0.08762, simple_loss=0.1186, pruned_loss=0.01992, audio_tagging_loss=0.008385, over 15322.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1067, pruned_loss=0.02371, audio_tagging_loss=0.0103, over 3046053.10 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:43,473 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. limit=15.0 2023-11-19 11:25:49,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=713973.3333333334, ans=0.2 2023-11-19 11:25:55,297 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=713973.3333333334, ans=0.1 2023-11-19 11:26:04,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=714040.0, ans=0.125 2023-11-19 11:26:06,745 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. 
limit=15.0 2023-11-19 11:26:08,747 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.550e+01 9.422e+01 1.018e+02 1.383e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 11:26:08,922 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714106.6666666666, ans=0.1 2023-11-19 11:26:15,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=714106.6666666666, ans=0.0 2023-11-19 11:26:26,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=714173.3333333334, ans=0.125 2023-11-19 11:26:31,104 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=714240.0, ans=0.125 2023-11-19 11:26:31,289 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=714240.0, ans=15.0 2023-11-19 11:26:31,829 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 10950, loss[loss=0.07696, simple_loss=0.09438, pruned_loss=0.01712, audio_tagging_loss=0.01265, over 15227.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.1064, pruned_loss=0.02362, audio_tagging_loss=0.01035, over 3047651.22 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:26:49,450 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:26:53,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714373.3333333334, ans=0.1 2023-11-19 11:26:58,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714373.3333333334, ans=0.1 2023-11-19 11:27:18,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=714506.6666666666, ans=0.2 2023-11-19 11:27:26,999 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11000, loss[loss=0.09888, simple_loss=0.119, pruned_loss=0.03075, audio_tagging_loss=0.008619, over 14715.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.106, pruned_loss=0.02358, audio_tagging_loss=0.01037, over 3048963.46 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:27:33,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.51 vs. limit=10.0 2023-11-19 11:27:36,080 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
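In the optim.py Clipping_scale records, the five values are the 0/25/50/75/100 percent quantiles of recently observed gradient norms, and the reported threshold tracks clipping_scale times the median: in the record above, 2.0 x 9.422e+01 gives the logged 1.884e+02. A minimal sketch of that diagnostic follows (assumed form; the actual optimizer in icefall maintains these statistics incrementally):

import torch

def clipping_report(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    # Quantiles of recent gradient norms plus the derived clipping
    # threshold (clipping_scale x median), matching the logged
    # "grad-norm quartiles ... threshold=... percent-clipped=..." format.
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
    return q, threshold, percent_clipped

percent-clipped=0.0 throughout this stretch indicates no recent gradient exceeded the threshold.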
Number of tokens: 24 2023-11-19 11:27:48,561 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=714706.6666666666, ans=0.0 2023-11-19 11:27:57,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714706.6666666666, ans=0.1 2023-11-19 11:27:59,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.544e+01 9.124e+01 1.001e+02 1.240e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:28:10,724 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714840.0, ans=0.1 2023-11-19 11:28:13,713 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=714840.0, ans=0.0 2023-11-19 11:28:22,460 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11050, loss[loss=0.1026, simple_loss=0.1289, pruned_loss=0.02869, audio_tagging_loss=0.009493, over 15269.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1065, pruned_loss=0.02364, audio_tagging_loss=0.01046, over 3045329.09 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:28:27,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=714906.6666666666, ans=0.0 2023-11-19 11:28:37,709 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.44 vs. limit=15.0 2023-11-19 11:28:39,602 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=714973.3333333334, ans=0.0 2023-11-19 11:28:41,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=714973.3333333334, ans=0.0 2023-11-19 11:28:45,997 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=715040.0, ans=0.125 2023-11-19 11:29:05,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=22.5 2023-11-19 11:29:14,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.89 vs. limit=15.0 2023-11-19 11:29:15,076 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-19 11:29:16,003 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=715173.3333333334, ans=0.1 2023-11-19 11:29:17,830 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11100, loss[loss=0.08013, simple_loss=0.09438, pruned_loss=0.02085, audio_tagging_loss=0.01209, over 13817.00 frames. ], tot_loss[loss=0.08749, simple_loss=0.1063, pruned_loss=0.02369, audio_tagging_loss=0.01065, over 3044617.50 frames. 
], batch size: 54, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:29:43,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=715373.3333333334, ans=0.0 2023-11-19 11:29:50,610 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.681e+01 9.190e+01 1.002e+02 1.339e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 11:30:05,040 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=715506.6666666666, ans=0.125 2023-11-19 11:30:13,760 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11150, loss[loss=0.09677, simple_loss=0.1147, pruned_loss=0.02947, audio_tagging_loss=0.009969, over 14376.00 frames. ], tot_loss[loss=0.08795, simple_loss=0.1069, pruned_loss=0.02384, audio_tagging_loss=0.01067, over 3042822.30 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:30:26,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=715640.0, ans=0.5 2023-11-19 11:30:28,858 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=715640.0, ans=0.2 2023-11-19 11:30:28,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=715640.0, ans=0.0 2023-11-19 11:30:35,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=715706.6666666666, ans=0.125 2023-11-19 11:30:37,267 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=715706.6666666666, ans=0.5 2023-11-19 11:30:38,835 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.55 vs. limit=15.0 2023-11-19 11:30:47,266 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=715773.3333333334, ans=0.125 2023-11-19 11:30:56,535 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.08 vs. limit=15.0 2023-11-19 11:31:01,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=715840.0, ans=0.0 2023-11-19 11:31:08,656 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11200, loss[loss=0.09739, simple_loss=0.1217, pruned_loss=0.02608, audio_tagging_loss=0.01045, over 15294.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1066, pruned_loss=0.02366, audio_tagging_loss=0.01076, over 3052437.56 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:31:16,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=715906.6666666666, ans=0.0 2023-11-19 11:31:38,809 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=716040.0, ans=0.0 2023-11-19 11:31:41,667 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 8.422e+01 9.292e+01 1.018e+02 1.279e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-19 11:32:05,054 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11250, loss[loss=0.08967, simple_loss=0.1048, pruned_loss=0.02533, audio_tagging_loss=0.01197, over 14842.00 frames. 
], tot_loss[loss=0.0872, simple_loss=0.1058, pruned_loss=0.02353, audio_tagging_loss=0.01076, over 3048069.16 frames. ], batch size: 57, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:32:08,902 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=716240.0, ans=0.0 2023-11-19 11:32:11,230 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.24 vs. limit=15.0 2023-11-19 11:32:12,389 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0 2023-11-19 11:32:15,466 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-19 11:32:28,394 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=716373.3333333334, ans=0.125 2023-11-19 11:33:00,947 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11300, loss[loss=0.07298, simple_loss=0.09092, pruned_loss=0.01974, audio_tagging_loss=0.00778, over 14579.00 frames. ], tot_loss[loss=0.08704, simple_loss=0.1058, pruned_loss=0.02352, audio_tagging_loss=0.01063, over 3050120.83 frames. ], batch size: 55, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:33:03,190 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=716573.3333333334, ans=0.04949747468305833 2023-11-19 11:33:17,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=716640.0, ans=0.07 2023-11-19 11:33:25,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=716706.6666666666, ans=0.0 2023-11-19 11:33:27,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=716706.6666666666, ans=0.125 2023-11-19 11:33:33,219 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.698e+01 9.365e+01 1.016e+02 1.421e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 11:33:49,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=716840.0, ans=0.125 2023-11-19 11:33:50,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=716840.0, ans=0.95 2023-11-19 11:33:55,808 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11350, loss[loss=0.1168, simple_loss=0.1369, pruned_loss=0.0386, audio_tagging_loss=0.00972, over 14675.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.1052, pruned_loss=0.0233, audio_tagging_loss=0.01056, over 3049972.13 frames. ], batch size: 54, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:33:56,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=716906.6666666666, ans=0.1 2023-11-19 11:33:57,396 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. 
limit=6.0 2023-11-19 11:34:10,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=716973.3333333334, ans=0.2 2023-11-19 11:34:16,959 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.99 vs. limit=22.5 2023-11-19 11:34:31,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=717106.6666666666, ans=0.0 2023-11-19 11:34:40,456 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=717173.3333333334, ans=0.95 2023-11-19 11:34:41,435 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=717173.3333333334, ans=0.125 2023-11-19 11:34:42,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717173.3333333334, ans=0.125 2023-11-19 11:34:51,227 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11400, loss[loss=0.09, simple_loss=0.1179, pruned_loss=0.02075, audio_tagging_loss=0.01032, over 15209.00 frames. ], tot_loss[loss=0.08653, simple_loss=0.1057, pruned_loss=0.02333, audio_tagging_loss=0.01036, over 3040792.59 frames. ], batch size: 56, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:35:02,348 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.21 vs. limit=22.5 2023-11-19 11:35:03,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=717306.6666666666, ans=0.0 2023-11-19 11:35:03,514 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-19 11:35:16,334 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=717373.3333333334, ans=0.0 2023-11-19 11:35:24,345 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.257e+01 8.952e+01 9.927e+01 1.340e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 11:35:31,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=717440.0, ans=0.125 2023-11-19 11:35:42,948 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=717506.6666666666, ans=0.125 2023-11-19 11:35:46,994 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11450, loss[loss=0.1244, simple_loss=0.1542, pruned_loss=0.03609, audio_tagging_loss=0.01117, over 15765.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1059, pruned_loss=0.02342, audio_tagging_loss=0.0104, over 3042917.41 frames. ], batch size: 54, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:35:56,042 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:35:56,453 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.68 vs. 
limit=6.0 2023-11-19 11:36:09,375 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=717706.6666666666, ans=0.2 2023-11-19 11:36:11,758 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-19 11:36:14,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=717706.6666666666, ans=0.125 2023-11-19 11:36:28,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=717773.3333333334, ans=0.0 2023-11-19 11:36:34,019 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.35 vs. limit=22.5 2023-11-19 11:36:41,944 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11500, loss[loss=0.08122, simple_loss=0.09454, pruned_loss=0.01828, audio_tagging_loss=0.01566, over 14405.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1056, pruned_loss=0.02328, audio_tagging_loss=0.0105, over 3039265.97 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:36:50,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=717906.6666666666, ans=0.1 2023-11-19 11:36:51,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=717906.6666666666, ans=0.125 2023-11-19 11:37:15,739 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.484e+01 9.226e+01 9.872e+01 1.474e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:37:21,318 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=718106.6666666666, ans=0.125 2023-11-19 11:37:36,698 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=718240.0, ans=0.125 2023-11-19 11:37:37,507 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11550, loss[loss=0.05405, simple_loss=0.06123, pruned_loss=0.01453, audio_tagging_loss=0.008901, over 14840.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.105, pruned_loss=0.02319, audio_tagging_loss=0.01062, over 3038435.31 frames. ], batch size: 56, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:45,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=718240.0, ans=0.2 2023-11-19 11:37:50,832 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.62 vs. limit=6.0 2023-11-19 11:37:56,325 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.04 vs. limit=12.0 2023-11-19 11:38:07,623 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=718373.3333333334, ans=0.125 2023-11-19 11:38:09,762 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=718440.0, ans=0.0 2023-11-19 11:38:11,672 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. 
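This warning, like the others in this log, reflects a per-cut validity check: the AudioSet clip carries only placeholder text, its 100 input frames shrink to 23 after the convolutional frontend, and 23 encoder frames cannot align to 24 BPE tokens in a transducer. A minimal sketch of such a filter, assuming a lhotse-style cut object and a sentencepiece model sp (hypothetical names; the exact check lives in train_asr.py):

import logging

def keep_cut(cut, sp) -> bool:
    num_frames = cut.num_frames  # 100 for the excluded AudioSet clips
    # Frontend arithmetic consistent with the logged 100 -> 23 reduction
    # at subsampling factor 4 (assumed form of the Conv2d subsampling):
    t = ((num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if t < len(tokens):
        # A transducer needs at least one encoder frame per output token,
        # so the cut is skipped rather than allowed to produce an
        # undefined (infinite) loss.
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {num_frames}. "
            f"Number of frames (after subsampling): {t}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True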
Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:38:25,975 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.21 vs. limit=12.0 2023-11-19 11:38:33,212 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11600, loss[loss=0.09892, simple_loss=0.1309, pruned_loss=0.02495, audio_tagging_loss=0.008538, over 16401.00 frames. ], tot_loss[loss=0.08706, simple_loss=0.1059, pruned_loss=0.02353, audio_tagging_loss=0.01058, over 3044552.99 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:38:47,606 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=718640.0, ans=0.125 2023-11-19 11:39:06,019 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.247e+01 8.169e+01 9.328e+01 1.011e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 11:39:12,033 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=718773.3333333334, ans=0.0 2023-11-19 11:39:21,697 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=718840.0, ans=0.2 2023-11-19 11:39:28,769 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11650, loss[loss=0.1147, simple_loss=0.1378, pruned_loss=0.03847, audio_tagging_loss=0.0074, over 15187.00 frames. ], tot_loss[loss=0.08726, simple_loss=0.1063, pruned_loss=0.02365, audio_tagging_loss=0.01044, over 3047121.74 frames. ], batch size: 57, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:39:30,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=718906.6666666666, ans=0.0 2023-11-19 11:39:33,337 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=718906.6666666666, ans=0.0 2023-11-19 11:40:15,536 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=719173.3333333334, ans=0.2 2023-11-19 11:40:24,341 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11700, loss[loss=0.09183, simple_loss=0.1199, pruned_loss=0.02509, audio_tagging_loss=0.006779, over 15105.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.1065, pruned_loss=0.02377, audio_tagging_loss=0.01047, over 3049100.90 frames. ], batch size: 56, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:40:35,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=719306.6666666666, ans=0.07 2023-11-19 11:40:38,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=719306.6666666666, ans=0.1 2023-11-19 11:40:57,451 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.259e+01 8.836e+01 9.705e+01 1.355e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-19 11:41:19,872 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11750, loss[loss=0.0851, simple_loss=0.105, pruned_loss=0.01959, audio_tagging_loss=0.01303, over 16309.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1067, pruned_loss=0.02406, audio_tagging_loss=0.01048, over 3049256.80 frames. 
], batch size: 59, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:41:25,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=719573.3333333334, ans=0.2 2023-11-19 11:41:28,153 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=719573.3333333334, ans=0.2 2023-11-19 11:41:45,554 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=719706.6666666666, ans=0.0 2023-11-19 11:41:46,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=719706.6666666666, ans=0.125 2023-11-19 11:41:49,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=719706.6666666666, ans=0.025 2023-11-19 11:41:53,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=719773.3333333334, ans=0.125 2023-11-19 11:42:08,865 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.11 vs. limit=15.0 2023-11-19 11:42:14,818 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11800, loss[loss=0.1081, simple_loss=0.1254, pruned_loss=0.03386, audio_tagging_loss=0.01158, over 14931.00 frames. ], tot_loss[loss=0.08781, simple_loss=0.1063, pruned_loss=0.02412, audio_tagging_loss=0.01054, over 3053847.16 frames. ], batch size: 56, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:42:15,037 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=719906.6666666666, ans=0.125 2023-11-19 11:42:44,882 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=720040.0, ans=0.1 2023-11-19 11:42:49,865 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.669e+01 9.441e+01 1.062e+02 1.406e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 11:43:03,440 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.67 vs. limit=15.0 2023-11-19 11:43:11,861 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11850, loss[loss=0.07969, simple_loss=0.09264, pruned_loss=0.01968, audio_tagging_loss=0.01368, over 14546.00 frames. ], tot_loss[loss=0.08795, simple_loss=0.1064, pruned_loss=0.02415, audio_tagging_loss=0.01061, over 3054299.27 frames. ], batch size: 57, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:43:30,640 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.56 vs. 
limit=22.5 2023-11-19 11:43:37,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=720373.3333333334, ans=10.0 2023-11-19 11:43:41,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=720373.3333333334, ans=0.2 2023-11-19 11:43:43,281 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:43:47,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=720440.0, ans=0.0 2023-11-19 11:43:50,977 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.09 vs. limit=15.0 2023-11-19 11:44:06,684 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.00 vs. limit=22.5 2023-11-19 11:44:07,205 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11900, loss[loss=0.1067, simple_loss=0.1321, pruned_loss=0.03068, audio_tagging_loss=0.009922, over 15853.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1066, pruned_loss=0.02406, audio_tagging_loss=0.01081, over 3056760.79 frames. ], batch size: 57, lr: 7.50e-03, grad_scale: 16.0 2023-11-19 11:44:12,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=720573.3333333334, ans=0.0 2023-11-19 11:44:22,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=720640.0, ans=0.0 2023-11-19 11:44:34,708 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.25 vs. limit=6.0 2023-11-19 11:44:38,517 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=720706.6666666666, ans=0.2 2023-11-19 11:44:41,510 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.394e+01 8.886e+01 9.944e+01 1.266e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 11:44:48,685 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=720773.3333333334, ans=0.0 2023-11-19 11:45:02,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=720906.6666666666, ans=15.0 2023-11-19 11:45:03,435 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 11950, loss[loss=0.07998, simple_loss=0.08706, pruned_loss=0.02425, audio_tagging_loss=0.01219, over 16081.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1059, pruned_loss=0.02396, audio_tagging_loss=0.0109, over 3054771.12 frames. 
], batch size: 61, lr: 7.49e-03, grad_scale: 16.0 2023-11-19 11:45:08,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=720906.6666666666, ans=0.0 2023-11-19 11:45:23,211 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=720973.3333333334, ans=0.125 2023-11-19 11:45:49,847 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=721173.3333333334, ans=0.09899494936611666 2023-11-19 11:45:50,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=721173.3333333334, ans=0.125 2023-11-19 11:45:51,857 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:45:56,784 INFO [train_asr.py:1115] (2/4) Epoch 9, batch 12000, loss[loss=0.08792, simple_loss=0.1042, pruned_loss=0.02318, audio_tagging_loss=0.01265, over 15363.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1062, pruned_loss=0.02403, audio_tagging_loss=0.01099, over 3057551.23 frames. ], batch size: 58, lr: 7.49e-03, grad_scale: 32.0 2023-11-19 11:45:56,785 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 11:46:14,097 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.7582, 3.3138, 2.5575, 2.7447, 3.5605, 3.5400, 2.7226, 3.6277], device='cuda:2') 2023-11-19 11:46:16,916 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3900, 3.7791, 2.5305, 3.6577], device='cuda:2') 2023-11-19 11:46:26,060 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3742, 3.6765, 2.5199, 3.7156], device='cuda:2') 2023-11-19 11:46:29,179 INFO [train_asr.py:1147] (2/4) Epoch 9, validation: loss=0.06606, simple_loss=0.05578, pruned_loss=0.006612, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 11:46:29,180 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 11:46:44,841 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=721306.6666666666, ans=0.125 2023-11-19 11:46:50,443 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=721373.3333333334, ans=0.2 2023-11-19 11:47:32,061 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 0, loss[loss=0.08449, simple_loss=0.09213, pruned_loss=0.01463, audio_tagging_loss=0.02379, over 14816.00 frames. ], tot_loss[loss=0.08449, simple_loss=0.09213, pruned_loss=0.01463, audio_tagging_loss=0.02379, over 14816.00 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:47:32,062 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 11:48:03,904 INFO [train_asr.py:1147] (2/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006606, audio_tagging_loss=0.03009, over 4681554.00 frames. 
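(Annotation.) The [optim.py:476] records above summarize recent gradient norms as quartiles: min / 25% / 50% / 75% / max. From the logged numbers, the reported threshold is consistently Clipping_scale times the median quartile (e.g. 2.0 * 8.886e+01 = 1.777e+02 a few records back), and percent-clipped appears to be the share of recent batches whose gradient norm exceeded that threshold. A minimal sketch of that bookkeeping, assuming a plain buffer of per-batch norms rather than the optimizer's actual internal state:

import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D tensor of gradient norms from the last N batches
    # (a hypothetical buffer; the real ScaledAdam keeps its own history).
    q = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2].item()             # 2.0 * median
    percent_clipped = 100.0 * (grad_norms > threshold).float().mean().item()
    return q.tolist(), threshold, percent_clipped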
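(Annotation.) Most of the [scaling.py:213] records are ScheduledFloat values: module hyperparameters (skip rates, balancer probabilities, bypass scale minima) that follow a piecewise-linear schedule in batch_count, with "ans" reporting the current value. By batch_count ~717k-734k nearly all of them have settled at their final constants (0.0, 0.125, 0.2, ...). A sketch of the interpolation, assuming sorted (batch_count, value) breakpoints; the real ScheduledFloat in icefall's scaling.py carries extra machinery beyond this:

def scheduled_float(batch_count: float, points: list) -> float:
    # points: sorted (batch_count, value) breakpoints, e.g.
    # [(0.0, 0.3), (20000.0, 0.1)]; the value is clamped outside the range
    # and linearly interpolated between neighboring breakpoints.
    if batch_count <= points[0][0]:
        return points[0][1]
    if batch_count >= points[-1][0]:
        return points[-1][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)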
2023-11-19 11:48:03,905 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 11:48:08,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=721400.0, ans=0.125 2023-11-19 11:48:11,157 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:48:22,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=721466.6666666666, ans=0.125 2023-11-19 11:48:35,976 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=721600.0, ans=0.2 2023-11-19 11:48:37,711 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=721600.0, ans=0.125 2023-11-19 11:48:48,444 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=15.0 2023-11-19 11:48:57,950 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=721666.6666666666, ans=0.0 2023-11-19 11:48:59,744 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 50, loss[loss=0.07753, simple_loss=0.08304, pruned_loss=0.01521, audio_tagging_loss=0.0208, over 16131.00 frames. ], tot_loss[loss=0.09461, simple_loss=0.1035, pruned_loss=0.02246, audio_tagging_loss=0.02039, over 686404.04 frames. ], batch size: 62, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:49:22,248 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=721866.6666666666, ans=0.025 2023-11-19 11:49:23,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=721866.6666666666, ans=0.0 2023-11-19 11:49:36,500 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.580e-03 2023-11-19 11:49:53,466 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=722000.0, ans=0.125 2023-11-19 11:49:55,459 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 100, loss[loss=0.1078, simple_loss=0.1333, pruned_loss=0.02651, audio_tagging_loss=0.01464, over 14543.00 frames. ], tot_loss[loss=0.09394, simple_loss=0.104, pruned_loss=0.02246, audio_tagging_loss=0.01949, over 1201697.42 frames. 
], batch size: 55, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:50:03,450 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.830e+01 9.521e+01 1.052e+02 1.360e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 11:50:22,557 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=722200.0, ans=0.2 2023-11-19 11:50:26,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=722200.0, ans=0.0 2023-11-19 11:50:39,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=722333.3333333334, ans=0.125 2023-11-19 11:50:51,945 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 150, loss[loss=0.08929, simple_loss=0.1098, pruned_loss=0.02216, audio_tagging_loss=0.0122, over 14840.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.1046, pruned_loss=0.02268, audio_tagging_loss=0.01749, over 1617348.41 frames. ], batch size: 53, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:05,492 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722466.6666666666, ans=0.1 2023-11-19 11:51:17,134 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=722533.3333333334, ans=0.0 2023-11-19 11:51:34,752 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=722600.0, ans=0.0 2023-11-19 11:51:36,177 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=722666.6666666666, ans=0.2 2023-11-19 11:51:36,281 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:46,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:47,938 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 200, loss[loss=0.1182, simple_loss=0.1435, pruned_loss=0.0371, audio_tagging_loss=0.00935, over 15714.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1051, pruned_loss=0.02285, audio_tagging_loss=0.01535, over 1931386.33 frames. 
], batch size: 56, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:50,440 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=722733.3333333334, ans=0.07 2023-11-19 11:51:56,596 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 8.351e+01 9.298e+01 1.028e+02 1.327e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:52:00,116 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=722800.0, ans=0.1 2023-11-19 11:52:01,170 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=722800.0, ans=0.125 2023-11-19 11:52:05,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722800.0, ans=0.125 2023-11-19 11:52:25,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=722933.3333333334, ans=0.125 2023-11-19 11:52:36,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=723000.0, ans=0.125 2023-11-19 11:52:39,478 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=723000.0, ans=0.125 2023-11-19 11:52:44,661 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 250, loss[loss=0.06104, simple_loss=0.0619, pruned_loss=0.01385, audio_tagging_loss=0.01624, over 14536.00 frames. ], tot_loss[loss=0.08984, simple_loss=0.1053, pruned_loss=0.02329, audio_tagging_loss=0.0139, over 2177798.19 frames. ], batch size: 58, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:52:50,001 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=723066.6666666666, ans=0.04949747468305833 2023-11-19 11:53:09,250 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=723200.0, ans=0.125 2023-11-19 11:53:17,769 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=723266.6666666666, ans=0.2 2023-11-19 11:53:17,831 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=723266.6666666666, ans=0.0 2023-11-19 11:53:26,111 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=723266.6666666666, ans=0.025 2023-11-19 11:53:39,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=723400.0, ans=0.0 2023-11-19 11:53:40,162 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 300, loss[loss=0.08525, simple_loss=0.1091, pruned_loss=0.02133, audio_tagging_loss=0.009383, over 16319.00 frames. ], tot_loss[loss=0.08981, simple_loss=0.1065, pruned_loss=0.02374, audio_tagging_loss=0.01284, over 2373589.51 frames. 
], batch size: 59, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:53:46,151 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=723400.0, ans=0.0 2023-11-19 11:53:48,085 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.399e+01 9.137e+01 9.924e+01 1.644e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 11:53:52,665 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=723466.6666666666, ans=0.0 2023-11-19 11:54:02,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=723533.3333333334, ans=0.125 2023-11-19 11:54:05,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=723533.3333333334, ans=0.125 2023-11-19 11:54:10,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=723533.3333333334, ans=0.1 2023-11-19 11:54:17,208 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=723600.0, ans=0.125 2023-11-19 11:54:23,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=723600.0, ans=0.125 2023-11-19 11:54:36,132 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 350, loss[loss=0.08277, simple_loss=0.1064, pruned_loss=0.0216, audio_tagging_loss=0.007982, over 15587.00 frames. ], tot_loss[loss=0.0886, simple_loss=0.1058, pruned_loss=0.02352, audio_tagging_loss=0.0122, over 2528418.71 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:54:44,741 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-19 11:54:49,112 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=723800.0, ans=0.2 2023-11-19 11:55:12,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723933.3333333334, ans=0.1 2023-11-19 11:55:27,871 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724000.0, ans=0.1 2023-11-19 11:55:32,496 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 400, loss[loss=0.07715, simple_loss=0.09286, pruned_loss=0.01578, audio_tagging_loss=0.01495, over 14392.00 frames. ], tot_loss[loss=0.08852, simple_loss=0.107, pruned_loss=0.02339, audio_tagging_loss=0.01163, over 2643750.50 frames. 
], batch size: 54, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:55:33,745 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=724066.6666666666, ans=0.125 2023-11-19 11:55:33,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=724066.6666666666, ans=0.1 2023-11-19 11:55:38,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724066.6666666666, ans=0.125 2023-11-19 11:55:39,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=724066.6666666666, ans=0.125 2023-11-19 11:55:39,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=724066.6666666666, ans=22.5 2023-11-19 11:55:39,903 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.520e+01 9.395e+01 1.002e+02 1.359e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 11:56:03,389 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=724200.0, ans=0.125 2023-11-19 11:56:05,419 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=724266.6666666666, ans=0.125 2023-11-19 11:56:20,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724333.3333333334, ans=0.1 2023-11-19 11:56:28,203 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 450, loss[loss=0.08952, simple_loss=0.1068, pruned_loss=0.02166, audio_tagging_loss=0.01445, over 14760.00 frames. ], tot_loss[loss=0.08776, simple_loss=0.1063, pruned_loss=0.02326, audio_tagging_loss=0.01133, over 2729257.42 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:56:28,378 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:57:10,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.92 vs. limit=15.0 2023-11-19 11:57:11,043 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724600.0, ans=0.125 2023-11-19 11:57:23,927 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 500, loss[loss=0.06849, simple_loss=0.07859, pruned_loss=0.01826, audio_tagging_loss=0.01093, over 14989.00 frames. ], tot_loss[loss=0.08669, simple_loss=0.1053, pruned_loss=0.02299, audio_tagging_loss=0.01107, over 2797344.19 frames. ], batch size: 57, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:57:31,349 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.489e+01 8.496e+01 9.010e+01 1.030e+02 1.418e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:57:31,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=724733.3333333334, ans=0.0 2023-11-19 11:57:33,622 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.31 vs. 
limit=15.0 2023-11-19 11:57:40,503 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=724800.0, ans=0.125 2023-11-19 11:57:49,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=724866.6666666666, ans=0.125 2023-11-19 11:58:07,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=725000.0, ans=0.1 2023-11-19 11:58:07,796 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725000.0, ans=0.1 2023-11-19 11:58:08,785 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=725000.0, ans=10.0 2023-11-19 11:58:19,508 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 550, loss[loss=0.0844, simple_loss=0.1091, pruned_loss=0.01995, audio_tagging_loss=0.009894, over 15839.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1064, pruned_loss=0.02327, audio_tagging_loss=0.0108, over 2855185.45 frames. ], batch size: 60, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:58:24,088 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=725066.6666666666, ans=0.125 2023-11-19 11:58:29,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=725133.3333333334, ans=0.125 2023-11-19 11:58:53,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=725266.6666666666, ans=0.0 2023-11-19 11:59:05,594 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=725333.3333333334, ans=0.0 2023-11-19 11:59:15,991 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 600, loss[loss=0.1059, simple_loss=0.1313, pruned_loss=0.02724, audio_tagging_loss=0.013, over 16804.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1061, pruned_loss=0.02322, audio_tagging_loss=0.01073, over 2897744.88 frames. ], batch size: 61, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 11:59:23,901 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.301e+01 8.690e+01 9.584e+01 1.385e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-19 11:59:27,676 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-19 11:59:32,060 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.67 vs. limit=6.0 2023-11-19 11:59:33,673 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=725466.6666666666, ans=0.125 2023-11-19 11:59:52,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=725600.0, ans=0.2 2023-11-19 12:00:01,883 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=725666.6666666666, ans=0.1 2023-11-19 12:00:11,789 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 650, loss[loss=0.07612, simple_loss=0.09167, pruned_loss=0.02096, audio_tagging_loss=0.009328, over 15185.00 frames. 
], tot_loss[loss=0.08714, simple_loss=0.1062, pruned_loss=0.02335, audio_tagging_loss=0.01068, over 2929534.75 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:00:24,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=725800.0, ans=0.125 2023-11-19 12:00:26,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=725800.0, ans=0.05 2023-11-19 12:00:31,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=725800.0, ans=0.07 2023-11-19 12:00:58,262 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=726000.0, ans=0.0 2023-11-19 12:01:06,896 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 700, loss[loss=0.09564, simple_loss=0.129, pruned_loss=0.0233, audio_tagging_loss=0.007842, over 16386.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1058, pruned_loss=0.023, audio_tagging_loss=0.01074, over 2956502.72 frames. ], batch size: 61, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:01:14,254 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.278e+01 8.967e+01 9.847e+01 1.279e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 12:01:17,930 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.44 vs. limit=15.0 2023-11-19 12:01:26,517 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=22.5 2023-11-19 12:01:28,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=726200.0, ans=0.2 2023-11-19 12:01:33,079 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:01:38,333 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=726200.0, ans=0.0 2023-11-19 12:01:40,590 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=726266.6666666666, ans=0.0 2023-11-19 12:01:42,765 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer_ff3.min_abs, batch_count=726266.6666666666, ans=0.2 2023-11-19 12:02:02,291 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 750, loss[loss=0.0825, simple_loss=0.1061, pruned_loss=0.0171, audio_tagging_loss=0.01237, over 14706.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1068, pruned_loss=0.02316, audio_tagging_loss=0.01067, over 2981469.65 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:02:16,984 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=726466.6666666666, ans=0.0 2023-11-19 12:02:42,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=726600.0, ans=0.125 2023-11-19 12:02:43,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. 
limit=15.0 2023-11-19 12:02:59,398 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 800, loss[loss=0.08285, simple_loss=0.0897, pruned_loss=0.02222, audio_tagging_loss=0.01579, over 15161.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1064, pruned_loss=0.02324, audio_tagging_loss=0.01074, over 2995578.37 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:03:06,736 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.428e+01 9.274e+01 1.007e+02 1.434e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 12:03:07,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=726733.3333333334, ans=0.1 2023-11-19 12:03:21,010 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=726866.6666666666, ans=0.125 2023-11-19 12:03:24,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=726866.6666666666, ans=0.0 2023-11-19 12:03:30,120 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=726866.6666666666, ans=0.0 2023-11-19 12:03:47,324 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.06 vs. limit=10.0 2023-11-19 12:03:54,767 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 850, loss[loss=0.0708, simple_loss=0.08316, pruned_loss=0.01808, audio_tagging_loss=0.01115, over 15014.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1054, pruned_loss=0.02311, audio_tagging_loss=0.01076, over 3015970.27 frames. ], batch size: 59, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:04:21,135 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727200.0, ans=0.1 2023-11-19 12:04:26,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=727200.0, ans=0.0 2023-11-19 12:04:50,019 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 900, loss[loss=0.07054, simple_loss=0.09288, pruned_loss=0.01408, audio_tagging_loss=0.01002, over 15693.00 frames. ], tot_loss[loss=0.08632, simple_loss=0.1051, pruned_loss=0.02303, audio_tagging_loss=0.01074, over 3019200.90 frames. 
], batch size: 61, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:04:53,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727400.0, ans=0.1 2023-11-19 12:04:57,840 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.263e+01 8.793e+01 9.779e+01 1.235e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 12:04:59,113 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=727400.0, ans=0.0 2023-11-19 12:05:03,510 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=727466.6666666666, ans=0.0 2023-11-19 12:05:04,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=727466.6666666666, ans=0.125 2023-11-19 12:05:04,505 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=727466.6666666666, ans=0.05 2023-11-19 12:05:16,250 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-19 12:05:34,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=727666.6666666666, ans=0.125 2023-11-19 12:05:37,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=727666.6666666666, ans=0.125 2023-11-19 12:05:45,503 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.35 vs. limit=22.5 2023-11-19 12:05:46,727 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 950, loss[loss=0.09338, simple_loss=0.1128, pruned_loss=0.02822, audio_tagging_loss=0.008789, over 15884.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1052, pruned_loss=0.02304, audio_tagging_loss=0.01069, over 3029334.13 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:06:38,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=728000.0, ans=0.1 2023-11-19 12:06:42,008 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1000, loss[loss=0.09071, simple_loss=0.1092, pruned_loss=0.02763, audio_tagging_loss=0.00848, over 15351.00 frames. ], tot_loss[loss=0.08613, simple_loss=0.105, pruned_loss=0.02319, audio_tagging_loss=0.01043, over 3032914.51 frames. ], batch size: 56, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:06:42,714 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.75 vs. limit=22.5 2023-11-19 12:06:49,834 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.784e+01 8.258e+01 8.941e+01 9.779e+01 1.255e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 12:07:03,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.55 vs. limit=15.0 2023-11-19 12:07:05,172 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:07:06,552 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=728200.0, ans=0.0 2023-11-19 12:07:14,219 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.18 vs. limit=22.5 2023-11-19 12:07:32,635 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=728333.3333333334, ans=0.2 2023-11-19 12:07:34,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=728333.3333333334, ans=0.125 2023-11-19 12:07:37,669 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1050, loss[loss=0.1061, simple_loss=0.1293, pruned_loss=0.03095, audio_tagging_loss=0.01051, over 14830.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1046, pruned_loss=0.02294, audio_tagging_loss=0.0104, over 3029898.03 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:07:39,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=728400.0, ans=0.0 2023-11-19 12:07:53,917 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=728466.6666666666, ans=0.0 2023-11-19 12:08:26,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=728666.6666666666, ans=0.0 2023-11-19 12:08:32,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=728733.3333333334, ans=0.0 2023-11-19 12:08:34,194 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1100, loss[loss=0.05916, simple_loss=0.07499, pruned_loss=0.01006, audio_tagging_loss=0.0116, over 15254.00 frames. ], tot_loss[loss=0.08607, simple_loss=0.1052, pruned_loss=0.02321, audio_tagging_loss=0.01025, over 3032820.17 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:08:36,373 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:08:37,848 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.53 vs. limit=12.0 2023-11-19 12:08:42,199 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.136e+01 8.991e+01 9.834e+01 1.618e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:08:46,836 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:08:59,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.46 vs. 
limit=22.5 2023-11-19 12:09:01,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=728866.6666666666, ans=0.0 2023-11-19 12:09:06,474 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=728933.3333333334, ans=0.125 2023-11-19 12:09:26,484 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=729000.0, ans=0.125 2023-11-19 12:09:27,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.20 vs. limit=22.5 2023-11-19 12:09:30,429 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1150, loss[loss=0.08611, simple_loss=0.1124, pruned_loss=0.02146, audio_tagging_loss=0.008476, over 15560.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.1063, pruned_loss=0.02344, audio_tagging_loss=0.01022, over 3035757.61 frames. ], batch size: 57, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:09:46,351 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-19 12:09:46,922 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:10:06,141 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=729266.6666666666, ans=0.09899494936611666 2023-11-19 12:10:26,229 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1200, loss[loss=0.08197, simple_loss=0.09869, pruned_loss=0.02361, audio_tagging_loss=0.009016, over 15888.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.106, pruned_loss=0.02328, audio_tagging_loss=0.01021, over 3036528.05 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:10:30,836 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=729400.0, ans=0.2 2023-11-19 12:10:33,699 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.56 vs. limit=15.0 2023-11-19 12:10:35,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.399e+01 9.041e+01 1.012e+02 1.294e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:10:54,414 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=729533.3333333334, ans=0.0 2023-11-19 12:11:08,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=729600.0, ans=0.1 2023-11-19 12:11:11,730 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=729666.6666666666, ans=0.0 2023-11-19 12:11:15,538 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=729666.6666666666, ans=0.125 2023-11-19 12:11:21,630 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1250, loss[loss=0.09129, simple_loss=0.1124, pruned_loss=0.02487, audio_tagging_loss=0.01024, over 15974.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.106, pruned_loss=0.02328, audio_tagging_loss=0.01023, over 3041543.68 frames. 
], batch size: 59, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:11:28,734 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729733.3333333334, ans=0.0 2023-11-19 12:11:28,814 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=729733.3333333334, ans=0.1 2023-11-19 12:11:58,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=729933.3333333334, ans=0.125 2023-11-19 12:12:01,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=729933.3333333334, ans=0.125 2023-11-19 12:12:17,092 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1300, loss[loss=0.1028, simple_loss=0.1322, pruned_loss=0.02984, audio_tagging_loss=0.006831, over 15906.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.106, pruned_loss=0.02328, audio_tagging_loss=0.01034, over 3040404.14 frames. ], batch size: 61, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:12:26,876 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=730066.6666666666, ans=0.2 2023-11-19 12:12:27,632 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.101e+01 8.789e+01 9.869e+01 1.258e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 12:12:29,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=730133.3333333334, ans=22.5 2023-11-19 12:12:32,492 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.67 vs. limit=15.0 2023-11-19 12:12:38,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=730200.0, ans=0.0 2023-11-19 12:12:50,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730266.6666666666, ans=0.125 2023-11-19 12:12:51,164 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730266.6666666666, ans=0.1 2023-11-19 12:12:56,432 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=730266.6666666666, ans=0.0 2023-11-19 12:13:13,112 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1350, loss[loss=0.1014, simple_loss=0.1299, pruned_loss=0.02899, audio_tagging_loss=0.007498, over 15010.00 frames. ], tot_loss[loss=0.08713, simple_loss=0.1069, pruned_loss=0.02347, audio_tagging_loss=0.01023, over 3042994.58 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:13:33,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=730466.6666666666, ans=0.07 2023-11-19 12:13:41,844 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730533.3333333334, ans=0.125 2023-11-19 12:13:52,815 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:14:05,609 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=730666.6666666666, ans=0.125 2023-11-19 12:14:08,540 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1400, loss[loss=0.08061, simple_loss=0.1051, pruned_loss=0.01654, audio_tagging_loss=0.01154, over 15231.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1058, pruned_loss=0.02314, audio_tagging_loss=0.01035, over 3039347.46 frames. ], batch size: 55, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:14:18,522 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.095e+01 8.801e+01 9.622e+01 1.373e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 12:14:19,370 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.01 vs. limit=22.5 2023-11-19 12:14:28,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=730800.0, ans=0.0 2023-11-19 12:14:38,944 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=730866.6666666666, ans=0.035 2023-11-19 12:14:42,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=730933.3333333334, ans=0.0 2023-11-19 12:14:56,884 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=731000.0, ans=0.1 2023-11-19 12:15:04,106 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1450, loss[loss=0.07986, simple_loss=0.1017, pruned_loss=0.01935, audio_tagging_loss=0.009655, over 14910.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1051, pruned_loss=0.02285, audio_tagging_loss=0.01048, over 3037939.24 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:15:12,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.67 vs. limit=15.0 2023-11-19 12:15:15,879 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=731133.3333333334, ans=0.09899494936611666 2023-11-19 12:15:22,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=731133.3333333334, ans=0.2 2023-11-19 12:15:41,314 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=731266.6666666666, ans=0.125 2023-11-19 12:15:53,057 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=731333.3333333334, ans=0.0 2023-11-19 12:16:00,106 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1500, loss[loss=0.08409, simple_loss=0.09654, pruned_loss=0.0248, audio_tagging_loss=0.01102, over 15109.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1054, pruned_loss=0.02317, audio_tagging_loss=0.01051, over 3040788.96 frames. 
], batch size: 57, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:16:04,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=731400.0, ans=0.125 2023-11-19 12:16:09,649 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.686e+01 9.376e+01 1.030e+02 1.552e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 12:16:12,583 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=731466.6666666666, ans=0.1 2023-11-19 12:16:17,776 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=731466.6666666666, ans=0.0 2023-11-19 12:16:21,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=731533.3333333334, ans=0.125 2023-11-19 12:16:21,971 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-19 12:16:25,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-19 12:16:31,067 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=731533.3333333334, ans=0.0 2023-11-19 12:16:32,202 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=731600.0, ans=0.125 2023-11-19 12:16:36,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=731600.0, ans=0.5 2023-11-19 12:16:50,687 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.04 vs. limit=15.0 2023-11-19 12:16:55,767 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1550, loss[loss=0.09746, simple_loss=0.1263, pruned_loss=0.02575, audio_tagging_loss=0.00854, over 14545.00 frames. ], tot_loss[loss=0.08557, simple_loss=0.1042, pruned_loss=0.02281, audio_tagging_loss=0.01067, over 3040166.10 frames. ], batch size: 54, lr: 7.07e-03, grad_scale: 16.0 2023-11-19 12:16:57,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=731733.3333333334, ans=0.125 2023-11-19 12:17:00,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=731733.3333333334, ans=0.125 2023-11-19 12:17:19,720 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. 
limit=15.0 2023-11-19 12:17:28,527 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=731933.3333333334, ans=0.0 2023-11-19 12:17:41,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732000.0, ans=0.1 2023-11-19 12:17:44,611 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=732000.0, ans=0.125 2023-11-19 12:17:51,803 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1600, loss[loss=0.08749, simple_loss=0.1128, pruned_loss=0.02076, audio_tagging_loss=0.01035, over 16055.00 frames. ], tot_loss[loss=0.0862, simple_loss=0.1048, pruned_loss=0.02302, audio_tagging_loss=0.01077, over 3044395.44 frames. ], batch size: 59, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:17:52,016 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=732066.6666666666, ans=0.2 2023-11-19 12:17:55,108 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=732066.6666666666, ans=0.0 2023-11-19 12:17:56,590 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=15.0 2023-11-19 12:18:01,802 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.642e+01 8.544e+01 9.122e+01 1.002e+02 1.471e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 12:18:03,053 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732133.3333333334, ans=0.125 2023-11-19 12:18:03,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=732133.3333333334, ans=0.125 2023-11-19 12:18:16,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732200.0, ans=0.1 2023-11-19 12:18:23,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=732200.0, ans=0.2 2023-11-19 12:18:25,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=732266.6666666666, ans=0.125 2023-11-19 12:18:35,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=732333.3333333334, ans=10.0 2023-11-19 12:18:47,116 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1650, loss[loss=0.1104, simple_loss=0.1364, pruned_loss=0.03381, audio_tagging_loss=0.008374, over 15754.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1054, pruned_loss=0.02332, audio_tagging_loss=0.01071, over 3048539.47 frames. ], batch size: 57, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:18:55,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=15.0 2023-11-19 12:19:04,667 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.34 vs. 
limit=10.0 2023-11-19 12:19:08,968 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=732533.3333333334, ans=0.0 2023-11-19 12:19:09,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=22.5 2023-11-19 12:19:15,186 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=732533.3333333334, ans=0.0 2023-11-19 12:19:25,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=732600.0, ans=0.125 2023-11-19 12:19:27,339 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=732600.0, ans=0.0 2023-11-19 12:19:38,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=732666.6666666666, ans=0.5 2023-11-19 12:19:42,566 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1700, loss[loss=0.07946, simple_loss=0.09745, pruned_loss=0.02001, audio_tagging_loss=0.01073, over 14967.00 frames. ], tot_loss[loss=0.0863, simple_loss=0.1048, pruned_loss=0.02309, audio_tagging_loss=0.01081, over 3046485.06 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:19:45,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732733.3333333334, ans=0.1 2023-11-19 12:19:53,003 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.193e+01 8.787e+01 9.627e+01 1.247e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 12:19:54,625 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.57 vs. limit=22.5 2023-11-19 12:20:07,684 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.474e-01 2023-11-19 12:20:12,223 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.04 vs. limit=10.0 2023-11-19 12:20:35,817 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=733000.0, ans=0.125 2023-11-19 12:20:38,897 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1750, loss[loss=0.08651, simple_loss=0.1042, pruned_loss=0.02415, audio_tagging_loss=0.01026, over 15816.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1042, pruned_loss=0.02293, audio_tagging_loss=0.01067, over 3047642.18 frames. ], batch size: 61, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:20:40,578 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.49 vs. limit=10.0 2023-11-19 12:20:55,592 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.40 vs. 
limit=15.0 2023-11-19 12:20:57,399 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=733133.3333333334, ans=0.0 2023-11-19 12:21:18,182 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=733266.6666666666, ans=0.125 2023-11-19 12:21:18,551 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-19 12:21:24,992 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=733333.3333333334, ans=0.2 2023-11-19 12:21:34,739 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1800, loss[loss=0.06959, simple_loss=0.09098, pruned_loss=0.01448, audio_tagging_loss=0.009619, over 15504.00 frames. ], tot_loss[loss=0.08621, simple_loss=0.1049, pruned_loss=0.0231, audio_tagging_loss=0.01065, over 3046723.63 frames. ], batch size: 58, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:21:40,179 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=733400.0, ans=0.125 2023-11-19 12:21:44,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.335e+01 9.001e+01 1.003e+02 1.279e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:21:46,873 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-19 12:21:57,944 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.64 vs. limit=6.0 2023-11-19 12:22:29,661 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1850, loss[loss=0.07429, simple_loss=0.09226, pruned_loss=0.01639, audio_tagging_loss=0.01177, over 15680.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.1055, pruned_loss=0.02319, audio_tagging_loss=0.01048, over 3047997.96 frames. ], batch size: 58, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:22:38,914 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=733733.3333333334, ans=0.0 2023-11-19 12:22:59,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=733866.6666666666, ans=0.125 2023-11-19 12:23:16,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-19 12:23:22,418 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=734000.0, ans=0.125 2023-11-19 12:23:23,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734000.0, ans=0.1 2023-11-19 12:23:24,797 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.30 vs. limit=22.5 2023-11-19 12:23:26,247 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1900, loss[loss=0.07755, simple_loss=0.09657, pruned_loss=0.01946, audio_tagging_loss=0.00981, over 14604.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1043, pruned_loss=0.02268, audio_tagging_loss=0.01047, over 3054942.97 frames. 
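The per-batch records above decompose the training objective into its parts, and the logged values are consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (e.g. 0.5 * 0.09657 + 0.01946 + 0.00981 ≈ 0.07755 for the batch-1900 record). A minimal sketch of such a combination follows; the parameter names and default scales are illustrative assumptions inferred from the logged numbers, not the exact code in train_asr.py:

    def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                       simple_loss_scale=0.5,
                       pruned_loss_scale=1.0,
                       audio_tagging_loss_scale=1.0):
        # Pruned-transducer objective: a down-weighted "simple" (full-lattice)
        # term stabilizes the pruned term; the audio-tagging distillation
        # loss is added on top with its own scale.
        return (simple_loss_scale * simple_loss
                + pruned_loss_scale * pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

The function works unchanged on Python floats or torch tensors, matching the per-utterance loss[...] and running tot_loss[...] summaries printed above.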
], batch size: 57, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:23:36,333 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.533e+01 9.158e+01 9.922e+01 1.269e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 12:23:47,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=734200.0, ans=0.1 2023-11-19 12:23:48,204 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734200.0, ans=0.125 2023-11-19 12:23:53,415 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=734200.0, ans=0.0 2023-11-19 12:23:57,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=734200.0, ans=0.125 2023-11-19 12:24:01,709 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734266.6666666666, ans=0.125 2023-11-19 12:24:01,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=734266.6666666666, ans=0.0 2023-11-19 12:24:01,991 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.12 vs. limit=10.0 2023-11-19 12:24:21,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734400.0, ans=0.125 2023-11-19 12:24:21,912 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 1950, loss[loss=0.09827, simple_loss=0.1178, pruned_loss=0.02924, audio_tagging_loss=0.01014, over 14047.00 frames. ], tot_loss[loss=0.08546, simple_loss=0.1045, pruned_loss=0.0228, audio_tagging_loss=0.01042, over 3049933.14 frames. ], batch size: 53, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:24:28,696 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=734400.0, ans=15.0 2023-11-19 12:24:33,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=734466.6666666666, ans=0.125 2023-11-19 12:25:00,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=734600.0, ans=0.125 2023-11-19 12:25:02,647 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=734600.0, ans=0.0 2023-11-19 12:25:17,480 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2000, loss[loss=0.09699, simple_loss=0.12, pruned_loss=0.02706, audio_tagging_loss=0.009931, over 16448.00 frames. ], tot_loss[loss=0.08563, simple_loss=0.1046, pruned_loss=0.02288, audio_tagging_loss=0.01047, over 3046005.68 frames. 
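The optim.py:476 records summarize gradient clipping: the five numbers are the min/25%/median/75%/max of recently observed gradient norms, and the threshold tracks Clipping_scale times the running median (2.0 x 9.158e+01 ≈ 1.832e+02 in the record above); percent-clipped is the fraction of batches whose norm exceeded that threshold. A sketch of that bookkeeping, assuming a simple window of recent norms rather than ScaledAdam's actual internals:

    from collections import deque
    import torch

    class MedianGradClipper:
        """Clip gradients against clipping_scale * median of recent norms."""

        def __init__(self, clipping_scale=2.0, history=128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)   # recent global grad norms
            self.seen = 0
            self.clipped = 0

        def __call__(self, parameters):
            params = [p for p in parameters if p.grad is not None]
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            self.seen += 1
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()   # 2.0 * median
            if norm > threshold:
                self.clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            print(f"grad-norm quartiles {[f'{v:.3e}' for v in q.tolist()]}, "
                  f"threshold={threshold:.3e}, "
                  f"percent-clipped={100 * self.clipped / self.seen:.1f}")
            return norm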
], batch size: 60, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:25:22,354 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 12:25:22,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 12:25:28,539 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.084e+01 8.826e+01 9.531e+01 1.443e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 12:25:37,143 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=734800.0, ans=0.2 2023-11-19 12:25:51,918 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=734933.3333333334, ans=15.0 2023-11-19 12:26:13,999 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2050, loss[loss=0.05917, simple_loss=0.0667, pruned_loss=0.01324, audio_tagging_loss=0.01257, over 14987.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.1048, pruned_loss=0.02296, audio_tagging_loss=0.01041, over 3051162.70 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:26:44,317 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=735200.0, ans=10.0 2023-11-19 12:26:47,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735266.6666666666, ans=0.1 2023-11-19 12:26:57,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=735333.3333333334, ans=0.125 2023-11-19 12:27:03,811 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735333.3333333334, ans=0.125 2023-11-19 12:27:06,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=735333.3333333334, ans=0.1 2023-11-19 12:27:09,237 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2100, loss[loss=0.08831, simple_loss=0.1039, pruned_loss=0.0279, audio_tagging_loss=0.008437, over 15335.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.105, pruned_loss=0.0232, audio_tagging_loss=0.01032, over 3048509.84 frames. ], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:27:18,793 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.614e+01 9.149e+01 1.029e+02 1.234e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 12:27:23,315 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=735466.6666666666, ans=0.125 2023-11-19 12:27:40,982 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.97 vs. 
limit=22.5 2023-11-19 12:27:48,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=735600.0, ans=0.1 2023-11-19 12:27:58,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735666.6666666666, ans=0.125 2023-11-19 12:27:58,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=735666.6666666666, ans=0.125 2023-11-19 12:28:04,277 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2150, loss[loss=0.07436, simple_loss=0.0862, pruned_loss=0.01708, audio_tagging_loss=0.01419, over 14789.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1059, pruned_loss=0.02331, audio_tagging_loss=0.01025, over 3048753.47 frames. ], batch size: 57, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 12:28:12,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.10 vs. limit=22.5 2023-11-19 12:28:19,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=735800.0, ans=0.125 2023-11-19 12:28:38,258 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:28:46,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=735933.3333333334, ans=0.0 2023-11-19 12:28:49,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=736000.0, ans=0.1 2023-11-19 12:29:00,631 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2200, loss[loss=0.07227, simple_loss=0.09357, pruned_loss=0.01652, audio_tagging_loss=0.00896, over 14344.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1063, pruned_loss=0.02346, audio_tagging_loss=0.01032, over 3050543.16 frames. 
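The WARNING above shows why some AudioSet clips are dropped: their placeholder transcript tokenizes to 24 BPE tokens, but a 1-second clip yields only 23 encoder frames after subsampling, and a transducer alignment cannot emit more tokens than it has frames. A sketch of the filter predicate; the frame-count formula is an assumption chosen to match the logged numbers (100 frames before subsampling -> 23 after), not the exact frontend arithmetic:

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        """Exclude cuts whose BPE token count exceeds the subsampled frame count.

        The RNN-T/pruned-transducer loss needs at least one encoder frame per
        output token, so T < U makes the alignment lattice empty.
        """
        frames_after_subsampling = (num_frames - 7) // 4   # assumption: 100 -> 23
        return frames_after_subsampling >= num_tokens

    assert keep_cut(100, 24) is False   # the cut in the WARNING above: 23 < 24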
], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:29:03,042 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=736066.6666666666, ans=0.125 2023-11-19 12:29:03,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=736066.6666666666, ans=0.125 2023-11-19 12:29:11,898 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.663e+01 9.454e+01 1.034e+02 1.518e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 12:29:18,924 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736133.3333333334, ans=0.1 2023-11-19 12:29:22,133 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=736200.0, ans=0.125 2023-11-19 12:29:37,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=736266.6666666666, ans=0.125 2023-11-19 12:29:56,572 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2250, loss[loss=0.09338, simple_loss=0.1181, pruned_loss=0.02561, audio_tagging_loss=0.008722, over 14936.00 frames. ], tot_loss[loss=0.08739, simple_loss=0.1074, pruned_loss=0.02345, audio_tagging_loss=0.01024, over 3057047.04 frames. ], batch size: 55, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:30:13,276 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=736466.6666666666, ans=0.05 2023-11-19 12:30:13,335 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=736466.6666666666, ans=0.125 2023-11-19 12:30:25,878 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=736533.3333333334, ans=0.0 2023-11-19 12:30:40,462 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=736666.6666666666, ans=0.125 2023-11-19 12:30:44,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.44 vs. limit=15.0 2023-11-19 12:30:51,740 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2300, loss[loss=0.07632, simple_loss=0.09311, pruned_loss=0.01918, audio_tagging_loss=0.01059, over 16356.00 frames. ], tot_loss[loss=0.08797, simple_loss=0.108, pruned_loss=0.02362, audio_tagging_loss=0.01037, over 3056690.60 frames. 
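Most scaling.py:213 records print the current value (ans=...) of a ScheduledFloat: a hyper-parameter such as a dropout probability, skip rate, or balancer limit defined as a piecewise-linear function of batch_count. A minimal sketch of that mechanism is below; the breakpoints in the example are made up, and the real class in scaling.py carries extra machinery (defaults, arithmetic on schedules):

    import bisect

    class ScheduledFloat:
        """A float-valued piecewise-linear function of the batch count."""

        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count order.
            self.xs = [x for x, _ in points]
            self.ys = [y for _, y in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical dropout schedule: 0.3 early in training, 0.1 from 20k batches on.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p.value(735000.0))   # -> 0.1, i.e. what would be logged as ans=0.1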
], batch size: 60, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:31:03,994 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.238e+01 9.177e+01 1.028e+02 1.469e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 12:31:13,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=736866.6666666666, ans=0.05 2023-11-19 12:31:21,173 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff2.min_abs, batch_count=736866.6666666666, ans=0.1 2023-11-19 12:31:24,767 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=736933.3333333334, ans=0.0 2023-11-19 12:31:40,607 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:31:48,023 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2350, loss[loss=0.07634, simple_loss=0.09515, pruned_loss=0.0216, audio_tagging_loss=0.00717, over 14739.00 frames. ], tot_loss[loss=0.08814, simple_loss=0.1081, pruned_loss=0.0236, audio_tagging_loss=0.01047, over 3058385.20 frames. ], batch size: 58, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:31:59,866 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=737133.3333333334, ans=0.1 2023-11-19 12:32:05,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737133.3333333334, ans=0.1 2023-11-19 12:32:13,370 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=737200.0, ans=0.125 2023-11-19 12:32:22,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=737266.6666666666, ans=0.2 2023-11-19 12:32:24,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=737266.6666666666, ans=0.0 2023-11-19 12:32:31,581 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:32:34,889 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=737333.3333333334, ans=0.0 2023-11-19 12:32:43,225 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2400, loss[loss=0.07582, simple_loss=0.09664, pruned_loss=0.01785, audio_tagging_loss=0.00965, over 14748.00 frames. ], tot_loss[loss=0.08857, simple_loss=0.1084, pruned_loss=0.02382, audio_tagging_loss=0.01053, over 3049873.43 frames. 
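grad_scale in the per-batch records is the dynamic loss-scaling factor of mixed-precision (fp16) training: it is periodically doubled, and halved whenever scaled gradients overflow, which is why it moves between 8.0, 16.0 and 32.0 across the batches above. A sketch of the standard PyTorch pattern (model, optimizer, and criterion are placeholders):

    import torch

    scaler = torch.cuda.amp.GradScaler()      # maintains the dynamic grad_scale

    def train_step(model, optimizer, criterion, inputs, targets):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():       # fp16 forward where safe
            loss = criterion(model(inputs), targets)
        scaler.scale(loss).backward()         # backward on the scaled loss
        scaler.step(optimizer)                # unscales; skips the step on inf/nan
        scaler.update()                       # grows the scale, or halves on overflow
        return loss.detach(), scaler.get_scale()  # get_scale() is the logged grad_scale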
], batch size: 58, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:32:43,514 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=737400.0, ans=0.125 2023-11-19 12:32:46,035 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=737400.0, ans=0.125 2023-11-19 12:32:55,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.415e+01 9.190e+01 1.007e+02 1.395e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 12:33:13,225 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.88 vs. limit=10.0 2023-11-19 12:33:18,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=737600.0, ans=0.125 2023-11-19 12:33:19,372 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:33:19,377 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=737600.0, ans=0.125 2023-11-19 12:33:38,940 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2450, loss[loss=0.09278, simple_loss=0.1226, pruned_loss=0.02257, audio_tagging_loss=0.008937, over 15253.00 frames. ], tot_loss[loss=0.08817, simple_loss=0.1076, pruned_loss=0.02374, audio_tagging_loss=0.01062, over 3056059.36 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:33:59,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=737800.0, ans=0.0 2023-11-19 12:34:25,030 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=738000.0, ans=0.125 2023-11-19 12:34:28,789 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=15.0 2023-11-19 12:34:33,836 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2500, loss[loss=0.09929, simple_loss=0.112, pruned_loss=0.03129, audio_tagging_loss=0.01202, over 16338.00 frames. ], tot_loss[loss=0.08776, simple_loss=0.1068, pruned_loss=0.02363, audio_tagging_loss=0.01074, over 3056418.26 frames. ], batch size: 61, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:34:37,764 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=738066.6666666666, ans=0.0 2023-11-19 12:34:45,794 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.644e+01 9.382e+01 1.016e+02 1.260e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 12:35:02,156 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.82 vs. limit=15.0 2023-11-19 12:35:04,075 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=738200.0, ans=0.0 2023-11-19 12:35:11,816 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.87 vs. limit=6.0 2023-11-19 12:35:14,504 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs. 
limit=15.0 2023-11-19 12:35:29,254 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2550, loss[loss=0.09175, simple_loss=0.1157, pruned_loss=0.02561, audio_tagging_loss=0.008302, over 15241.00 frames. ], tot_loss[loss=0.08826, simple_loss=0.1075, pruned_loss=0.02389, audio_tagging_loss=0.01062, over 3048104.47 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:35:39,000 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=738400.0, ans=0.125 2023-11-19 12:35:43,213 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=738466.6666666666, ans=0.2 2023-11-19 12:35:53,917 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:36:09,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=738600.0, ans=0.125 2023-11-19 12:36:13,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=738666.6666666666, ans=0.125 2023-11-19 12:36:26,086 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2600, loss[loss=0.1028, simple_loss=0.1261, pruned_loss=0.03049, audio_tagging_loss=0.00927, over 15337.00 frames. ], tot_loss[loss=0.0877, simple_loss=0.1071, pruned_loss=0.02373, audio_tagging_loss=0.0104, over 3048901.60 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:36:29,515 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=738733.3333333334, ans=0.125 2023-11-19 12:36:33,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=738733.3333333334, ans=0.2 2023-11-19 12:36:37,711 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.428e+01 9.293e+01 1.021e+02 1.415e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 12:37:04,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=738933.3333333334, ans=0.125 2023-11-19 12:37:16,002 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=739000.0, ans=0.2 2023-11-19 12:37:21,570 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2650, loss[loss=0.06969, simple_loss=0.0844, pruned_loss=0.01508, audio_tagging_loss=0.01241, over 15493.00 frames. ], tot_loss[loss=0.08707, simple_loss=0.1067, pruned_loss=0.0234, audio_tagging_loss=0.01034, over 3048901.88 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:37:26,032 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.30 vs. limit=22.5 2023-11-19 12:37:27,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=739066.6666666666, ans=0.125 2023-11-19 12:37:43,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=739200.0, ans=0.125 2023-11-19 12:37:55,556 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.71 vs. 
limit=15.0 2023-11-19 12:37:59,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=739266.6666666666, ans=0.125 2023-11-19 12:38:00,686 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0 2023-11-19 12:38:05,079 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=739333.3333333334, ans=0.125 2023-11-19 12:38:11,821 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:38:16,948 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2700, loss[loss=0.07017, simple_loss=0.08227, pruned_loss=0.01664, audio_tagging_loss=0.0124, over 14450.00 frames. ], tot_loss[loss=0.08734, simple_loss=0.1069, pruned_loss=0.02351, audio_tagging_loss=0.01038, over 3046845.83 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:38:29,026 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.415e+01 9.342e+01 1.060e+02 2.991e+02, threshold=1.868e+02, percent-clipped=1.0 2023-11-19 12:38:30,696 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.56 vs. limit=15.0 2023-11-19 12:38:36,193 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=739466.6666666666, ans=0.125 2023-11-19 12:38:38,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=739533.3333333334, ans=0.125 2023-11-19 12:38:40,421 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739533.3333333334, ans=0.1 2023-11-19 12:38:43,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=739533.3333333334, ans=0.125 2023-11-19 12:39:12,500 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2750, loss[loss=0.08452, simple_loss=0.1024, pruned_loss=0.02159, audio_tagging_loss=0.0117, over 16700.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.107, pruned_loss=0.0236, audio_tagging_loss=0.01036, over 3039706.45 frames. ], batch size: 64, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:39:22,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=739800.0, ans=0.125 2023-11-19 12:39:31,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=739800.0, ans=0.0 2023-11-19 12:39:55,258 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=739933.3333333334, ans=0.1 2023-11-19 12:39:57,572 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=740000.0, ans=0.2 2023-11-19 12:39:59,484 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:40:03,469 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=740000.0, ans=0.125 2023-11-19 12:40:08,439 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2800, loss[loss=0.09275, simple_loss=0.1058, pruned_loss=0.02717, audio_tagging_loss=0.0127, over 14245.00 frames. ], tot_loss[loss=0.08648, simple_loss=0.1055, pruned_loss=0.02333, audio_tagging_loss=0.01043, over 3034359.72 frames. ], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:40:09,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=740066.6666666666, ans=0.125 2023-11-19 12:40:21,269 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.368e+01 8.840e+01 9.465e+01 1.289e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 12:40:22,917 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.49 vs. limit=15.0 2023-11-19 12:40:25,061 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.61 vs. limit=12.0 2023-11-19 12:40:26,029 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-11-19 12:40:40,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740200.0, ans=0.125 2023-11-19 12:40:57,566 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 12:40:59,701 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 12:41:04,777 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2850, loss[loss=0.09789, simple_loss=0.1295, pruned_loss=0.02513, audio_tagging_loss=0.008004, over 16211.00 frames. ], tot_loss[loss=0.08639, simple_loss=0.1057, pruned_loss=0.02328, audio_tagging_loss=0.01026, over 3034817.33 frames. 
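Each scaling.py:1022 record samples the "whitening metric" of a Whiten module: within each channel group it measures how far the activations' covariance is from a multiple of the identity. The metric is 1.0 for perfectly white features and grows as the covariance spectrum spreads; when it drifts above the logged limit, the module applies a gradient penalty to push it back down. A sketch of the metric as reconstructed from the logged quantities (not copied from scaling.py):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
        """>= 1.0; equals 1.0 iff each group's feature covariance is isotropic."""
        x = x.reshape(-1, x.shape[-1])                 # (frames, channels)
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, -1).transpose(0, 1)
        x = x - x.mean(dim=1, keepdim=True)            # center per group
        cov = torch.matmul(x.transpose(1, 2), x)       # (groups, c, c)
        mean_eig = cov.diagonal(dim1=1, dim2=2).mean()                  # E[lambda]
        mean_eig_sq = (cov ** 2).sum() / (num_groups * cov.shape[-1])   # E[lambda^2]
        # E[lambda^2] / E[lambda]^2 over the covariance spectrum: 1.0 iff all equal.
        return (mean_eig_sq / (mean_eig ** 2 + 1e-20)).item()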
], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:41:04,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=740400.0, ans=0.125 2023-11-19 12:41:10,237 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=740400.0, ans=0.1 2023-11-19 12:41:13,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=740400.0, ans=0.0 2023-11-19 12:41:21,633 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:41:37,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=740600.0, ans=0.125 2023-11-19 12:41:37,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=740600.0, ans=0.125 2023-11-19 12:41:39,062 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=740600.0, ans=0.2 2023-11-19 12:41:44,544 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.93 vs. limit=15.0 2023-11-19 12:42:00,347 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2900, loss[loss=0.1016, simple_loss=0.1275, pruned_loss=0.02913, audio_tagging_loss=0.008677, over 15412.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1056, pruned_loss=0.02329, audio_tagging_loss=0.01028, over 3044864.64 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:42:00,488 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=740733.3333333334, ans=0.125 2023-11-19 12:42:02,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=740733.3333333334, ans=0.0 2023-11-19 12:42:03,789 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=740733.3333333334, ans=0.125 2023-11-19 12:42:12,379 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.633e+01 9.349e+01 9.901e+01 1.327e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 12:42:24,278 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740866.6666666666, ans=0.1 2023-11-19 12:42:55,861 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 2950, loss[loss=0.07337, simple_loss=0.08296, pruned_loss=0.02092, audio_tagging_loss=0.01097, over 14409.00 frames. ], tot_loss[loss=0.08654, simple_loss=0.1059, pruned_loss=0.02336, audio_tagging_loss=0.01021, over 3049816.08 frames. ], batch size: 54, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:12,864 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=741133.3333333334, ans=10.0 2023-11-19 12:43:16,461 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. 
limit=15.0 2023-11-19 12:43:21,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741200.0, ans=0.125 2023-11-19 12:43:34,648 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741266.6666666666, ans=0.1 2023-11-19 12:43:50,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=741333.3333333334, ans=0.0 2023-11-19 12:43:52,232 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3000, loss[loss=0.1013, simple_loss=0.1313, pruned_loss=0.02437, audio_tagging_loss=0.01125, over 14747.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1067, pruned_loss=0.02348, audio_tagging_loss=0.01027, over 3055417.15 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:52,232 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 12:44:04,852 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.4073, 4.0216, 2.4622, 3.5742], device='cuda:2') 2023-11-19 12:44:16,470 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.8759, 3.2410, 2.5691, 2.8472, 3.5298, 3.5981, 3.0237, 3.5823], device='cuda:2') 2023-11-19 12:44:22,058 INFO [zipformer.py:1873] (2/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8866, 3.2645, 4.8404, 4.4046], device='cuda:2') 2023-11-19 12:44:24,106 INFO [train_asr.py:1147] (2/4) Epoch 10, validation: loss=0.06403, simple_loss=0.05543, pruned_loss=0.006395, audio_tagging_loss=0.02992, over 4681554.00 frames. 2023-11-19 12:44:24,107 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 12:44:25,792 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-19 12:44:35,741 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.400e+01 9.242e+01 1.017e+02 1.416e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 12:44:43,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.12 vs. limit=15.0 2023-11-19 12:44:48,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=741533.3333333334, ans=0.0 2023-11-19 12:45:14,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=741666.6666666666, ans=0.1 2023-11-19 12:45:19,200 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3050, loss[loss=0.08123, simple_loss=0.103, pruned_loss=0.02096, audio_tagging_loss=0.008764, over 15461.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1066, pruned_loss=0.02341, audio_tagging_loss=0.0103, over 3053831.80 frames. 
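The zipformer.py:1873 lines, emitted during the validation pass, report one number per attention head: the average entropy of that head's self-attention distribution. Near-zero entropy (e.g. the 0.8759 head above) means attention has collapsed onto very few positions, while values near log(src_len) mean it is close to uniform. A sketch of the diagnostic under an assumed (heads, batch, tgt, src) weight layout:

    import torch

    def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        """attn_weights: (num_heads, batch, tgt_len, src_len), rows summing to 1.

        Returns the mean entropy (in nats) per head of the attention
        distribution over source positions.
        """
        entropy = -(attn_weights * (attn_weights + 1e-20).log()).sum(dim=-1)
        return entropy.mean(dim=(1, 2))    # -> one value per head, as in the log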
], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:45:25,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=741733.3333333334, ans=0.0 2023-11-19 12:45:32,122 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=741800.0, ans=0.125 2023-11-19 12:45:37,481 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=741800.0, ans=0.95 2023-11-19 12:45:45,224 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=741866.6666666666, ans=0.125 2023-11-19 12:45:51,388 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:45:54,912 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=741933.3333333334, ans=0.0 2023-11-19 12:46:01,761 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=741933.3333333334, ans=0.125 2023-11-19 12:46:05,476 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=742000.0, ans=0.0 2023-11-19 12:46:07,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=742000.0, ans=0.125 2023-11-19 12:46:14,658 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3100, loss[loss=0.1002, simple_loss=0.1186, pruned_loss=0.03054, audio_tagging_loss=0.01041, over 15791.00 frames. ], tot_loss[loss=0.08679, simple_loss=0.1059, pruned_loss=0.02339, audio_tagging_loss=0.01044, over 3049497.40 frames. ], batch size: 59, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:46:26,074 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=742133.3333333334, ans=0.0 2023-11-19 12:46:26,933 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.703e+01 9.646e+01 1.063e+02 1.410e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 12:46:28,279 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 12:47:10,461 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3150, loss[loss=0.07665, simple_loss=0.08364, pruned_loss=0.02364, audio_tagging_loss=0.01119, over 14527.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.1056, pruned_loss=0.02333, audio_tagging_loss=0.01057, over 3047360.26 frames. 
], batch size: 56, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:47:14,909 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=742400.0, ans=0.125 2023-11-19 12:47:17,129 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=742400.0, ans=0.125 2023-11-19 12:47:26,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=742466.6666666666, ans=0.1 2023-11-19 12:47:29,826 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=742466.6666666666, ans=0.0 2023-11-19 12:47:34,680 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:47:35,134 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.26 vs. limit=22.5 2023-11-19 12:47:35,712 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=742533.3333333334, ans=0.2 2023-11-19 12:47:48,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=742600.0, ans=0.125 2023-11-19 12:47:56,909 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.52 vs. limit=10.0 2023-11-19 12:48:00,903 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=742666.6666666666, ans=0.0 2023-11-19 12:48:06,057 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3200, loss[loss=0.09993, simple_loss=0.12, pruned_loss=0.02619, audio_tagging_loss=0.01372, over 14693.00 frames. ], tot_loss[loss=0.08711, simple_loss=0.1064, pruned_loss=0.02331, audio_tagging_loss=0.0106, over 3049589.28 frames. ], batch size: 54, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:48:12,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=742733.3333333334, ans=0.125 2023-11-19 12:48:14,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=742733.3333333334, ans=0.125 2023-11-19 12:48:17,489 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=742800.0, ans=0.125 2023-11-19 12:48:20,452 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.457e+01 8.995e+01 9.869e+01 1.157e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 12:49:02,760 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3250, loss[loss=0.08372, simple_loss=0.1051, pruned_loss=0.02121, audio_tagging_loss=0.009969, over 15027.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1058, pruned_loss=0.0231, audio_tagging_loss=0.01065, over 3052813.96 frames. 
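The scaling.py:1118 "WithLoss" records report the summed value of an auxiliary penalty attached to a tensor (here attention weights): the tensor passes through unchanged in the forward pass while the penalty receives a gradient of 1.0 in backward, so minimizing the main objective also minimizes the penalty; loss-sum=0.000e+00 means the penalty is currently inactive. The sketch below is an assumption about how such an attachment can be implemented, not a copy of scaling.py:

    import torch

    class WithLoss(torch.autograd.Function):
        """Return x unchanged, but make `aux_loss` act as an added loss term."""

        @staticmethod
        def forward(ctx, x, aux_loss, name):
            ctx.loss_shape = aux_loss.shape
            print(f"WithLoss: name={name}, loss-sum={aux_loss.sum().item():.3e}")
            return x

        @staticmethod
        def backward(ctx, x_grad):
            # A gradient of 1 w.r.t. every element of aux_loss is exactly what
            # backprop would produce if aux_loss.sum() were added to the loss.
            ones = torch.ones(ctx.loss_shape, dtype=x_grad.dtype,
                              device=x_grad.device)
            return x_grad, ones, None

    # usage: attn_weights = WithLoss.apply(attn_weights, penalty, "self_attn_weights")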
], batch size: 56, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:49:05,107 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=743066.6666666666, ans=0.2 2023-11-19 12:49:10,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=743066.6666666666, ans=0.0 2023-11-19 12:49:27,716 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:49:44,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=743266.6666666666, ans=0.125 2023-11-19 12:49:44,066 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=743266.6666666666, ans=0.0 2023-11-19 12:49:48,214 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=743333.3333333334, ans=0.1 2023-11-19 12:49:53,451 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743333.3333333334, ans=0.125 2023-11-19 12:49:57,962 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3300, loss[loss=0.09291, simple_loss=0.1168, pruned_loss=0.02504, audio_tagging_loss=0.009449, over 15264.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.1068, pruned_loss=0.02341, audio_tagging_loss=0.01068, over 3050576.63 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:50:10,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.235e+01 8.992e+01 9.663e+01 1.572e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:50:18,570 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-19 12:50:19,347 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=743533.3333333334, ans=0.125 2023-11-19 12:50:31,445 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=743600.0, ans=0.125 2023-11-19 12:50:33,615 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=743600.0, ans=0.125 2023-11-19 12:50:49,388 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.45 vs. limit=15.0 2023-11-19 12:50:50,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=743666.6666666666, ans=0.0 2023-11-19 12:50:52,845 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3350, loss[loss=0.1128, simple_loss=0.1377, pruned_loss=0.035, audio_tagging_loss=0.008997, over 15752.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.1068, pruned_loss=0.02337, audio_tagging_loss=0.01054, over 3057308.46 frames. ], batch size: 58, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:51:08,938 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=743800.0, ans=0.2 2023-11-19 12:51:27,065 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.87 vs. 
limit=6.0 2023-11-19 12:51:29,990 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=743933.3333333334, ans=0.2 2023-11-19 12:51:35,768 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=743933.3333333334, ans=15.0 2023-11-19 12:51:35,769 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-11-19 12:51:39,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=744000.0, ans=0.07 2023-11-19 12:51:46,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=744000.0, ans=0.125 2023-11-19 12:51:49,084 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3400, loss[loss=0.09038, simple_loss=0.1035, pruned_loss=0.02744, audio_tagging_loss=0.01117, over 14522.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1059, pruned_loss=0.02321, audio_tagging_loss=0.01049, over 3056539.51 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:51:50,420 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=744066.6666666666, ans=0.125 2023-11-19 12:52:04,016 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.485e+01 9.102e+01 1.010e+02 1.792e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 12:52:42,916 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744333.3333333334, ans=0.1 2023-11-19 12:52:44,249 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.98 vs. limit=10.0 2023-11-19 12:52:45,406 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3450, loss[loss=0.08471, simple_loss=0.1122, pruned_loss=0.0225, audio_tagging_loss=0.006094, over 14659.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1063, pruned_loss=0.02322, audio_tagging_loss=0.01028, over 3046612.91 frames. ], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:53:00,426 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=744466.6666666666, ans=0.1 2023-11-19 12:53:05,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=744466.6666666666, ans=0.025 2023-11-19 12:53:05,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=744466.6666666666, ans=0.0 2023-11-19 12:53:17,429 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=744600.0, ans=0.2 2023-11-19 12:53:40,480 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3500, loss[loss=0.06935, simple_loss=0.08448, pruned_loss=0.01738, audio_tagging_loss=0.00973, over 15173.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1063, pruned_loss=0.02323, audio_tagging_loss=0.01016, over 3044752.64 frames. 
], batch size: 59, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:53:55,412 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.325e+01 9.039e+01 9.791e+01 1.416e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:54:08,672 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:54:19,055 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=744933.3333333334, ans=0.0 2023-11-19 12:54:22,185 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=744933.3333333334, ans=0.0 2023-11-19 12:54:30,493 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.36 vs. limit=6.0 2023-11-19 12:54:36,800 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3550, loss[loss=0.1175, simple_loss=0.1363, pruned_loss=0.03935, audio_tagging_loss=0.01002, over 15091.00 frames. ], tot_loss[loss=0.0858, simple_loss=0.1051, pruned_loss=0.02305, audio_tagging_loss=0.01018, over 3035616.87 frames. ], batch size: 54, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:54:43,299 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=745066.6666666666, ans=0.125 2023-11-19 12:54:46,899 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-11-19 12:54:47,761 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.29 vs. limit=12.0 2023-11-19 12:54:50,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=745133.3333333334, ans=0.0 2023-11-19 12:54:53,717 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff2.min_abs, batch_count=745133.3333333334, ans=0.1 2023-11-19 12:55:00,560 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-19 12:55:07,621 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=745200.0, ans=0.1 2023-11-19 12:55:11,326 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=745266.6666666666, ans=0.125 2023-11-19 12:55:12,252 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=745266.6666666666, ans=0.5 2023-11-19 12:55:20,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=745333.3333333334, ans=0.0 2023-11-19 12:55:23,960 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.89 vs. 
limit=22.5 2023-11-19 12:55:25,810 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=745333.3333333334, ans=0.0 2023-11-19 12:55:31,985 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3600, loss[loss=0.06465, simple_loss=0.0725, pruned_loss=0.01596, audio_tagging_loss=0.01244, over 15331.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1061, pruned_loss=0.02337, audio_tagging_loss=0.01014, over 3040087.36 frames. ], batch size: 57, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:55:40,595 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=745400.0, ans=0.2 2023-11-19 12:55:45,905 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=745466.6666666666, ans=0.0 2023-11-19 12:55:46,777 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.231e+01 8.451e+01 8.878e+01 9.732e+01 1.759e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 12:55:56,988 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-19 12:56:06,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745600.0, ans=0.0 2023-11-19 12:56:08,234 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.594e-03 2023-11-19 12:56:25,475 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=745666.6666666666, ans=0.0 2023-11-19 12:56:27,492 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3650, loss[loss=0.133, simple_loss=0.1611, pruned_loss=0.04178, audio_tagging_loss=0.01063, over 15967.00 frames. ], tot_loss[loss=0.08683, simple_loss=0.1064, pruned_loss=0.02361, audio_tagging_loss=0.01002, over 3037703.30 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:56:40,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=745800.0, ans=0.0 2023-11-19 12:56:42,542 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745800.0, ans=0.125 2023-11-19 12:56:45,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=745800.0, ans=0.1 2023-11-19 12:56:47,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=745800.0, ans=0.5 2023-11-19 12:56:53,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=745866.6666666666, ans=0.125 2023-11-19 12:57:05,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.69 vs. limit=12.0 2023-11-19 12:57:23,634 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3700, loss[loss=0.08109, simple_loss=0.09911, pruned_loss=0.02318, audio_tagging_loss=0.008357, over 15249.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1063, pruned_loss=0.02343, audio_tagging_loss=0.01003, over 3044142.19 frames. 
], batch size: 58, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:57:37,612 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.774e+01 9.543e+01 1.094e+02 1.492e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-19 12:57:44,273 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=746133.3333333334, ans=0.0 2023-11-19 12:57:55,808 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=746266.6666666666, ans=0.125 2023-11-19 12:57:57,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746266.6666666666, ans=0.125 2023-11-19 12:57:59,993 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=746266.6666666666, ans=12.0 2023-11-19 12:58:00,692 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746266.6666666666, ans=0.1 2023-11-19 12:58:19,054 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3750, loss[loss=0.08128, simple_loss=0.09411, pruned_loss=0.02099, audio_tagging_loss=0.01323, over 13793.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1064, pruned_loss=0.02339, audio_tagging_loss=0.01016, over 3045269.30 frames. ], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:58:19,295 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=746400.0, ans=0.0 2023-11-19 12:58:19,319 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:58:36,288 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=746466.6666666666, ans=0.125 2023-11-19 12:58:46,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=746533.3333333334, ans=0.125 2023-11-19 12:58:52,974 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746600.0, ans=0.1 2023-11-19 12:58:56,951 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:59:09,943 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-19 12:59:17,450 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3800, loss[loss=0.08736, simple_loss=0.1031, pruned_loss=0.02471, audio_tagging_loss=0.01108, over 15634.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1063, pruned_loss=0.02341, audio_tagging_loss=0.01029, over 3046383.95 frames. 
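The WARNING entries in this section all drop 1-second AudioSet cuts for the same reason: after subsampling, 100 input frames become 23 encoder frames, fewer than the 24 BPE tokens of the placeholder transcript, so the transducer loss has no valid alignment. A minimal sketch of such a filter follows; the arithmetic reproduces the 100 -> 23 reduction printed in the warnings, but the function names are illustrative rather than the actual train_asr.py code:

    def frames_after_subsampling(num_frames: int) -> int:
        # Two stride-2 stages with edge effects; reproduces the 100 -> 23
        # reduction reported in the WARNING lines (a sketch of the
        # encoder's subsampling arithmetic, not the model code).
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Drop cuts whose encoder output is shorter than the token sequence.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> cut is excluded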
], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:59:20,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=746733.3333333334, ans=0.0 2023-11-19 12:59:29,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746800.0, ans=0.1 2023-11-19 12:59:32,188 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.389e+01 8.999e+01 1.009e+02 1.684e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:59:33,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=746800.0, ans=0.0 2023-11-19 12:59:40,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746866.6666666666, ans=0.1 2023-11-19 12:59:46,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=746866.6666666666, ans=0.125 2023-11-19 13:00:09,792 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=747000.0, ans=0.125 2023-11-19 13:00:13,375 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3850, loss[loss=0.08216, simple_loss=0.1077, pruned_loss=0.01813, audio_tagging_loss=0.01017, over 14941.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1059, pruned_loss=0.02325, audio_tagging_loss=0.01047, over 3042296.92 frames. ], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:00:25,228 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747133.3333333334, ans=0.125 2023-11-19 13:00:26,767 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0 2023-11-19 13:00:29,413 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=747133.3333333334, ans=0.125 2023-11-19 13:00:30,017 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.32 vs. limit=5.0 2023-11-19 13:00:36,097 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=747200.0, ans=0.125 2023-11-19 13:01:04,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=747333.3333333334, ans=0.0 2023-11-19 13:01:08,398 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3900, loss[loss=0.1342, simple_loss=0.1438, pruned_loss=0.05538, audio_tagging_loss=0.006955, over 15115.00 frames. ], tot_loss[loss=0.08625, simple_loss=0.1048, pruned_loss=0.0233, audio_tagging_loss=0.01055, over 3042549.70 frames. ], batch size: 55, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:01:23,249 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.736e+01 9.302e+01 1.013e+02 3.038e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-19 13:01:23,453 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747466.6666666666, ans=0.1 2023-11-19 13:01:24,976 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.92 vs. 
limit=12.0 2023-11-19 13:01:34,633 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=747533.3333333334, ans=0.0 2023-11-19 13:01:34,643 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747533.3333333334, ans=0.1 2023-11-19 13:01:38,344 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=747533.3333333334, ans=0.2 2023-11-19 13:01:41,525 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=747600.0, ans=0.125 2023-11-19 13:01:56,353 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=747666.6666666666, ans=0.125 2023-11-19 13:02:04,250 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 3950, loss[loss=0.09994, simple_loss=0.1345, pruned_loss=0.02503, audio_tagging_loss=0.007654, over 15895.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1044, pruned_loss=0.02292, audio_tagging_loss=0.01057, over 3057202.22 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:02:09,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=747733.3333333334, ans=0.2 2023-11-19 13:02:09,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=747733.3333333334, ans=0.125 2023-11-19 13:02:20,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=747800.0, ans=0.125 2023-11-19 13:02:24,071 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=747800.0, ans=0.2 2023-11-19 13:02:25,085 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=747800.0, ans=0.125 2023-11-19 13:02:27,054 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=747866.6666666666, ans=0.035 2023-11-19 13:02:32,349 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=747866.6666666666, ans=0.1 2023-11-19 13:02:38,754 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=747933.3333333334, ans=0.0 2023-11-19 13:02:42,518 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747933.3333333334, ans=0.125 2023-11-19 13:03:01,252 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4000, loss[loss=0.0818, simple_loss=0.0937, pruned_loss=0.02009, audio_tagging_loss=0.01485, over 15140.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1052, pruned_loss=0.02332, audio_tagging_loss=0.01077, over 3050677.10 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:03:05,781 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:03:08,177 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. 
limit=15.0 2023-11-19 13:03:08,757 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=748066.6666666666, ans=0.0 2023-11-19 13:03:14,917 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.515e+01 9.235e+01 1.023e+02 1.834e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 13:03:20,378 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=748133.3333333334, ans=0.125 2023-11-19 13:03:35,333 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:03:56,340 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4050, loss[loss=0.08439, simple_loss=0.1028, pruned_loss=0.02362, audio_tagging_loss=0.009352, over 15681.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1058, pruned_loss=0.02355, audio_tagging_loss=0.01074, over 3047565.74 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:03:58,493 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:04:02,961 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=748400.0, ans=0.125 2023-11-19 13:04:06,582 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=748466.6666666666, ans=0.0 2023-11-19 13:04:42,804 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.10 vs. limit=22.5 2023-11-19 13:04:49,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=748666.6666666666, ans=0.2 2023-11-19 13:04:51,379 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4100, loss[loss=0.09186, simple_loss=0.1128, pruned_loss=0.02425, audio_tagging_loss=0.01122, over 15380.00 frames. ], tot_loss[loss=0.08669, simple_loss=0.1053, pruned_loss=0.0233, audio_tagging_loss=0.01074, over 3036278.04 frames. 
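In the optim.py Clipping_scale lines, the five printed values are the min, 25%, median, 75% and max of recent gradient norms, and the printed threshold is always Clipping_scale (2.0) times the median: just above, 2.0 x 9.235e+01 = 1.847e+02. A hedged sketch of that bookkeeping; the window handling is illustrative, and the real optimizer may smooth or window the norms differently:

    import numpy as np

    def clipping_report(recent_grad_norms, clipping_scale: float = 2.0) -> str:
        # Summarize recent gradient norms the way the optim.py lines do,
        # with the clipping threshold tied to the median.
        q = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
        threshold = clipping_scale * q[2]
        pct = 100.0 * np.mean(np.asarray(recent_grad_norms) > threshold)
        quartiles = " ".join(f"{v:.3e}" for v in q)
        return (f"grad-norm quartiles {quartiles}, "
                f"threshold={threshold:.3e}, percent-clipped={pct}")

    # Reproduces the numbers of an earlier report in this section:
    print(clipping_report([69.74, 83.25, 90.39, 97.91, 141.6]))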
], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:04:51,537 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=748733.3333333334, ans=0.0 2023-11-19 13:05:06,282 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.382e+01 9.039e+01 9.605e+01 1.210e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 13:05:07,547 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=748800.0, ans=0.1 2023-11-19 13:05:09,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=748800.0, ans=0.125 2023-11-19 13:05:11,780 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=748800.0, ans=0.0 2023-11-19 13:05:16,018 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=748866.6666666666, ans=0.0 2023-11-19 13:05:19,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=748866.6666666666, ans=0.1 2023-11-19 13:05:32,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=748933.3333333334, ans=0.0 2023-11-19 13:05:33,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=748933.3333333334, ans=0.025 2023-11-19 13:05:41,265 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749000.0, ans=0.1 2023-11-19 13:05:46,730 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4150, loss[loss=0.07603, simple_loss=0.08756, pruned_loss=0.02215, audio_tagging_loss=0.0101, over 15915.00 frames. ], tot_loss[loss=0.08611, simple_loss=0.1049, pruned_loss=0.02311, audio_tagging_loss=0.01057, over 3034373.37 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:05:59,504 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=749133.3333333334, ans=0.125 2023-11-19 13:06:05,006 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=749133.3333333334, ans=0.1 2023-11-19 13:06:22,628 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=749266.6666666666, ans=0.125 2023-11-19 13:06:25,583 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:06:30,189 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.60 vs. limit=15.0 2023-11-19 13:06:41,686 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4200, loss[loss=0.06519, simple_loss=0.07755, pruned_loss=0.01561, audio_tagging_loss=0.0108, over 15144.00 frames. 
], tot_loss[loss=0.08628, simple_loss=0.1053, pruned_loss=0.02327, audio_tagging_loss=0.01036, over 3030988.24 frames. ], batch size: 59, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:06:55,966 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.337e+01 9.071e+01 1.010e+02 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 13:07:12,515 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-19 13:07:14,096 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-19 13:07:23,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=749600.0, ans=0.0 2023-11-19 13:07:25,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=749666.6666666666, ans=0.125 2023-11-19 13:07:26,640 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=749666.6666666666, ans=0.125 2023-11-19 13:07:37,282 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4250, loss[loss=0.06375, simple_loss=0.07136, pruned_loss=0.01435, audio_tagging_loss=0.01372, over 16014.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1042, pruned_loss=0.02284, audio_tagging_loss=0.01037, over 3032640.13 frames. ], batch size: 63, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:07:44,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=749733.3333333334, ans=0.125 2023-11-19 13:08:14,201 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=749933.3333333334, ans=0.07 2023-11-19 13:08:30,575 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=750000.0, ans=0.125 2023-11-19 13:08:32,573 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4300, loss[loss=0.08831, simple_loss=0.1104, pruned_loss=0.0214, audio_tagging_loss=0.0117, over 17138.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.1055, pruned_loss=0.02314, audio_tagging_loss=0.01013, over 3034890.89 frames. ], batch size: 65, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:08:37,335 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-19 13:08:42,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=750066.6666666666, ans=0.0 2023-11-19 13:08:43,901 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=750133.3333333334, ans=0.0 2023-11-19 13:08:47,738 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.411e+01 9.252e+01 1.004e+02 1.296e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 13:09:14,980 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=750266.6666666666, ans=0.125 2023-11-19 13:09:28,938 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4350, loss[loss=0.1017, simple_loss=0.1283, pruned_loss=0.02763, audio_tagging_loss=0.00998, over 15599.00 frames. 
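grad_scale in the loss lines is the fp16 dynamic loss scale, and it only moves in powers of two: 16.0 -> 32.0 at batch 3600 above, with a brief fall back to 16.0 around batch 4350 below before returning to 32.0. That is the usual automatic-mixed-precision rule: halve on overflow, double again after a run of overflow-free steps. A schematic with PyTorch's stock scaler; the growth/backoff settings here are assumptions, not necessarily this recipe's:

    from torch.cuda.amp import GradScaler

    # Illustrative dynamic loss scaling (settings assumed).
    scaler = GradScaler(init_scale=16.0, growth_factor=2.0,
                        backoff_factor=0.5, growth_interval=1000)

    # Per optimizer step (schematic):
    #   scaler.scale(loss).backward()
    #   scaler.step(optimizer)  # skipped when inf/NaN gradients appear
    #   scaler.update()         # halves the scale on overflow; doubles it
    #                           # after growth_interval clean steps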
], tot_loss[loss=0.08573, simple_loss=0.105, pruned_loss=0.02304, audio_tagging_loss=0.01018, over 3030619.49 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 13:09:33,363 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750400.0, ans=0.1 2023-11-19 13:09:47,599 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=750466.6666666666, ans=0.0 2023-11-19 13:09:48,591 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=750466.6666666666, ans=0.125 2023-11-19 13:09:53,843 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=750533.3333333334, ans=0.2 2023-11-19 13:10:12,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=750666.6666666666, ans=0.125 2023-11-19 13:10:16,678 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=750666.6666666666, ans=0.125 2023-11-19 13:10:24,883 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4400, loss[loss=0.06922, simple_loss=0.08346, pruned_loss=0.01677, audio_tagging_loss=0.01071, over 13634.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1049, pruned_loss=0.02312, audio_tagging_loss=0.01027, over 3025095.60 frames. ], batch size: 53, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:10:31,660 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:10:39,851 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.801e+01 8.194e+01 9.026e+01 9.942e+01 1.275e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:10:49,972 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=750866.6666666666, ans=0.0 2023-11-19 13:11:12,223 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=751000.0, ans=0.125 2023-11-19 13:11:20,612 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4450, loss[loss=0.1314, simple_loss=0.1667, pruned_loss=0.04047, audio_tagging_loss=0.007568, over 15910.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.105, pruned_loss=0.02316, audio_tagging_loss=0.0103, over 3032100.05 frames. ], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:12:16,423 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4500, loss[loss=0.07469, simple_loss=0.08938, pruned_loss=0.02041, audio_tagging_loss=0.009595, over 15890.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1052, pruned_loss=0.02315, audio_tagging_loss=0.01029, over 3037854.94 frames. ], batch size: 61, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:12:32,371 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.202e+01 9.163e+01 9.982e+01 1.315e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 13:12:51,334 INFO [scaling.py:1022] (2/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.07 vs. 
limit=8.0 2023-11-19 13:13:00,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=751666.6666666666, ans=0.07 2023-11-19 13:13:12,551 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4550, loss[loss=0.08752, simple_loss=0.1094, pruned_loss=0.02196, audio_tagging_loss=0.01086, over 15510.00 frames. ], tot_loss[loss=0.08604, simple_loss=0.1052, pruned_loss=0.02316, audio_tagging_loss=0.01026, over 3037806.20 frames. ], batch size: 57, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:13:44,433 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=751933.3333333334, ans=0.07 2023-11-19 13:13:54,724 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:13:56,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=752000.0, ans=0.2 2023-11-19 13:14:06,969 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=752066.6666666666, ans=0.0 2023-11-19 13:14:07,804 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4600, loss[loss=0.08676, simple_loss=0.106, pruned_loss=0.02355, audio_tagging_loss=0.0102, over 14998.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1049, pruned_loss=0.02305, audio_tagging_loss=0.01032, over 3035553.94 frames. ], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:14:15,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=752066.6666666666, ans=0.125 2023-11-19 13:14:23,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.092e+01 8.852e+01 9.685e+01 1.325e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 13:14:27,533 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-19 13:14:37,774 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=752200.0, ans=0.125 2023-11-19 13:14:44,118 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=752266.6666666666, ans=0.2 2023-11-19 13:14:53,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752333.3333333334, ans=0.1 2023-11-19 13:15:04,122 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4650, loss[loss=0.09283, simple_loss=0.1071, pruned_loss=0.02716, audio_tagging_loss=0.01213, over 14302.00 frames. ], tot_loss[loss=0.08533, simple_loss=0.1038, pruned_loss=0.02296, audio_tagging_loss=0.01047, over 3041770.78 frames. ], batch size: 57, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:15:08,978 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.06 vs. 
limit=15.0 2023-11-19 13:15:09,534 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=752400.0, ans=0.125 2023-11-19 13:15:27,588 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752533.3333333334, ans=0.1 2023-11-19 13:15:29,783 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=752533.3333333334, ans=0.125 2023-11-19 13:15:32,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=752533.3333333334, ans=0.0 2023-11-19 13:15:59,523 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4700, loss[loss=0.08464, simple_loss=0.1011, pruned_loss=0.02173, audio_tagging_loss=0.01237, over 15603.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.104, pruned_loss=0.0229, audio_tagging_loss=0.01056, over 3043489.01 frames. ], batch size: 57, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:03,431 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=752733.3333333334, ans=10.0 2023-11-19 13:16:10,348 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=752800.0, ans=15.0 2023-11-19 13:16:14,819 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.706e+01 9.585e+01 1.066e+02 1.440e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-19 13:16:19,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=752800.0, ans=0.0 2023-11-19 13:16:26,743 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.13 vs. limit=6.0 2023-11-19 13:16:39,422 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=752933.3333333334, ans=0.2 2023-11-19 13:16:41,919 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=752933.3333333334, ans=0.125 2023-11-19 13:16:43,093 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=753000.0, ans=0.2 2023-11-19 13:16:51,867 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=753000.0, ans=0.125 2023-11-19 13:16:52,182 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.63 vs. limit=12.0 2023-11-19 13:16:54,892 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4750, loss[loss=0.1203, simple_loss=0.1484, pruned_loss=0.0353, audio_tagging_loss=0.01083, over 15206.00 frames. ], tot_loss[loss=0.08496, simple_loss=0.1033, pruned_loss=0.02281, audio_tagging_loss=0.01049, over 3037242.89 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:17:28,315 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.06 vs. 
limit=15.0 2023-11-19 13:17:31,162 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=753266.6666666666, ans=0.1 2023-11-19 13:17:47,461 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=753333.3333333334, ans=0.0 2023-11-19 13:17:51,341 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4800, loss[loss=0.08003, simple_loss=0.09597, pruned_loss=0.02047, audio_tagging_loss=0.01157, over 14953.00 frames. ], tot_loss[loss=0.08535, simple_loss=0.1037, pruned_loss=0.02299, audio_tagging_loss=0.01052, over 3039048.01 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:18:02,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=753466.6666666666, ans=0.09899494936611666 2023-11-19 13:18:04,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=753466.6666666666, ans=0.125 2023-11-19 13:18:06,144 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.271e+01 9.115e+01 1.017e+02 1.442e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 13:18:19,966 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.09 vs. limit=22.5 2023-11-19 13:18:20,778 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.75 vs. limit=15.0 2023-11-19 13:18:21,689 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=753533.3333333334, ans=0.125 2023-11-19 13:18:37,994 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.04 vs. limit=15.0 2023-11-19 13:18:45,782 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4850, loss[loss=0.06645, simple_loss=0.07088, pruned_loss=0.01833, audio_tagging_loss=0.01269, over 14694.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.1044, pruned_loss=0.02299, audio_tagging_loss=0.01064, over 3041877.61 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:18:53,516 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=753733.3333333334, ans=0.09899494936611666 2023-11-19 13:19:05,128 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=753800.0, ans=0.0 2023-11-19 13:19:06,511 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.47 vs. limit=15.0 2023-11-19 13:19:07,598 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. 
limit=10.0 2023-11-19 13:19:17,849 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=753866.6666666666, ans=0.0 2023-11-19 13:19:24,060 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753933.3333333334, ans=0.125 2023-11-19 13:19:41,817 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4900, loss[loss=0.08147, simple_loss=0.1016, pruned_loss=0.01864, audio_tagging_loss=0.01202, over 14938.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1049, pruned_loss=0.02291, audio_tagging_loss=0.0106, over 3035038.07 frames. ], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:57,513 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.307e+01 9.002e+01 9.755e+01 1.261e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 13:19:58,900 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=754133.3333333334, ans=0.0 2023-11-19 13:20:18,759 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=754266.6666666666, ans=0.125 2023-11-19 13:20:20,941 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=754266.6666666666, ans=0.0 2023-11-19 13:20:24,200 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=754266.6666666666, ans=0.125 2023-11-19 13:20:37,465 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 4950, loss[loss=0.08905, simple_loss=0.1145, pruned_loss=0.02133, audio_tagging_loss=0.01049, over 14977.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.105, pruned_loss=0.02284, audio_tagging_loss=0.01045, over 3039240.62 frames. ], batch size: 54, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:20:37,716 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=754400.0, ans=0.125 2023-11-19 13:20:49,397 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=754466.6666666666, ans=0.2 2023-11-19 13:21:03,256 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=754533.3333333334, ans=0.125 2023-11-19 13:21:09,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=754600.0, ans=0.05 2023-11-19 13:21:29,982 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=754666.6666666666, ans=0.125 2023-11-19 13:21:32,749 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5000, loss[loss=0.07449, simple_loss=0.09431, pruned_loss=0.01826, audio_tagging_loss=0.009082, over 16077.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.1054, pruned_loss=0.02284, audio_tagging_loss=0.01032, over 3044968.06 frames. 
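The scaling.py Whitening lines periodically sample a per-module whiteness statistic against the limit beyond which a corrective gradient penalty would activate; most samples in this section sit below their limits (e.g. metric=8.45 vs. limit=15.0 above). One natural form for such a metric, hedged because icefall's exact formula may differ: the mean squared eigenvalue of the channel covariance divided by its squared mean eigenvalue, which is 1.0 exactly when the covariance is a multiple of the identity and grows as the features become less white:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels); returns 1.0 for perfectly "white" features.
        # An illustrative metric, not icefall's scaling.py implementation.
        x = x.reshape(-1, x.shape[-1])
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)          # ascending eigenvalues
        return (eigs ** 2).mean() / eigs.mean() ** 2

    feats = torch.randn(100000, 64)                # near-white input
    print(float(whitening_metric(feats)))          # close to 1.0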
], batch size: 61, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:21:45,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=754800.0, ans=0.125 2023-11-19 13:21:48,521 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.260e+01 8.986e+01 1.009e+02 1.320e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:21:57,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754866.6666666666, ans=0.125 2023-11-19 13:22:12,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=754933.3333333334, ans=0.125 2023-11-19 13:22:16,092 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=755000.0, ans=0.125 2023-11-19 13:22:16,355 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.45 vs. limit=15.0 2023-11-19 13:22:28,171 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5050, loss[loss=0.0762, simple_loss=0.105, pruned_loss=0.01745, audio_tagging_loss=0.00626, over 15300.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1056, pruned_loss=0.02292, audio_tagging_loss=0.01028, over 3052362.14 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:22:30,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=755066.6666666666, ans=0.125 2023-11-19 13:22:35,410 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=755066.6666666666, ans=0.2 2023-11-19 13:22:54,423 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=755200.0, ans=0.1 2023-11-19 13:22:55,418 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:23:08,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=755266.6666666666, ans=0.125 2023-11-19 13:23:15,850 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=755333.3333333334, ans=0.04949747468305833 2023-11-19 13:23:24,484 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5100, loss[loss=0.1104, simple_loss=0.1385, pruned_loss=0.03166, audio_tagging_loss=0.009482, over 16054.00 frames. ], tot_loss[loss=0.08601, simple_loss=0.1057, pruned_loss=0.02296, audio_tagging_loss=0.01018, over 3043915.56 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:23:31,548 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=755400.0, ans=0.0 2023-11-19 13:23:32,629 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=755400.0, ans=0.0 2023-11-19 13:23:34,939 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. 
limit=15.0 2023-11-19 13:23:39,694 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.218e+01 8.903e+01 1.035e+02 1.339e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 13:23:48,981 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=755533.3333333334, ans=0.2 2023-11-19 13:23:51,286 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:23:53,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:23:56,963 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=755600.0, ans=0.125 2023-11-19 13:24:20,073 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5150, loss[loss=0.06899, simple_loss=0.0819, pruned_loss=0.01799, audio_tagging_loss=0.01005, over 14647.00 frames. ], tot_loss[loss=0.08521, simple_loss=0.1048, pruned_loss=0.0226, audio_tagging_loss=0.01021, over 3047800.64 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:24:50,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=755866.6666666666, ans=0.5 2023-11-19 13:24:58,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 13:25:04,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-19 13:25:05,139 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756000.0, ans=0.1 2023-11-19 13:25:09,877 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=756000.0, ans=0.125 2023-11-19 13:25:15,940 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5200, loss[loss=0.1007, simple_loss=0.1292, pruned_loss=0.02836, audio_tagging_loss=0.007725, over 15551.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1058, pruned_loss=0.0231, audio_tagging_loss=0.01021, over 3046093.93 frames. ], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:25:16,218 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 13:25:21,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=756066.6666666666, ans=0.125 2023-11-19 13:25:31,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.473e+01 9.085e+01 1.039e+02 1.273e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:25:55,069 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=756266.6666666666, ans=0.125 2023-11-19 13:26:11,835 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5250, loss[loss=0.09033, simple_loss=0.1131, pruned_loss=0.0237, audio_tagging_loss=0.01008, over 15921.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.1056, pruned_loss=0.0229, audio_tagging_loss=0.01017, over 3044437.60 frames. 
], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:26:18,928 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756400.0, ans=0.0 2023-11-19 13:26:31,498 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=756466.6666666666, ans=0.125 2023-11-19 13:26:34,068 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=756533.3333333334, ans=0.0 2023-11-19 13:26:36,316 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=756533.3333333334, ans=0.125 2023-11-19 13:27:02,061 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=756666.6666666666, ans=0.0 2023-11-19 13:27:04,168 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=756666.6666666666, ans=0.125 2023-11-19 13:27:07,157 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5300, loss[loss=0.09393, simple_loss=0.1068, pruned_loss=0.02787, audio_tagging_loss=0.01266, over 14908.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.106, pruned_loss=0.02295, audio_tagging_loss=0.01023, over 3034908.55 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:27:08,427 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=756733.3333333334, ans=0.125 2023-11-19 13:27:17,883 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0 2023-11-19 13:27:22,582 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.251e+01 9.151e+01 1.020e+02 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:27:41,722 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 13:27:44,613 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=756933.3333333334, ans=0.125 2023-11-19 13:27:47,658 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=756933.3333333334, ans=0.0 2023-11-19 13:27:56,192 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=757000.0, ans=0.025 2023-11-19 13:28:01,979 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=757066.6666666666, ans=0.125 2023-11-19 13:28:02,882 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5350, loss[loss=0.06888, simple_loss=0.08246, pruned_loss=0.02025, audio_tagging_loss=0.007401, over 15106.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1058, pruned_loss=0.02293, audio_tagging_loss=0.01027, over 3039284.74 frames. 
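The ScheduledFloat lines record hyperparameters (skip rates, balancer probabilities, dropout) whose values are functions of batch_count rather than constants; by batch_count ~756400 most schedules have reached their final values, which is why so many of them print ans=0.0. A minimal sketch of the mechanism, piecewise-linear in batch count with invented breakpoints:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch count; an illustration of
        the idea behind the ScheduledFloat lines (breakpoints made up)."""

        def __init__(self, points):
            self.points = sorted(points)   # [(batch_count, value), ...]

        def __call__(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    skip_rate = ScheduledFloatSketch([(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)])
    print(skip_rate(756400.0))  # 0.0 -- long past the last breakpoint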
], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:28:17,430 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=757133.3333333334, ans=0.125 2023-11-19 13:28:23,311 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=757133.3333333334, ans=0.125 2023-11-19 13:28:24,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=757200.0, ans=0.0 2023-11-19 13:28:27,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=757200.0, ans=0.2 2023-11-19 13:28:29,581 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=757200.0, ans=0.0 2023-11-19 13:28:33,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=757200.0, ans=0.0 2023-11-19 13:28:41,159 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=5.98 vs. limit=15.0 2023-11-19 13:28:54,021 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. limit=15.0 2023-11-19 13:28:55,860 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757333.3333333334, ans=0.125 2023-11-19 13:28:57,745 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5400, loss[loss=0.0929, simple_loss=0.1109, pruned_loss=0.02568, audio_tagging_loss=0.01176, over 14720.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.106, pruned_loss=0.02285, audio_tagging_loss=0.01026, over 3044591.43 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:14,080 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.379e+01 8.876e+01 9.570e+01 1.112e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-19 13:29:21,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=757533.3333333334, ans=0.07 2023-11-19 13:29:21,842 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=757533.3333333334, ans=0.0 2023-11-19 13:29:24,807 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5 2023-11-19 13:29:30,798 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757600.0, ans=0.1 2023-11-19 13:29:38,220 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=757600.0, ans=0.2 2023-11-19 13:29:54,348 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5450, loss[loss=0.09367, simple_loss=0.1112, pruned_loss=0.02613, audio_tagging_loss=0.01193, over 14338.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1066, pruned_loss=0.02314, audio_tagging_loss=0.01027, over 3041539.12 frames. 
], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:30:02,167 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=757733.3333333334, ans=0.125 2023-11-19 13:30:09,988 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=757800.0, ans=0.125 2023-11-19 13:30:12,065 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=757800.0, ans=0.0 2023-11-19 13:30:14,191 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=757800.0, ans=0.125 2023-11-19 13:30:15,115 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757866.6666666666, ans=0.1 2023-11-19 13:30:20,551 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=757866.6666666666, ans=0.0 2023-11-19 13:30:25,238 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=757866.6666666666, ans=0.125 2023-11-19 13:30:35,786 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=757933.3333333334, ans=10.0 2023-11-19 13:30:36,823 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=757933.3333333334, ans=0.125 2023-11-19 13:30:49,727 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5500, loss[loss=0.08733, simple_loss=0.109, pruned_loss=0.01968, audio_tagging_loss=0.01316, over 14657.00 frames. ], tot_loss[loss=0.08605, simple_loss=0.1057, pruned_loss=0.02287, audio_tagging_loss=0.01035, over 3038101.52 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:30:51,096 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=758066.6666666666, ans=0.0 2023-11-19 13:30:51,999 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=758066.6666666666, ans=0.125 2023-11-19 13:31:01,609 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.580e-02 2023-11-19 13:31:03,664 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=758133.3333333334, ans=0.125 2023-11-19 13:31:03,813 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=758133.3333333334, ans=0.125 2023-11-19 13:31:04,522 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.323e+01 9.025e+01 9.961e+01 1.664e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:31:22,319 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=758266.6666666666, ans=0.2 2023-11-19 13:31:28,464 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=758266.6666666666, ans=10.0 2023-11-19 13:31:30,283 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.59 vs. 
limit=15.0 2023-11-19 13:31:44,672 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5550, loss[loss=0.08479, simple_loss=0.1028, pruned_loss=0.02216, audio_tagging_loss=0.01124, over 13639.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1049, pruned_loss=0.02277, audio_tagging_loss=0.01041, over 3036596.32 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:05,438 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=758466.6666666666, ans=0.125 2023-11-19 13:32:06,730 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.08 vs. limit=15.0 2023-11-19 13:32:10,080 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.26 vs. limit=15.0 2023-11-19 13:32:13,290 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=758533.3333333334, ans=0.5 2023-11-19 13:32:20,797 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=758600.0, ans=0.2 2023-11-19 13:32:32,023 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=758666.6666666666, ans=0.2 2023-11-19 13:32:40,815 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5600, loss[loss=0.08541, simple_loss=0.1075, pruned_loss=0.02148, audio_tagging_loss=0.01019, over 15424.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.1062, pruned_loss=0.02307, audio_tagging_loss=0.01047, over 3042381.65 frames. ], batch size: 57, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:45,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=758733.3333333334, ans=0.2 2023-11-19 13:32:56,429 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 8.349e+01 9.140e+01 1.023e+02 1.369e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 13:33:01,967 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=758866.6666666666, ans=0.1 2023-11-19 13:33:18,596 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:33:20,197 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.59 vs. limit=15.0 2023-11-19 13:33:22,046 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=758933.3333333334, ans=0.125 2023-11-19 13:33:36,687 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5650, loss[loss=0.08217, simple_loss=0.09901, pruned_loss=0.02137, audio_tagging_loss=0.01129, over 14954.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.1064, pruned_loss=0.0232, audio_tagging_loss=0.01056, over 3043135.94 frames. 
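For monitoring, the tot_loss trajectory can be scraped straight out of lines like those above; a small extraction sketch whose regex assumes the exact "Epoch N, batch M ... tot_loss[loss=..." layout used by train_asr.py in this log:

    import re

    PATTERN = re.compile(r"Epoch (\d+), batch (\d+),.*?tot_loss\[loss=([\d.]+)")

    def extract_tot_loss(log_text: str):
        # Yield (epoch, batch, tot_loss) triples from train_asr.py records.
        for m in PATTERN.finditer(log_text):
            yield int(m.group(1)), int(m.group(2)), float(m.group(3))

    record = ("2023-11-19 13:32:40,815 INFO [train_asr.py:1115] (2/4) "
              "Epoch 10, batch 5600, loss[...], tot_loss[loss=0.08662, ...]")
    print(list(extract_tot_loss(record)))  # [(10, 5600, 0.08662)]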
], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:33:36,820 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=759066.6666666666, ans=0.0 2023-11-19 13:33:51,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=759133.3333333334, ans=0.125 2023-11-19 13:33:59,087 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=759200.0, ans=0.125 2023-11-19 13:33:59,488 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.33 vs. limit=15.0 2023-11-19 13:34:20,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=759333.3333333334, ans=0.0 2023-11-19 13:34:26,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=759333.3333333334, ans=0.125 2023-11-19 13:34:26,800 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=759333.3333333334, ans=0.0 2023-11-19 13:34:31,772 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5700, loss[loss=0.08983, simple_loss=0.1172, pruned_loss=0.02266, audio_tagging_loss=0.008569, over 15648.00 frames. ], tot_loss[loss=0.0865, simple_loss=0.1056, pruned_loss=0.02318, audio_tagging_loss=0.01053, over 3036515.27 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:34:47,526 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.102e+01 8.841e+01 9.614e+01 1.155e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 13:34:54,694 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=759533.3333333334, ans=0.0 2023-11-19 13:34:56,661 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=759533.3333333334, ans=0.125 2023-11-19 13:34:56,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=759533.3333333334, ans=0.125 2023-11-19 13:35:06,700 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=759600.0, ans=0.2 2023-11-19 13:35:14,160 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=759600.0, ans=0.2 2023-11-19 13:35:19,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=759666.6666666666, ans=0.125 2023-11-19 13:35:27,571 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5750, loss[loss=0.08127, simple_loss=0.09722, pruned_loss=0.01927, audio_tagging_loss=0.01339, over 13920.00 frames. ], tot_loss[loss=0.08615, simple_loss=0.1053, pruned_loss=0.02304, audio_tagging_loss=0.01047, over 3039667.85 frames. 
2023-11-19 13:35:36,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=759733.3333333334, ans=0.07 2023-11-19 13:35:42,656 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=759800.0, ans=0.125 2023-11-19 13:35:50,964 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=759866.6666666666, ans=0.035 2023-11-19 13:36:13,768 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.96 vs. limit=6.0 2023-11-19 13:36:16,715 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=760000.0, ans=0.0 2023-11-19 13:36:23,404 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5800, loss[loss=0.05857, simple_loss=0.06481, pruned_loss=0.01389, audio_tagging_loss=0.01227, over 13933.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1058, pruned_loss=0.02327, audio_tagging_loss=0.01033, over 3038581.58 frames. ], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:36:25,615 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.17 vs. limit=10.0 2023-11-19 13:36:26,251 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=760066.6666666666, ans=0.125 2023-11-19 13:36:39,823 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.575e+01 9.143e+01 9.990e+01 1.422e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 13:36:42,300 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=760133.3333333334, ans=0.125 2023-11-19 13:36:50,657 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=760200.0, ans=0.0 2023-11-19 13:36:50,674 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=760200.0, ans=0.035 2023-11-19 13:37:08,921 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:37:09,953 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=760333.3333333334, ans=0.1 2023-11-19 13:37:19,217 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5850, loss[loss=0.08839, simple_loss=0.1049, pruned_loss=0.02424, audio_tagging_loss=0.01168, over 14731.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1056, pruned_loss=0.02319, audio_tagging_loss=0.01034, over 3034430.96 frames.
], batch size: 54, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:37:30,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=760466.6666666666, ans=0.2 2023-11-19 13:37:33,039 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=760466.6666666666, ans=0.125 2023-11-19 13:37:33,101 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=760466.6666666666, ans=0.0 2023-11-19 13:37:36,210 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=760466.6666666666, ans=0.0 2023-11-19 13:37:47,763 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-19 13:37:50,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=760533.3333333334, ans=0.0 2023-11-19 13:38:06,077 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=760666.6666666666, ans=0.125 2023-11-19 13:38:15,436 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5900, loss[loss=0.07821, simple_loss=0.0985, pruned_loss=0.02068, audio_tagging_loss=0.008278, over 16282.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1053, pruned_loss=0.02303, audio_tagging_loss=0.01022, over 3040589.55 frames. ], batch size: 60, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:38:32,424 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.376e+01 9.188e+01 1.002e+02 1.553e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 13:38:32,655 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760800.0, ans=0.1 2023-11-19 13:38:40,943 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=760866.6666666666, ans=0.125 2023-11-19 13:38:55,292 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.29 vs. limit=12.0 2023-11-19 13:38:58,875 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=761000.0, ans=0.125 2023-11-19 13:39:10,373 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 5950, loss[loss=0.08543, simple_loss=0.1069, pruned_loss=0.02148, audio_tagging_loss=0.01049, over 16678.00 frames. ], tot_loss[loss=0.08578, simple_loss=0.1054, pruned_loss=0.02296, audio_tagging_loss=0.01013, over 3045208.99 frames. 
], batch size: 62, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:39:23,930 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=761133.3333333334, ans=0.125 2023-11-19 13:39:24,897 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=761133.3333333334, ans=0.125 2023-11-19 13:39:33,977 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=761200.0, ans=0.0 2023-11-19 13:39:38,728 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=761200.0, ans=0.0 2023-11-19 13:40:05,368 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:40:06,249 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6000, loss[loss=0.1002, simple_loss=0.1311, pruned_loss=0.02568, audio_tagging_loss=0.008979, over 15758.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.1048, pruned_loss=0.02293, audio_tagging_loss=0.01017, over 3039695.78 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:40:06,250 INFO [train_asr.py:1138] (2/4) Computing validation loss 2023-11-19 13:40:38,522 INFO [train_asr.py:1147] (2/4) Epoch 10, validation: loss=0.06367, simple_loss=0.05534, pruned_loss=0.00639, audio_tagging_loss=0.02961, over 4681554.00 frames. 2023-11-19 13:40:38,523 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB 2023-11-19 13:40:44,015 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=761400.0, ans=0.05 2023-11-19 13:40:55,647 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.242e+01 8.869e+01 9.811e+01 1.293e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 13:41:17,854 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:41:22,225 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=761666.6666666666, ans=0.0 2023-11-19 13:41:34,258 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6050, loss[loss=0.05402, simple_loss=0.0568, pruned_loss=0.01344, audio_tagging_loss=0.01218, over 14905.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1045, pruned_loss=0.0228, audio_tagging_loss=0.01019, over 3033762.09 frames. 
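], batch size: 59, lr: 6.93e-03, grad_scale: 32.0

The WARNING above drops an AudioSet cut because, after the frontend's roughly 4x subsampling, it has fewer frames (23) than BPE tokens (24), and a transducer cannot emit more symbols than it has frames. A sketch of such a filter follows; the exact subsampling formula is an assumption (any frontend mapping 100 input frames to 23 output frames behaves identically here).

```python
# Sketch of the filter implied by the WARNING lines: a cut is dropped when it
# has fewer frames after subsampling than BPE tokens. The subsampling
# arithmetic (T -> (T - 7) // 4 for a 4x-subsampling frontend) is an
# assumption, not taken from the training script.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = (num_frames - 7) // 4
    return frames_after_subsampling >= num_tokens

# The excluded AudioSet cut above: 100 frames -> (100 - 7) // 4 = 23 frames,
# but 24 tokens of dummy placeholder text, so the cut is skipped.
assert keep_cut(100, 24) is False
```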
2023-11-19 13:41:44,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=761800.0, ans=0.0 2023-11-19 13:41:51,952 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=761800.0, ans=0.0 2023-11-19 13:42:04,837 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=761866.6666666666, ans=0.1 2023-11-19 13:42:21,362 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=762000.0, ans=0.125 2023-11-19 13:42:30,045 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6100, loss[loss=0.07398, simple_loss=0.08716, pruned_loss=0.02006, audio_tagging_loss=0.01034, over 14492.00 frames. ], tot_loss[loss=0.08584, simple_loss=0.1052, pruned_loss=0.02302, audio_tagging_loss=0.01023, over 3042047.24 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:42:39,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=762066.6666666666, ans=0.125 2023-11-19 13:42:46,879 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.906e+01 8.561e+01 9.384e+01 1.050e+02 1.586e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 13:42:47,203 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=762133.3333333334, ans=0.1 2023-11-19 13:42:53,329 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=762200.0, ans=0.0 2023-11-19 13:42:57,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.21 vs. limit=15.0 2023-11-19 13:43:25,977 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6150, loss[loss=0.08292, simple_loss=0.09372, pruned_loss=0.02493, audio_tagging_loss=0.01113, over 16103.00 frames. ], tot_loss[loss=0.08558, simple_loss=0.1046, pruned_loss=0.02295, audio_tagging_loss=0.01033, over 3048810.22 frames. ], batch size: 62, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:43:35,706 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=762466.6666666666, ans=0.125 2023-11-19 13:43:45,499 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=762466.6666666666, ans=0.125 2023-11-19 13:43:48,886 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.94 vs.
limit=12.0 2023-11-19 13:43:50,189 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762533.3333333334, ans=0.1 2023-11-19 13:43:56,565 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762533.3333333334, ans=0.1 2023-11-19 13:44:03,544 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=762600.0, ans=0.1 2023-11-19 13:44:06,073 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=762600.0, ans=0.0 2023-11-19 13:44:12,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=762666.6666666666, ans=0.0 2023-11-19 13:44:18,081 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-11-19 13:44:20,119 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=762733.3333333334, ans=0.0 2023-11-19 13:44:21,445 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6200, loss[loss=0.1102, simple_loss=0.1411, pruned_loss=0.03304, audio_tagging_loss=0.006573, over 14867.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1048, pruned_loss=0.02295, audio_tagging_loss=0.01038, over 3048917.22 frames. ], batch size: 53, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:44:26,949 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=762733.3333333334, ans=0.125 2023-11-19 13:44:27,024 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=762733.3333333334, ans=0.125 2023-11-19 13:44:37,805 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.474e+01 9.020e+01 9.808e+01 1.345e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:44:57,110 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=762933.3333333334, ans=0.2 2023-11-19 13:45:00,277 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=762933.3333333334, ans=0.125 2023-11-19 13:45:02,529 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.whiten.whitening_limit, batch_count=762933.3333333334, ans=12.0 2023-11-19 13:45:07,536 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.11 vs. limit=15.0 2023-11-19 13:45:17,048 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6250, loss[loss=0.1005, simple_loss=0.1337, pruned_loss=0.02606, audio_tagging_loss=0.007546, over 15063.00 frames. ], tot_loss[loss=0.08543, simple_loss=0.1043, pruned_loss=0.02284, audio_tagging_loss=0.01046, over 3043798.17 frames. 
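], batch size: 56, lr: 6.93e-03, grad_scale: 32.0

The ScheduledFloat entries log values (skip rates, balancer probabilities, minimum scales) that are functions of batch_count rather than fixed hyperparameters, which is why the same name can report different ans= values at different points in training. The sketch below shows a piecewise-linear schedule of that general kind; the breakpoints are illustrative, not the ones used by scaling.py.

```python
# Sketch of a batch-count-keyed schedule like the ScheduledFloat values logged
# above: piecewise-linear interpolation over (batch_count, value) breakpoints,
# clamped at both ends. The breakpoints below are illustrative assumptions.
def scheduled_float(batch_count: float,
                    points: list[tuple[float, float]]) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 20k batches would
# read ans=0.0 by batch_count ~ 763000 as in the entries above.
print(scheduled_float(763066.67, [(0.0, 0.5), (20000.0, 0.0)]))  # 0.0
```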
2023-11-19 13:45:17,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=763066.6666666666, ans=0.125 2023-11-19 13:45:23,519 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763066.6666666666, ans=0.1 2023-11-19 13:45:30,411 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=763133.3333333334, ans=0.125 2023-11-19 13:45:51,511 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763266.6666666666, ans=0.1 2023-11-19 13:45:53,071 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-19 13:46:12,445 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6300, loss[loss=0.07537, simple_loss=0.09985, pruned_loss=0.01714, audio_tagging_loss=0.008301, over 15251.00 frames. ], tot_loss[loss=0.08553, simple_loss=0.1044, pruned_loss=0.02268, audio_tagging_loss=0.01063, over 3040097.33 frames. ], batch size: 59, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:46:29,493 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.302e+01 8.988e+01 1.019e+02 1.261e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 13:46:53,146 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:47:05,391 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=763666.6666666666, ans=0.0 2023-11-19 13:47:08,310 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6350, loss[loss=0.107, simple_loss=0.1314, pruned_loss=0.03229, audio_tagging_loss=0.008989, over 14550.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1052, pruned_loss=0.02264, audio_tagging_loss=0.01059, over 3038804.03 frames. ], batch size: 57, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:47:11,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=763733.3333333334, ans=0.125 2023-11-19 13:47:30,718 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=763866.6666666666, ans=0.0 2023-11-19 13:47:32,401 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=763866.6666666666, ans=0.0 2023-11-19 13:47:36,604 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=763866.6666666666, ans=0.125 2023-11-19 13:47:38,227 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=763866.6666666666, ans=0.2 2023-11-19 13:47:42,408 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.023e-02 2023-11-19 13:48:01,117 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=764000.0, ans=0.125 2023-11-19 13:48:03,970 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6400, loss[loss=0.08035, simple_loss=0.08869, pruned_loss=0.02129, audio_tagging_loss=0.01471, over 15482.00 frames.
], tot_loss[loss=0.08496, simple_loss=0.104, pruned_loss=0.02224, audio_tagging_loss=0.0107, over 3035391.40 frames. ], batch size: 59, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:48:07,750 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. limit=6.0 2023-11-19 13:48:22,448 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 7.945e+01 8.669e+01 9.496e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-19 13:48:27,083 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764200.0, ans=0.1 2023-11-19 13:48:59,540 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6450, loss[loss=0.07066, simple_loss=0.08654, pruned_loss=0.01638, audio_tagging_loss=0.01101, over 15152.00 frames. ], tot_loss[loss=0.08483, simple_loss=0.1037, pruned_loss=0.0222, audio_tagging_loss=0.01078, over 3026046.71 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:49:09,838 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=764466.6666666666, ans=0.0 2023-11-19 13:49:21,773 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=764533.3333333334, ans=0.5 2023-11-19 13:49:34,485 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=764600.0, ans=0.125 2023-11-19 13:49:50,736 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=764666.6666666666, ans=0.5 2023-11-19 13:49:54,758 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6500, loss[loss=0.1122, simple_loss=0.1368, pruned_loss=0.03391, audio_tagging_loss=0.009922, over 15783.00 frames. ], tot_loss[loss=0.0853, simple_loss=0.1044, pruned_loss=0.02237, audio_tagging_loss=0.01076, over 3034694.02 frames. ], batch size: 61, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:50:09,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=764800.0, ans=0.125 2023-11-19 13:50:13,155 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.395e+01 9.151e+01 9.992e+01 1.336e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:50:28,075 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.68 vs. limit=15.0 2023-11-19 13:50:37,147 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=764933.3333333334, ans=0.2 2023-11-19 13:50:39,574 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-11-19 13:50:50,152 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6550, loss[loss=0.09044, simple_loss=0.1061, pruned_loss=0.02818, audio_tagging_loss=0.00919, over 16481.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.1036, pruned_loss=0.02228, audio_tagging_loss=0.01057, over 3039795.33 frames. 
], batch size: 63, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:11,705 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.53 vs. limit=15.0 2023-11-19 13:51:27,932 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.02 vs. limit=15.0 2023-11-19 13:51:31,806 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=765266.6666666666, ans=0.1 2023-11-19 13:51:39,532 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:51:45,099 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6600, loss[loss=0.08778, simple_loss=0.1095, pruned_loss=0.0219, audio_tagging_loss=0.0111, over 14940.00 frames. ], tot_loss[loss=0.08445, simple_loss=0.1035, pruned_loss=0.02217, audio_tagging_loss=0.01054, over 3033787.83 frames. ], batch size: 58, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:46,296 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=765400.0, ans=0.125 2023-11-19 13:52:03,381 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 7.948e+01 8.762e+01 9.685e+01 1.504e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 13:52:33,310 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=765666.6666666666, ans=0.125 2023-11-19 13:52:40,466 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6650, loss[loss=0.1023, simple_loss=0.129, pruned_loss=0.03077, audio_tagging_loss=0.007002, over 15275.00 frames. ], tot_loss[loss=0.08456, simple_loss=0.1037, pruned_loss=0.0223, audio_tagging_loss=0.0104, over 3032276.33 frames. ], batch size: 58, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:53:02,271 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=765866.6666666666, ans=0.125 2023-11-19 13:53:10,219 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=765866.6666666666, ans=0.0 2023-11-19 13:53:17,721 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765933.3333333334, ans=0.1 2023-11-19 13:53:35,927 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6700, loss[loss=0.06993, simple_loss=0.0891, pruned_loss=0.0178, audio_tagging_loss=0.007579, over 14810.00 frames. ], tot_loss[loss=0.08474, simple_loss=0.1041, pruned_loss=0.02237, audio_tagging_loss=0.01031, over 3032799.31 frames. 
], batch size: 57, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:53:40,373 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=766066.6666666666, ans=0.1 2023-11-19 13:53:43,417 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=766066.6666666666, ans=0.125 2023-11-19 13:53:55,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.241e+01 9.083e+01 1.005e+02 1.420e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:54:01,017 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=766200.0, ans=0.125 2023-11-19 13:54:14,620 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.04 vs. limit=22.5 2023-11-19 13:54:22,781 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=766333.3333333334, ans=0.125 2023-11-19 13:54:31,101 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6750, loss[loss=0.09455, simple_loss=0.1245, pruned_loss=0.02367, audio_tagging_loss=0.008645, over 15731.00 frames. ], tot_loss[loss=0.0846, simple_loss=0.1037, pruned_loss=0.02239, audio_tagging_loss=0.01034, over 3030454.69 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:54:37,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=766400.0, ans=0.125 2023-11-19 13:54:51,448 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.99 vs. limit=6.0 2023-11-19 13:55:27,976 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6800, loss[loss=0.06773, simple_loss=0.09193, pruned_loss=0.01222, audio_tagging_loss=0.009543, over 16145.00 frames. ], tot_loss[loss=0.08496, simple_loss=0.1039, pruned_loss=0.0226, audio_tagging_loss=0.0104, over 3030559.58 frames. ], batch size: 58, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:55:34,612 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=766733.3333333334, ans=0.125 2023-11-19 13:55:37,430 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=22.5 2023-11-19 13:55:46,472 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.286e+01 8.985e+01 9.839e+01 1.456e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:55:48,956 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=766866.6666666666, ans=0.125 2023-11-19 13:55:50,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=766866.6666666666, ans=0.125 2023-11-19 13:56:19,620 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=767000.0, ans=0.125 2023-11-19 13:56:23,560 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6850, loss[loss=0.08183, simple_loss=0.1082, pruned_loss=0.01985, audio_tagging_loss=0.007899, over 16367.00 frames. 
], tot_loss[loss=0.08428, simple_loss=0.1031, pruned_loss=0.02235, audio_tagging_loss=0.01039, over 3035278.65 frames. ], batch size: 61, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:56:33,358 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767133.3333333334, ans=0.125 2023-11-19 13:56:40,144 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=767133.3333333334, ans=0.04949747468305833 2023-11-19 13:56:41,306 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=767133.3333333334, ans=0.2 2023-11-19 13:57:10,400 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:57:14,681 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=767333.3333333334, ans=0.0 2023-11-19 13:57:18,759 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6900, loss[loss=0.09442, simple_loss=0.1148, pruned_loss=0.02865, audio_tagging_loss=0.008378, over 14573.00 frames. ], tot_loss[loss=0.08549, simple_loss=0.105, pruned_loss=0.02278, audio_tagging_loss=0.0102, over 3037752.07 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:57:26,650 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.44 vs. limit=6.0 2023-11-19 13:57:29,493 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=767466.6666666666, ans=0.125 2023-11-19 13:57:34,682 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=767466.6666666666, ans=0.1 2023-11-19 13:57:37,633 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.298e+01 8.115e+01 8.697e+01 9.342e+01 1.240e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 13:57:44,361 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=767533.3333333334, ans=0.2 2023-11-19 13:57:51,679 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=767600.0, ans=0.125 2023-11-19 13:57:53,095 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.63 vs. limit=22.5 2023-11-19 13:58:00,771 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:58:10,607 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=767666.6666666666, ans=0.1 2023-11-19 13:58:14,577 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 6950, loss[loss=0.11, simple_loss=0.142, pruned_loss=0.03201, audio_tagging_loss=0.006965, over 14544.00 frames. 
], tot_loss[loss=0.08553, simple_loss=0.1048, pruned_loss=0.02276, audio_tagging_loss=0.01037, over 3035186.45 frames. ], batch size: 52, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:58:16,070 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2023-11-19 13:58:18,904 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=767733.3333333334, ans=0.125 2023-11-19 13:58:27,425 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=767800.0, ans=0.0 2023-11-19 13:58:35,799 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=767866.6666666666, ans=0.2 2023-11-19 13:58:40,195 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=767866.6666666666, ans=0.125 2023-11-19 13:58:47,947 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767933.3333333334, ans=0.125 2023-11-19 13:58:49,174 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=767933.3333333334, ans=0.125 2023-11-19 13:59:10,782 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7000, loss[loss=0.08319, simple_loss=0.1008, pruned_loss=0.02284, audio_tagging_loss=0.009938, over 15612.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.1038, pruned_loss=0.02263, audio_tagging_loss=0.01051, over 3036971.17 frames. ], batch size: 59, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 13:59:22,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=768133.3333333334, ans=0.0 2023-11-19 13:59:28,004 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.14 vs. limit=6.0 2023-11-19 13:59:29,027 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.254e+01 9.022e+01 1.015e+02 1.308e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:59:38,235 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=768200.0, ans=0.0 2023-11-19 13:59:59,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=768333.3333333334, ans=0.0 2023-11-19 14:00:01,892 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=768333.3333333334, ans=0.125 2023-11-19 14:00:04,857 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7050, loss[loss=0.07929, simple_loss=0.09245, pruned_loss=0.02159, audio_tagging_loss=0.01147, over 14055.00 frames. ], tot_loss[loss=0.08459, simple_loss=0.1033, pruned_loss=0.02247, audio_tagging_loss=0.01044, over 3034903.99 frames. 
], batch size: 55, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:00:25,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768466.6666666666, ans=0.1 2023-11-19 14:00:59,350 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=768733.3333333334, ans=0.0 2023-11-19 14:01:00,280 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7100, loss[loss=0.05962, simple_loss=0.05804, pruned_loss=0.01231, audio_tagging_loss=0.01829, over 15543.00 frames. ], tot_loss[loss=0.08571, simple_loss=0.1044, pruned_loss=0.0229, audio_tagging_loss=0.01059, over 3039865.32 frames. ], batch size: 62, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:01:00,556 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=768733.3333333334, ans=0.125 2023-11-19 14:01:01,780 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-19 14:01:19,248 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.436e+01 9.096e+01 9.960e+01 1.200e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 14:01:21,605 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 14:01:28,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 14:01:56,314 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7150, loss[loss=0.08495, simple_loss=0.1121, pruned_loss=0.01719, audio_tagging_loss=0.01172, over 14943.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1045, pruned_loss=0.02303, audio_tagging_loss=0.01074, over 3039258.02 frames. ], batch size: 54, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:01:58,686 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=769066.6666666666, ans=0.1 2023-11-19 14:01:58,839 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=6.76 vs. limit=15.0 2023-11-19 14:02:11,308 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.37 vs. limit=15.0 2023-11-19 14:02:11,821 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=769133.3333333334, ans=0.2 2023-11-19 14:02:52,145 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7200, loss[loss=0.06665, simple_loss=0.08087, pruned_loss=0.01313, audio_tagging_loss=0.01309, over 15652.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1034, pruned_loss=0.02269, audio_tagging_loss=0.01078, over 3046403.08 frames. 
], batch size: 60, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:02:53,374 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=769400.0, ans=0.1 2023-11-19 14:03:00,371 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=769400.0, ans=0.05 2023-11-19 14:03:06,562 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-19 14:03:08,405 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=769466.6666666666, ans=0.125 2023-11-19 14:03:11,329 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.379e+01 9.176e+01 1.015e+02 1.604e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 14:03:18,002 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:03:25,232 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.01 vs. limit=22.5 2023-11-19 14:03:29,601 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=769600.0, ans=0.125 2023-11-19 14:03:31,760 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=769600.0, ans=0.125 2023-11-19 14:03:48,453 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7250, loss[loss=0.1003, simple_loss=0.1192, pruned_loss=0.02987, audio_tagging_loss=0.01087, over 15353.00 frames. ], tot_loss[loss=0.08504, simple_loss=0.1033, pruned_loss=0.02252, audio_tagging_loss=0.01085, over 3046550.57 frames. ], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:03:52,791 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:04:08,301 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=769800.0, ans=0.125 2023-11-19 14:04:30,985 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=769933.3333333334, ans=0.125 2023-11-19 14:04:35,293 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=770000.0, ans=0.0 2023-11-19 14:04:43,486 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7300, loss[loss=0.07089, simple_loss=0.08942, pruned_loss=0.01485, audio_tagging_loss=0.01133, over 15357.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.1033, pruned_loss=0.02228, audio_tagging_loss=0.0107, over 3039866.09 frames. 
], batch size: 57, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:04:43,617 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=770066.6666666666, ans=0.0 2023-11-19 14:04:50,496 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=770066.6666666666, ans=0.125 2023-11-19 14:04:50,585 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=770066.6666666666, ans=0.125 2023-11-19 14:05:02,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.433e+01 8.339e+01 9.048e+01 1.014e+02 1.411e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 14:05:21,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=770266.6666666666, ans=0.125 2023-11-19 14:05:31,012 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=770333.3333333334, ans=0.95 2023-11-19 14:05:39,783 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7350, loss[loss=0.07299, simple_loss=0.0907, pruned_loss=0.01606, audio_tagging_loss=0.01158, over 14846.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1042, pruned_loss=0.02253, audio_tagging_loss=0.01053, over 3045788.53 frames. ], batch size: 56, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:05:44,161 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=770400.0, ans=0.125 2023-11-19 14:05:53,868 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=770466.6666666666, ans=0.125 2023-11-19 14:06:03,450 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:06:06,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=770533.3333333334, ans=0.2 2023-11-19 14:06:19,881 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770600.0, ans=0.125 2023-11-19 14:06:27,925 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=770666.6666666666, ans=0.125 2023-11-19 14:06:29,509 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=770666.6666666666, ans=0.125 2023-11-19 14:06:36,272 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7400, loss[loss=0.07566, simple_loss=0.09183, pruned_loss=0.0166, audio_tagging_loss=0.01315, over 15708.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.105, pruned_loss=0.02273, audio_tagging_loss=0.01045, over 3046032.06 frames. ], batch size: 59, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:06:51,415 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.64 vs. 
limit=15.0 2023-11-19 14:06:54,606 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.661e+01 9.537e+01 1.042e+02 1.641e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-19 14:06:57,978 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=770866.6666666666, ans=0.125 2023-11-19 14:07:04,331 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=770866.6666666666, ans=0.125 2023-11-19 14:07:10,199 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=770933.3333333334, ans=0.125 2023-11-19 14:07:31,157 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7450, loss[loss=0.06042, simple_loss=0.07457, pruned_loss=0.01347, audio_tagging_loss=0.009661, over 15012.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1048, pruned_loss=0.02285, audio_tagging_loss=0.01041, over 3049740.05 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:07:45,147 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.00 vs. limit=15.0 2023-11-19 14:08:12,622 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=771266.6666666666, ans=0.5 2023-11-19 14:08:26,766 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7500, loss[loss=0.0724, simple_loss=0.08867, pruned_loss=0.01908, audio_tagging_loss=0.00899, over 15006.00 frames. ], tot_loss[loss=0.08586, simple_loss=0.1054, pruned_loss=0.02288, audio_tagging_loss=0.01026, over 3052010.27 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:08:46,252 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.223e+01 8.899e+01 9.673e+01 1.181e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 14:08:48,521 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=771533.3333333334, ans=0.0 2023-11-19 14:08:49,575 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:08:50,877 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.50 vs. limit=10.0 2023-11-19 14:08:50,987 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.02 vs. limit=15.0 2023-11-19 14:09:01,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=771600.0, ans=0.125 2023-11-19 14:09:23,261 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7550, loss[loss=0.08974, simple_loss=0.1108, pruned_loss=0.0254, audio_tagging_loss=0.008932, over 13601.00 frames. ], tot_loss[loss=0.08509, simple_loss=0.1044, pruned_loss=0.02262, audio_tagging_loss=0.01028, over 3045947.31 frames. 
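], batch size: 53, lr: 6.89e-03, grad_scale: 32.0

The Whitening entries compare a per-module metric against a limit (e.g. metric=2.64 vs. limit=15.0 above); the metric grows as the module's output covariance drifts away from being white, and the constraint presumably only bites once the limit is exceeded. Below is one plausible way to compute such a metric, normalised so that perfectly white features score 1.0; whether scaling.py computes exactly this quantity is an assumption, since the log only shows metric vs. limit.

```python
import torch

# Sketch of a whitening metric of the kind reported above: it measures how
# anisotropic the feature covariance is, equalling 1.0 when all eigenvalues
# are equal (white) and growing otherwise. An assumption about the exact
# formula, reconstructed only from the metric-vs-limit pattern in the log.
def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # sum(eig^2) == ||C||_F^2 and sum(eig) == trace(C), so no eigendecomposition
    return d * (cov ** 2).sum() / cov.trace() ** 2

x = torch.randn(1000, 256)          # nearly white input ...
print(float(whitening_metric(x)))   # ... scores close to 1.0
```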
2023-11-19 14:09:27,807 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=771733.3333333334, ans=0.125 2023-11-19 14:09:34,051 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=771800.0, ans=0.125 2023-11-19 14:09:50,468 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=771866.6666666666, ans=0.125 2023-11-19 14:09:51,862 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-11-19 14:10:00,396 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=771933.3333333334, ans=0.125 2023-11-19 14:10:06,148 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=772000.0, ans=0.0 2023-11-19 14:10:16,923 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=772066.6666666666, ans=0.125 2023-11-19 14:10:17,844 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7600, loss[loss=0.07356, simple_loss=0.08984, pruned_loss=0.01724, audio_tagging_loss=0.0114, over 15446.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1042, pruned_loss=0.02254, audio_tagging_loss=0.01028, over 3047324.19 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:10:22,691 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=772066.6666666666, ans=0.1 2023-11-19 14:10:36,184 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.387e+01 9.154e+01 1.016e+02 1.447e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 14:10:44,945 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=772200.0, ans=0.125 2023-11-19 14:10:46,597 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772200.0, ans=0.1 2023-11-19 14:11:00,707 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten.whitening_limit, batch_count=772266.6666666666, ans=15.0 2023-11-19 14:11:09,372 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=772333.3333333334, ans=0.0 2023-11-19 14:11:13,459 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7650, loss[loss=0.0754, simple_loss=0.08726, pruned_loss=0.02016, audio_tagging_loss=0.01161, over 14854.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1051, pruned_loss=0.02263, audio_tagging_loss=0.01018, over 3047578.75 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 14:11:36,365 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=772533.3333333334, ans=0.1 2023-11-19 14:11:39,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=772533.3333333334, ans=0.125 2023-11-19 14:12:01,728 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.05 vs.
limit=6.0 2023-11-19 14:12:08,893 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7700, loss[loss=0.08891, simple_loss=0.1157, pruned_loss=0.02314, audio_tagging_loss=0.007903, over 15410.00 frames. ], tot_loss[loss=0.08523, simple_loss=0.1051, pruned_loss=0.02255, audio_tagging_loss=0.01015, over 3048812.45 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:12:12,542 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.46 vs. limit=15.0 2023-11-19 14:12:14,304 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=772733.3333333334, ans=0.125 2023-11-19 14:12:15,472 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=772733.3333333334, ans=0.0 2023-11-19 14:12:29,297 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.040e+01 8.605e+01 9.398e+01 1.279e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-19 14:12:30,645 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=772866.6666666666, ans=0.125 2023-11-19 14:12:57,041 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=773000.0, ans=0.125 2023-11-19 14:13:00,285 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=773000.0, ans=0.125 2023-11-19 14:13:04,321 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7750, loss[loss=0.09124, simple_loss=0.1085, pruned_loss=0.02732, audio_tagging_loss=0.009664, over 14621.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1045, pruned_loss=0.02238, audio_tagging_loss=0.01015, over 3045574.91 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:13:37,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=773266.6666666666, ans=0.125 2023-11-19 14:13:38,294 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=773266.6666666666, ans=0.125 2023-11-19 14:13:54,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=773333.3333333334, ans=0.2 2023-11-19 14:14:01,956 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7800, loss[loss=0.07047, simple_loss=0.07758, pruned_loss=0.01716, audio_tagging_loss=0.01452, over 14597.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1042, pruned_loss=0.02247, audio_tagging_loss=0.01022, over 3042441.01 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:14:13,033 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.57 vs. limit=10.0 2023-11-19 14:14:23,111 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.739e+01 9.425e+01 1.048e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-19 14:14:34,201 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.67 vs. 
limit=15.0 2023-11-19 14:14:35,031 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=773600.0, ans=0.0 2023-11-19 14:14:38,747 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=773600.0, ans=0.0 2023-11-19 14:14:41,825 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=773600.0, ans=0.0 2023-11-19 14:14:45,270 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=773666.6666666666, ans=15.0 2023-11-19 14:14:46,216 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=773666.6666666666, ans=0.2 2023-11-19 14:14:57,124 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7850, loss[loss=0.0646, simple_loss=0.07556, pruned_loss=0.01383, audio_tagging_loss=0.01299, over 14771.00 frames. ], tot_loss[loss=0.08398, simple_loss=0.103, pruned_loss=0.02215, audio_tagging_loss=0.01035, over 3036859.26 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:14:59,564 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=773733.3333333334, ans=0.2 2023-11-19 14:15:06,788 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=773733.3333333334, ans=0.0 2023-11-19 14:15:09,157 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.06 vs. limit=6.0 2023-11-19 14:15:17,229 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=773800.0, ans=0.125 2023-11-19 14:15:40,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=774000.0, ans=0.07 2023-11-19 14:15:53,098 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7900, loss[loss=0.08974, simple_loss=0.1178, pruned_loss=0.0215, audio_tagging_loss=0.009327, over 14614.00 frames. ], tot_loss[loss=0.08443, simple_loss=0.1032, pruned_loss=0.02239, audio_tagging_loss=0.01046, over 3034427.28 frames. 
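], batch size: 57, lr: 6.88e-03, grad_scale: 8.0

The grad_scale field in the loss entries has stepped down from 32.0 to 16.0 and now 8.0, which is the signature of dynamic loss scaling under fp16: the scaler halves the scale when a step produces inf/nan gradients and grows it back after a run of clean steps. The sketch below uses torch's GradScaler with its default dynamics; the constructor values are the library defaults, not settings read from this run.

```python
import torch

# Sketch of the fp16 loss-scale behaviour suggested by the grad_scale values
# in the log (32.0 -> 16.0 -> 8.0, later recovering). These arguments are
# torch's documented defaults, not values taken from the training script.
scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0 ** 16,   # starting loss scale
    backoff_factor=0.5,     # halve on overflow, e.g. 16.0 -> 8.0
    growth_factor=2.0,      # double after growth_interval clean steps
    growth_interval=2000,
)

# Typical step: scale the loss, unscale before clipping, then step/update.
# `loss`, `optimizer`, and `model` are placeholders, not objects from this run.
# scaler.scale(loss).backward()
# scaler.unscale_(optimizer)
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=threshold)
# scaler.step(optimizer)
# scaler.update()  # adjusts the scale, producing the grad_scale logged above
```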
2023-11-19 14:15:59,680 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=774066.6666666666, ans=0.07 2023-11-19 14:16:05,309 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=774133.3333333334, ans=0.0 2023-11-19 14:16:07,376 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=774133.3333333334, ans=0.0 2023-11-19 14:16:13,449 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.211e+01 8.938e+01 9.745e+01 1.285e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 14:16:16,898 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=774200.0, ans=0.125 2023-11-19 14:16:28,327 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=774266.6666666666, ans=0.125 2023-11-19 14:16:40,679 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.62 vs. limit=15.0 2023-11-19 14:16:43,020 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5 2023-11-19 14:16:48,186 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 7950, loss[loss=0.07566, simple_loss=0.08712, pruned_loss=0.02032, audio_tagging_loss=0.01178, over 15719.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1042, pruned_loss=0.02278, audio_tagging_loss=0.01055, over 3040864.41 frames. ], batch size: 60, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:16:49,532 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774400.0, ans=0.1 2023-11-19 14:16:59,897 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible'].
2023-11-19 14:17:00,036 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=774466.6666666666, ans=0.125
2023-11-19 14:17:02,130 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=774466.6666666666, ans=0.0
2023-11-19 14:17:24,063 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=774600.0, ans=0.125
2023-11-19 14:17:26,651 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=774600.0, ans=0.125
2023-11-19 14:17:36,560 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=774666.6666666666, ans=0.0
2023-11-19 14:17:41,880 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=774666.6666666666, ans=0.2
2023-11-19 14:17:43,714 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8000, loss[loss=0.07191, simple_loss=0.08189, pruned_loss=0.02003, audio_tagging_loss=0.01093, over 14797.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1046, pruned_loss=0.02279, audio_tagging_loss=0.01059, over 3035445.31 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 16.0
2023-11-19 14:17:43,983 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774733.3333333334, ans=0.1
2023-11-19 14:17:48,013 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.85 vs. limit=10.0
2023-11-19 14:17:48,048 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.23 vs. limit=22.5
2023-11-19 14:17:51,403 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=774733.3333333334, ans=0.2
2023-11-19 14:18:05,287 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.348e+01 9.114e+01 9.901e+01 1.524e+02, threshold=1.823e+02, percent-clipped=0.0
2023-11-19 14:18:22,028 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=774933.3333333334, ans=0.025
2023-11-19 14:18:23,038 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=774933.3333333334, ans=0.125
2023-11-19 14:18:24,005 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=774933.3333333334, ans=0.0
2023-11-19 14:18:28,913 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=775000.0, ans=0.125
2023-11-19 14:18:31,058 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=775000.0, ans=0.125
2023-11-19 14:18:39,865 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8050, loss[loss=0.09305, simple_loss=0.1283, pruned_loss=0.0205, audio_tagging_loss=0.008421, over 15746.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1051, pruned_loss=0.02275, audio_tagging_loss=0.01057, over 3041305.44 frames. ], batch size: 55, lr: 6.87e-03, grad_scale: 16.0
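[Annotation] The WARNING entries (e.g. the dummy-text AudioSet cut above) drop utterances whose encoder output would be shorter than the BPE token sequence: 100 input frames shrink to 23 after the roughly 4x convolutional subsampling, which cannot align with 24 tokens under the transducer loss. A sketch of such a filter is below; the exact subsampling arithmetic is an assumption chosen to match the logged numbers (100 before, 23 after).

    # Sketch (assumption): drop cuts whose subsampled frame count is smaller
    # than the BPE token count, since the transducer loss cannot align them.
    def frames_after_subsampling(num_frames: int) -> int:
        # Conv frontend arithmetic assumed from the log: 100 -> 23.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, tokens: list) -> bool:
        T = frames_after_subsampling(num_frames)
        if T < len(tokens):
            print(f"Exclude cut: frames after subsampling {T} "
                  f"< number of tokens {len(tokens)}")
            return False
        return True

    assert frames_after_subsampling(100) == 23  # matches the WARNING above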
2023-11-19 14:18:40,045 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=775066.6666666666, ans=0.0
2023-11-19 14:18:52,895 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775133.3333333334, ans=0.1
2023-11-19 14:19:06,810 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.00 vs. limit=15.0
2023-11-19 14:19:20,313 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=775266.6666666666, ans=0.1
2023-11-19 14:19:22,486 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=775266.6666666666, ans=0.2
2023-11-19 14:19:27,242 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=775333.3333333334, ans=0.2
2023-11-19 14:19:35,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=775400.0, ans=0.125
2023-11-19 14:19:36,026 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8100, loss[loss=0.06778, simple_loss=0.08153, pruned_loss=0.01541, audio_tagging_loss=0.01161, over 14914.00 frames. ], tot_loss[loss=0.0852, simple_loss=0.1045, pruned_loss=0.02244, audio_tagging_loss=0.01053, over 3046358.09 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 16.0
2023-11-19 14:19:36,163 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=775400.0, ans=0.2
2023-11-19 14:19:56,602 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.298e+01 8.945e+01 9.680e+01 1.266e+02, threshold=1.789e+02, percent-clipped=0.0
2023-11-19 14:19:57,641 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=775533.3333333334, ans=0.05
2023-11-19 14:20:13,233 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.97 vs. limit=12.0
2023-11-19 14:20:19,819 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775666.6666666666, ans=0.1
2023-11-19 14:20:27,198 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775666.6666666666, ans=0.125
2023-11-19 14:20:31,253 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8150, loss[loss=0.06169, simple_loss=0.07014, pruned_loss=0.01716, audio_tagging_loss=0.009458, over 16876.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.1059, pruned_loss=0.02274, audio_tagging_loss=0.01041, over 3051041.07 frames. ], batch size: 66, lr: 6.87e-03, grad_scale: 8.0
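[Annotation] Each train_asr.py:1115 line pairs a single-batch loss (loss[...] over roughly 15k frames) with tot_loss[...] over roughly 3 million frames, i.e. a frame-weighted average over a recent window of about 200 batches (200 x ~15,200 frames = ~3.04M, consistent with the reset_interval of 200 in the config header). A sketch of that frame-weighted aggregation follows; class and method names are assumed, not icefall's actual tracker.

    # Sketch (assumption): frame-weighted running average of loss components,
    # as suggested by "tot_loss[..., over ~3e6 frames]" next to ~15k-frame
    # single-batch losses.
    from collections import defaultdict

    class MetricsWindow:
        def __init__(self):
            self.sums = defaultdict(float)  # component -> sum(loss * frames)
            self.frames = 0.0

        def update(self, losses: dict, num_frames: float) -> None:
            for k, v in losses.items():
                self.sums[k] += v * num_frames
            self.frames += num_frames

        def averages(self) -> dict:
            return {k: s / self.frames for k, s in self.sums.items()}

    w = MetricsWindow()
    w.update({"loss": 0.06778, "simple_loss": 0.08153}, 14914)
    w.update({"loss": 0.09305, "simple_loss": 0.1283}, 15746)
    print(w.averages())  # frame-weighted, like tot_loss in the log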
2023-11-19 14:20:33,666 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=775733.3333333334, ans=0.1
2023-11-19 14:20:40,056 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=775733.3333333334, ans=0.2
2023-11-19 14:20:45,577 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.35 vs. limit=15.0
2023-11-19 14:21:02,176 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=775866.6666666666, ans=0.125
2023-11-19 14:21:05,497 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=775933.3333333334, ans=0.2
2023-11-19 14:21:13,295 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.62 vs. limit=12.0
2023-11-19 14:21:15,022 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776000.0, ans=0.1
2023-11-19 14:21:26,704 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 14:21:27,678 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8200, loss[loss=0.07711, simple_loss=0.0896, pruned_loss=0.0214, audio_tagging_loss=0.01091, over 15189.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.107, pruned_loss=0.02289, audio_tagging_loss=0.01026, over 3065716.95 frames. ], batch size: 58, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:21:46,971 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776133.3333333334, ans=0.125
2023-11-19 14:21:49,752 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.524e+01 9.249e+01 1.061e+02 1.477e+02, threshold=1.850e+02, percent-clipped=0.0
2023-11-19 14:21:51,438 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.08 vs. limit=10.0
2023-11-19 14:21:55,241 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=776200.0, ans=0.2
2023-11-19 14:21:59,835 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=776266.6666666666, ans=0.0
2023-11-19 14:22:23,482 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8250, loss[loss=0.07044, simple_loss=0.09677, pruned_loss=0.0153, audio_tagging_loss=0.006765, over 15724.00 frames. ], tot_loss[loss=0.086, simple_loss=0.1061, pruned_loss=0.02263, audio_tagging_loss=0.01031, over 3063227.85 frames. ], batch size: 61, lr: 6.87e-03, grad_scale: 8.0
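[Annotation] The learning rate in these lines drifts smoothly downward (6.88e-03 at batch 7900 to 6.85e-03 by batch 9000) rather than stepping, consistent with a decay that depends continuously on both batch and epoch counts, such as icefall's Eden schedule. The formula below is written from memory and should be treated as an assumption; base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 come from the config header.

    # Sketch (assumption): Eden-style smooth decay in both batch and epoch:
    # lr = base_lr * ((b^2 + B^2)/B^2)^-0.25 * ((e^2 + E^2)/E^2)^-0.25
    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # With base_lr=0.045, the change between consecutive 50-batch log
    # intervals is tiny, matching the slow 6.88e-03 -> 6.85e-03 drift above.
    print(eden_lr(0.045, batch=85000, epoch=9.5))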
2023-11-19 14:22:29,081 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776400.0, ans=0.1
2023-11-19 14:22:35,324 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=776466.6666666666, ans=0.0
2023-11-19 14:22:53,284 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=776533.3333333334, ans=0.125
2023-11-19 14:23:00,997 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.21 vs. limit=15.0
2023-11-19 14:23:18,348 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8300, loss[loss=0.0946, simple_loss=0.1297, pruned_loss=0.02479, audio_tagging_loss=0.004988, over 15855.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.1064, pruned_loss=0.02282, audio_tagging_loss=0.01026, over 3062040.56 frames. ], batch size: 60, lr: 6.87e-03, grad_scale: 8.0
2023-11-19 14:23:25,428 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=776733.3333333334, ans=0.125
2023-11-19 14:23:40,362 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.122e+01 8.080e+01 8.886e+01 9.811e+01 1.458e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 14:23:42,654 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=776866.6666666666, ans=0.04949747468305833
2023-11-19 14:23:46,577 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=776866.6666666666, ans=0.1
2023-11-19 14:23:50,206 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=776866.6666666666, ans=0.0
2023-11-19 14:24:03,885 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=777000.0, ans=0.0
2023-11-19 14:24:14,327 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8350, loss[loss=0.08859, simple_loss=0.11, pruned_loss=0.02641, audio_tagging_loss=0.007187, over 15355.00 frames. ], tot_loss[loss=0.08601, simple_loss=0.106, pruned_loss=0.0228, audio_tagging_loss=0.0102, over 3056388.70 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:24:22,483 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5
2023-11-19 14:24:47,172 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777266.6666666666, ans=0.125
2023-11-19 14:24:55,550 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=777266.6666666666, ans=0.125
2023-11-19 14:25:06,436 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=777333.3333333334, ans=0.0
2023-11-19 14:25:09,774 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8400, loss[loss=0.06884, simple_loss=0.07724, pruned_loss=0.01757, audio_tagging_loss=0.01265, over 14250.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.1053, pruned_loss=0.02256, audio_tagging_loss=0.01007, over 3061673.27 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 14:25:20,495 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.39 vs. limit=6.0
2023-11-19 14:25:23,275 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=777466.6666666666, ans=0.2
2023-11-19 14:25:32,117 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.445e+01 9.359e+01 1.034e+02 1.708e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-19 14:25:33,446 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=777533.3333333334, ans=0.5
2023-11-19 14:25:58,384 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777666.6666666666, ans=0.1
2023-11-19 14:26:00,822 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.91 vs. limit=15.0
2023-11-19 14:26:05,412 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8450, loss[loss=0.08571, simple_loss=0.1043, pruned_loss=0.02293, audio_tagging_loss=0.01062, over 16302.00 frames. ], tot_loss[loss=0.08508, simple_loss=0.1047, pruned_loss=0.02257, audio_tagging_loss=0.01014, over 3057837.95 frames. ], batch size: 59, lr: 6.86e-03, grad_scale: 16.0
2023-11-19 14:26:05,695 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=777733.3333333334, ans=0.125
2023-11-19 14:26:30,009 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=777866.6666666666, ans=0.125
2023-11-19 14:26:36,852 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=777866.6666666666, ans=0.125
2023-11-19 14:26:50,934 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=778000.0, ans=0.0
2023-11-19 14:26:59,477 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=778000.0, ans=0.125
2023-11-19 14:27:01,304 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8500, loss[loss=0.0641, simple_loss=0.07061, pruned_loss=0.01393, audio_tagging_loss=0.01486, over 15793.00 frames. ], tot_loss[loss=0.08528, simple_loss=0.105, pruned_loss=0.02257, audio_tagging_loss=0.01021, over 3053329.19 frames. ], batch size: 60, lr: 6.86e-03, grad_scale: 16.0
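[Annotation] The grad_scale value alternating between 8.0 and 16.0 in the loss lines is the dynamic loss scale of mixed-precision training (use_fp16: True in the config header): the scaler grows the scale after a run of overflow-free steps and halves it when gradients overflow. A minimal sketch using PyTorch's stock GradScaler follows; the model, optimizer, and loss function are placeholders, and the init_scale/growth_interval values are illustrative.

    # Sketch: dynamic fp16 loss scaling, reflected by "grad_scale: 8.0/16.0".
    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=8.0, growth_interval=2000)

    def train_step(model, optimizer, batch, loss_fn):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = loss_fn(model(batch))
        scaler.scale(loss).backward()   # backward on the scaled loss
        scaler.step(optimizer)          # skips the step if grads overflowed
        scaler.update()                 # grows/shrinks the scale dynamically
        return scaler.get_scale()       # the "grad_scale" value in the log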
2023-11-19 14:27:02,452 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=778066.6666666666, ans=0.2
2023-11-19 14:27:14,095 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=778133.3333333334, ans=0.125
2023-11-19 14:27:16,533 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=778133.3333333334, ans=0.125
2023-11-19 14:27:24,284 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.482e+01 9.526e+01 1.059e+02 1.313e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-19 14:27:24,819 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0
2023-11-19 14:27:32,965 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=778266.6666666666, ans=0.5
2023-11-19 14:27:56,058 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8550, loss[loss=0.0873, simple_loss=0.1021, pruned_loss=0.02591, audio_tagging_loss=0.01032, over 14989.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.106, pruned_loss=0.02278, audio_tagging_loss=0.01013, over 3052338.37 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:28:00,070 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=778400.0, ans=0.125
2023-11-19 14:28:04,121 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778400.0, ans=0.0
2023-11-19 14:28:04,188 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 14:28:18,558 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=778533.3333333334, ans=0.125
2023-11-19 14:28:26,491 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5
2023-11-19 14:28:34,574 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=778600.0, ans=0.05
2023-11-19 14:28:35,842 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.07 vs. limit=12.0
2023-11-19 14:28:39,111 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0
2023-11-19 14:28:50,417 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0
2023-11-19 14:28:52,522 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8600, loss[loss=0.08134, simple_loss=0.099, pruned_loss=0.02256, audio_tagging_loss=0.00928, over 14572.00 frames. ], tot_loss[loss=0.08594, simple_loss=0.1059, pruned_loss=0.02287, audio_tagging_loss=0.01014, over 3046635.45 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:28:53,112 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5
2023-11-19 14:29:15,651 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.133e+01 8.812e+01 9.832e+01 1.513e+02, threshold=1.762e+02, percent-clipped=0.0
2023-11-19 14:29:27,996 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=778933.3333333334, ans=0.125
2023-11-19 14:29:47,748 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8650, loss[loss=0.1102, simple_loss=0.1362, pruned_loss=0.03462, audio_tagging_loss=0.007481, over 14836.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1061, pruned_loss=0.02296, audio_tagging_loss=0.01026, over 3044386.02 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 8.0
2023-11-19 14:29:52,732 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=779066.6666666666, ans=0.0
2023-11-19 14:30:08,255 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.36 vs. limit=12.0
2023-11-19 14:30:35,520 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=779333.3333333334, ans=0.2
2023-11-19 14:30:36,536 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 14:30:43,880 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8700, loss[loss=0.08886, simple_loss=0.09901, pruned_loss=0.02515, audio_tagging_loss=0.0142, over 14107.00 frames. ], tot_loss[loss=0.08613, simple_loss=0.1055, pruned_loss=0.02297, audio_tagging_loss=0.01039, over 3049042.61 frames. ], batch size: 53, lr: 6.85e-03, grad_scale: 8.0
2023-11-19 14:30:53,355 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=779400.0, ans=0.0
2023-11-19 14:30:59,631 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779466.6666666666, ans=0.1
2023-11-19 14:31:07,642 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.174e+01 9.057e+01 1.021e+02 2.200e+02, threshold=1.811e+02, percent-clipped=1.0
2023-11-19 14:31:31,726 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 14:31:35,778 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=779666.6666666666, ans=0.1
2023-11-19 14:31:39,901 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8750, loss[loss=0.08135, simple_loss=0.09301, pruned_loss=0.02487, audio_tagging_loss=0.009972, over 15265.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1059, pruned_loss=0.02301, audio_tagging_loss=0.01042, over 3055820.85 frames. ], batch size: 60, lr: 6.85e-03, grad_scale: 8.0
2023-11-19 14:31:52,937 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=779800.0, ans=0.125
2023-11-19 14:32:11,345 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=779866.6666666666, ans=0.1
2023-11-19 14:32:35,724 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.63 vs. limit=15.0
2023-11-19 14:32:36,128 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8800, loss[loss=0.06585, simple_loss=0.07556, pruned_loss=0.01384, audio_tagging_loss=0.01423, over 14331.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1056, pruned_loss=0.02297, audio_tagging_loss=0.01054, over 3051300.18 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:32:48,867 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-11-19 14:32:53,915 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780133.3333333334, ans=0.125
2023-11-19 14:32:59,411 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.658e+01 9.293e+01 1.017e+02 2.957e+02, threshold=1.859e+02, percent-clipped=2.0
2023-11-19 14:33:02,109 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.39 vs. limit=15.0
2023-11-19 14:33:06,523 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.04 vs. limit=15.0
2023-11-19 14:33:22,759 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.02 vs. limit=22.5
2023-11-19 14:33:31,585 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8850, loss[loss=0.09691, simple_loss=0.122, pruned_loss=0.02618, audio_tagging_loss=0.009752, over 15325.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1054, pruned_loss=0.02297, audio_tagging_loss=0.01064, over 3054440.44 frames. ], batch size: 57, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:33:40,505 WARNING [train_asr.py:1319] (2/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 14:34:25,945 INFO [scaling.py:1118] (2/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 14:34:26,784 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8900, loss[loss=0.08582, simple_loss=0.1066, pruned_loss=0.02426, audio_tagging_loss=0.008235, over 15817.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1059, pruned_loss=0.02294, audio_tagging_loss=0.01036, over 3053725.72 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:34:28,064 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780733.3333333334, ans=0.1
2023-11-19 14:34:48,676 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=780866.6666666666, ans=0.2
2023-11-19 14:34:50,415 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.412e+01 9.119e+01 1.005e+02 1.451e+02, threshold=1.824e+02, percent-clipped=0.0
2023-11-19 14:34:56,027 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=780866.6666666666, ans=0.125
2023-11-19 14:34:57,105 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=780866.6666666666, ans=0.0
2023-11-19 14:34:58,089 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=780866.6666666666, ans=0.125
2023-11-19 14:34:59,779 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=780933.3333333334, ans=0.125
2023-11-19 14:35:00,162 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.26 vs. limit=15.0
2023-11-19 14:35:22,293 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 8950, loss[loss=0.09721, simple_loss=0.1193, pruned_loss=0.02753, audio_tagging_loss=0.01005, over 14303.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.1064, pruned_loss=0.02297, audio_tagging_loss=0.01024, over 3051003.10 frames. ], batch size: 54, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:35:25,741 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=781066.6666666666, ans=0.0
2023-11-19 14:35:35,684 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=781133.3333333334, ans=0.125
2023-11-19 14:35:51,989 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=781200.0, ans=0.125
2023-11-19 14:36:14,854 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.87 vs. limit=22.5
2023-11-19 14:36:18,252 INFO [train_asr.py:1115] (2/4) Epoch 10, batch 9000, loss[loss=0.07218, simple_loss=0.08841, pruned_loss=0.01831, audio_tagging_loss=0.009672, over 15879.00 frames. ], tot_loss[loss=0.08657, simple_loss=0.1064, pruned_loss=0.02313, audio_tagging_loss=0.01026, over 3044065.62 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 16.0
2023-11-19 14:36:18,253 INFO [train_asr.py:1138] (2/4) Computing validation loss
2023-11-19 14:36:58,301 INFO [train_asr.py:1147] (2/4) Epoch 10, validation: loss=0.06535, simple_loss=0.05527, pruned_loss=0.006386, audio_tagging_loss=0.03133, over 4681554.00 frames.
2023-11-19 14:36:58,302 INFO [train_asr.py:1148] (2/4) Maximum memory allocated so far is 25771MB
2023-11-19 14:37:02,136 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.56 vs. limit=12.0
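[Annotation] "Computing validation loss" fires here at batch 9000, a multiple of the valid_interval of 3000 from the config header, and the validation pass covers the full dev set (4681554.00 frames, the same count reported at epoch 1). A sketch of that periodic trigger follows; the function and argument names are hypothetical, not icefall's actual code.

    # Sketch (assumption): periodic validation keyed on the batch index,
    # matching "Computing validation loss" appearing at batch 9000 (3 * 3000).
    import torch

    VALID_INTERVAL = 3000  # from the config header

    def maybe_validate(batch_idx: int, model, valid_loader, compute_loss):
        if batch_idx == 0 or batch_idx % VALID_INTERVAL != 0:
            return None
        model.eval()
        tot, frames = 0.0, 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, n = compute_loss(model, batch)
                tot += loss * n
                frames += n
        model.train()
        return tot / frames  # frame-weighted validation loss, as in the log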
2023-11-19 14:37:06,303 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=781400.0, ans=0.1
2023-11-19 14:37:09,100 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=781466.6666666666, ans=0.125
2023-11-19 14:37:25,084 INFO [scaling.py:1022] (2/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.17 vs. limit=22.5
2023-11-19 14:37:29,708 INFO [optim.py:476] (2/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.251e+01 8.814e+01 9.719e+01 1.451e+02, threshold=1.763e+02, percent-clipped=0.0
2023-11-19 14:37:37,951 INFO [scaling.py:213] (2/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=781533.3333333334, ans=0.125