2023-11-18 01:38:34,664 INFO [train_asr.py:1183] (0/4) Training started
2023-11-18 01:38:34,668 INFO [train_asr.py:1193] (0/4) Device: cuda:0
2023-11-18 01:38:34,671 INFO [train_asr.py:1205] (0/4) {'best_train_loss': inf, 'best_valid_loss': inf, 'best_train_epoch': -1, 'best_valid_epoch': -1, 'batch_idx_train': 0, 'log_interval': 50, 'reset_interval': 200, 'valid_interval': 3000, 'feature_dim': 80, 'subsampling_factor': 4, 'warm_step': 2000, 'env_info': {'k2-version': '1.24.3', 'k2-build-type': 'Release', 'k2-with-cuda': True, 'k2-git-sha1': '2b2ac14b326d61d79d04e53fbd69b1ff6d630411', 'k2-git-date': 'Thu Aug 24 05:58:26 2023', 'lhotse-version': '1.16.0', 'torch-version': '2.0.1+cu117', 'torch-cuda-available': True, 'torch-cuda-version': '11.7', 'python-version': '3.1', 'icefall-git-branch': 'multi_KD', 'icefall-git-sha1': '025f11fd-dirty', 'icefall-git-date': 'Fri Nov 17 16:19:07 2023', 'icefall-path': '/star-xy/softwares/icefall_development/icefall_multi_KD', 'k2-path': '/star-xy/softwares/k2_development/k2/k2/python/k2/__init__.py', 'lhotse-path': '/star-xy/softwares/anaconda3/envs/multi_KD/lib/python3.10/site-packages/lhotse/__init__.py', 'hostname': 'de-74279-k2-train-10-1113160712-78bc8d8bd8-pw6cd', 'IP address': '10.177.94.17'}, 'world_size': 4, 'master_port': 13454, 'tensorboard': True, 'num_epochs': 50, 'start_epoch': 1, 'start_batch': 0, 'exp_dir': PosixPath('multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0'), 'bpe_model': 'data/lang_bpe_500/bpe.model', 'base_lr': 0.045, 'lr_batches': 7500, 'lr_epochs': 3.5, 'ref_duration': 600, 'context_size': 2, 'prune_range': 5, 'lm_scale': 0.25, 'am_scale': 0.0, 'simple_loss_scale': 0.5, 'ctc_loss_scale': 0.2, 'audio_tagging_loss_scale': 1.0, 'seed': 42, 'print_diagnostics': False, 'inf_check': False, 'save_every_n': 4000, 'keep_last_k': 30, 'average_period': 200, 'use_fp16': True, 'num_encoder_layers': '2,2,3,4,3,2', 'downsampling_factor': '1,2,4,8,4,2', 'feedforward_dim': '512,768,1024,1536,1024,768', 'num_heads': '4,4,4,8,4,4', 'encoder_dim': '192,256,384,512,384,256', 'query_head_dim': '32', 'value_head_dim': '12', 'pos_head_dim': '4', 'pos_dim': 48, 'encoder_unmasked_dim': '192,192,256,256,256,192', 'cnn_module_kernel': '31,31,15,15,15,31', 'decoder_dim': 512, 'joiner_dim': 512, 'causal': False, 'chunk_size': '16,32,64,-1', 'left_context_frames': '64,128,256,-1', 'use_transducer': True, 'use_ctc': False, 'do_audio_tagging': True, 'full_libri': True, 'mini_libri': False, 'use_vox2': False, 'use_libriheavy': False, 'libriheavy_subset': 'small', 'use_audioset': True, 'audioset_subset': 'unbalanced', 'manifest_dir': PosixPath('data/fbank'), 'max_duration': 1000, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': False, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'shuffle': True, 'drop_last': True, 'return_cuts': True, 'num_workers': 2, 'enable_spec_aug': True, 'spec_aug_time_warp_factor': 80, 'enable_musan': True, 'enable_audioset': False, 'use_musan_separately': False, 'input_strategy': 'PrecomputedFeatures', 'drop_features': False, 'return_audio': False, 'use_beats': True, 'use_ecapa': True, 'use_whisper': True, 'whisper_mvq': False, 'beats_ckpt': 'data/models/BEATs/BEATs_iter3_plus_AS2M_finetuned_on_AS2M_cpt2.pt', 'whisper_version': 'small.en', 'blank_id': 0, 'vocab_size': 500}
2023-11-18 01:38:34,671 INFO [train_asr.py:1207] (0/4) About to create model
2023-11-18 01:38:35,408 INFO [train_asr.py:1211] (0/4) Number of model parameters: 65819362
2023-11-18 01:38:38,795 INFO [train_asr.py:1227] (0/4) Using DDP
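The parameter count above is simply the sum of element counts over all tensors returned by model.parameters(). A minimal sketch of how such a figure is typically computed (the recipe's actual zipformer transducer model is not constructed here; the Linear module below is a stand-in):

```python
import torch

def count_parameters(model: torch.nn.Module) -> int:
    # Total number of scalar parameters; this kind of computation yields
    # figures like "Number of model parameters: 65819362" above.
    return sum(p.numel() for p in model.parameters())

# Stand-in module for illustration only.
print(count_parameters(torch.nn.Linear(80, 500)))  # -> 40500 (80*500 weights + 500 biases)
```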
2023-11-18 01:38:39,398 INFO [train_asr.py:1271] (0/4) Getting audioset cuts
2023-11-18 01:38:39,398 INFO [kd_datamodule.py:796] (0/4) About to get the audioset cuts.
2023-11-18 01:38:39,462 INFO [train_asr.py:1277] (0/4) Using mux to combine Librispeech with audioset
2023-11-18 01:38:39,462 INFO [train_asr.py:1287] (0/4) CutSet(len=2748469) [underlying data type: ]
2023-11-18 01:38:48,650 INFO [kd_datamodule.py:396] (0/4) Enable MUSAN
2023-11-18 01:38:48,651 INFO [kd_datamodule.py:397] (0/4) About to get Musan cuts
2023-11-18 01:38:51,454 INFO [kd_datamodule.py:427] (0/4) Enable SpecAugment
2023-11-18 01:38:51,454 INFO [kd_datamodule.py:428] (0/4) Time warp factor: 80
2023-11-18 01:38:51,454 INFO [kd_datamodule.py:438] (0/4) Num frame mask: 10
2023-11-18 01:38:51,454 INFO [kd_datamodule.py:451] (0/4) About to create train dataset
2023-11-18 01:38:51,456 INFO [kd_datamodule.py:487] (0/4) Using SimpleCutSampler
2023-11-18 01:38:51,456 INFO [kd_datamodule.py:495] (0/4) About to create train dataloader
2023-11-18 01:38:51,460 INFO [kd_datamodule.py:814] (0/4) About to get the audioset eval cuts.
2023-11-18 01:38:51,462 INFO [kd_datamodule.py:529] (0/4) About to create dev dataset
2023-11-18 01:38:51,931 INFO [kd_datamodule.py:550] (0/4) About to create dev dataloader
2023-11-18 01:39:26,877 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 0, loss[loss=3.609, simple_loss=2.093, pruned_loss=2.052, audio_tagging_loss=1.311, over 15808.00 frames. ], tot_loss[loss=3.609, simple_loss=2.093, pruned_loss=2.052, audio_tagging_loss=1.311, over 15808.00 frames. ], batch size: 60, lr: 2.25e-02, grad_scale: 2.0
2023-11-18 01:39:26,879 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-18 01:39:43,429 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.6555, 2.9092, 2.8354, 2.9362, 2.7137, 2.9193, 2.6206, 2.8750], device='cuda:0')
2023-11-18 01:39:51,683 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.0907, 4.1124, 4.0767, 4.1115, 4.0950, 4.1141, 4.0602, 4.1070], device='cuda:0')
2023-11-18 01:40:00,453 INFO [train_asr.py:1147] (0/4) Epoch 1, validation: loss=2.927, simple_loss=1.349, pruned_loss=1.339, audio_tagging_loss=1.444, over 4681554.00 frames.
2023-11-18 01:40:00,454 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-18 01:40:10,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.25 vs. limit=7.5
2023-11-18 01:40:19,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=319.27 vs. limit=7.525
2023-11-18 01:40:23,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=33.23 vs. limit=7.55
2023-11-18 01:40:27,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=231.39 vs. limit=5.066666666666666
2023-11-18 01:40:28,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=482.65 vs. limit=5.066666666666666
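The 'Using mux' and 'Using SimpleCutSampler' lines correspond to standard lhotse operations: CutSet.mux lazily interleaves several corpora (here LibriSpeech plus the unbalanced AudioSet subset, 2748469 cuts combined), and SimpleCutSampler batches by total duration without bucketing, matching 'max_duration': 1000 and 'bucketing_sampler': False in the config. A minimal sketch under those assumptions (the manifest file names are placeholders, not the recipe's actual paths):

```python
from lhotse import CutSet
from lhotse.dataset import SimpleCutSampler

# Placeholder manifest paths under data/fbank; the real names are not in the log.
libri_cuts = CutSet.from_file("data/fbank/librispeech_cuts_train.jsonl.gz")
audioset_cuts = CutSet.from_file("data/fbank/audioset_cuts_unbalanced.jsonl.gz")

# Lazily interleave the two corpora into a single stream of cuts.
train_cuts = CutSet.mux(libri_cuts, audioset_cuts)

# Batch by total duration in seconds, shuffling and dropping ragged tails,
# matching 'max_duration': 1000, 'shuffle': True, 'drop_last': True above.
sampler = SimpleCutSampler(train_cuts, max_duration=1000, shuffle=True, drop_last=True)
```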
2023-11-18 01:40:28,155 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=220.00 vs. limit=7.55
2023-11-18 01:40:32,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=369.61 vs. limit=7.55
2023-11-18 01:40:40,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=473.38 vs. limit=7.55
2023-11-18 01:40:48,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=161.86 vs. limit=7.575
2023-11-18 01:40:49,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=200.0, ans=0.09875
2023-11-18 01:40:56,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=60.44 vs. limit=5.133333333333334
2023-11-18 01:41:00,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=32.44 vs. limit=7.7
2023-11-18 01:41:04,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=266.6666666666667, ans=0.19
2023-11-18 01:41:05,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=266.6666666666667, ans=0.094
2023-11-18 01:41:07,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=251.31 vs. limit=7.6
2023-11-18 01:41:07,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=183.55 vs. limit=7.6
2023-11-18 01:41:09,575 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 50, loss[loss=0.635, simple_loss=0.5311, pruned_loss=0.5866, audio_tagging_loss=0.03852, over 15214.00 frames. ], tot_loss[loss=1.284, simple_loss=0.9362, pruned_loss=0.8024, audio_tagging_loss=0.2657, over 689070.14 frames. ], batch size: 56, lr: 2.48e-02, grad_scale: 1.0
2023-11-18 01:41:10,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.44 vs. limit=3.05
2023-11-18 01:41:13,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=333.3333333333333, ans=0.484375
2023-11-18 01:41:16,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=333.3333333333333, ans=0.0925
2023-11-18 01:41:18,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=346.63 vs. limit=7.625
2023-11-18 01:41:29,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=400.0, ans=3.06
2023-11-18 01:41:31,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=134.42 vs. limit=7.65
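The ScheduledFloat entries record hyperparameters that are functions of the training batch count; in zipformer's scaling.py they are piecewise-linear schedules. A simplified re-implementation of the idea (not the actual icefall class, and the breakpoints below are illustrative, chosen only because they reproduce one logged value):

```python
class ScheduledFloat:
    """Piecewise-linear schedule over the training batch count, sketching the
    idea behind the 'ScheduledFloat: ... batch_count=..., ans=...' entries."""

    def __init__(self, *points):
        # points: (batch_count, value) breakpoints, e.g. (0.0, 0.1), (8000.0, 0.05)
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                # Linear interpolation between the surrounding breakpoints.
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# A schedule from 0.1 at batch 0 to 0.05 at batch 8000 reproduces the
# min_positive value logged at batch_count=200.0 above:
print(ScheduledFloat((0.0, 0.1), (8000.0, 0.05)).value(200.0))  # -> 0.09875
```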
2023-11-18 01:41:32,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=400.0, ans=0.091
2023-11-18 01:41:40,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=348.49 vs. limit=7.675
2023-11-18 01:41:52,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=318.28 vs. limit=7.7
2023-11-18 01:41:52,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=533.3333333333334, ans=3.08
2023-11-18 01:41:54,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=332.22 vs. limit=7.7
2023-11-18 01:42:01,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=350.87 vs. limit=7.9
2023-11-18 01:42:11,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=15.69 vs. limit=4.24
2023-11-18 01:42:16,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=154.09 vs. limit=7.95
2023-11-18 01:42:18,223 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 100, loss[loss=0.3848, simple_loss=0.3025, pruned_loss=0.363, audio_tagging_loss=0.03707, over 14478.00 frames. ], tot_loss[loss=0.8344, simple_loss=0.6272, pruned_loss=0.6015, audio_tagging_loss=0.1416, over 1203851.67 frames. ], batch size: 54, lr: 2.70e-02, grad_scale: 2.0
2023-11-18 01:42:19,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.039e+01 1.213e+02 5.684e+02 1.606e+03 1.428e+04, threshold=1.137e+03, percent-clipped=0.0
2023-11-18 01:42:25,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=180.53 vs. limit=7.75
2023-11-18 01:42:26,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.59 vs. limit=8.0
2023-11-18 01:42:29,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=666.6666666666666, ans=0.29333333333333333
2023-11-18 01:42:33,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=733.3333333333334, ans=0.036666666666666674
2023-11-18 01:42:37,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=109.29 vs. limit=7.775
2023-11-18 01:42:43,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=175.34 vs. limit=7.8
2023-11-18 01:42:45,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=321.11 vs. limit=7.8
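The recurring 'Whitening: ... metric=X vs. limit=Y' entries come from zipformer's whitening regularizer, which penalizes activations whose channel covariance is far from isotropic; an entry is logged when the measured metric exceeds the scheduled limit. The metric is essentially the mean squared eigenvalue of the channel second-moment matrix divided by its squared mean eigenvalue, which equals 1.0 for perfectly 'white' features. A simplified paraphrase of the computation (not icefall's exact scaling.py code):

```python
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (N, num_channels) activations. Returns mean(eig^2) / mean(eig)^2 of
    the channel second-moment matrix C, via trace identities; equals 1.0
    iff C is a multiple of the identity (fully 'white' features)."""
    c = x.t() @ x / x.shape[0]                  # channel second-moment matrix
    mean_eig = torch.diagonal(c).mean()         # trace(C)/dim   = mean eigenvalue
    mean_eig_sq = torch.diagonal(c @ c).mean()  # trace(C^2)/dim = mean eigenvalue^2
    return mean_eig_sq / (mean_eig ** 2 + 1e-20)

x = torch.randn(1000, 256)
print(whitening_metric(x))  # near 1.0 for white features (sampling noise pushes it slightly above)
```

Read this way, values like metric=482.65 against limit=5.07 indicate strongly correlated, low-rank activations early in training, which the regularizer then pushes back toward the limit.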
2023-11-18 01:42:48,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=800.0, ans=7.8
2023-11-18 01:42:52,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=38.47 vs. limit=8.1
2023-11-18 01:42:53,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=800.0, ans=0.4625
2023-11-18 01:43:04,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=93.80 vs. limit=4.173333333333334
2023-11-18 01:43:17,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=4.373333333333333
2023-11-18 01:43:24,587 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 150, loss[loss=0.3997, simple_loss=0.3209, pruned_loss=0.3978, audio_tagging_loss=0.02408, over 15786.00 frames. ], tot_loss[loss=0.6675, simple_loss=0.5121, pruned_loss=0.5326, audio_tagging_loss=0.09317, over 1622580.56 frames. ], batch size: 58, lr: 2.93e-02, grad_scale: 2.0
2023-11-18 01:43:33,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=47.34 vs. limit=8.25
2023-11-18 01:43:33,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=296.31 vs. limit=7.875
2023-11-18 01:43:37,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=1066.6666666666667, ans=0.45
2023-11-18 01:43:44,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=1066.6666666666667, ans=0.21600000000000003
2023-11-18 01:43:49,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=147.76 vs. limit=7.9
2023-11-18 01:43:53,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=95.74 vs. limit=7.925
2023-11-18 01:43:55,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=286.56 vs. limit=7.925
2023-11-18 01:43:58,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=247.83 vs. limit=7.925
2023-11-18 01:43:59,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=275.09 vs. limit=7.925
2023-11-18 01:44:11,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=140.57 vs. limit=7.95
2023-11-18 01:44:12,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=59.23 vs. limit=5.6
2023-11-18 01:44:14,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=1200.0, ans=0.44375
2023-11-18 01:44:20,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=406.65 vs. limit=7.975
2023-11-18 01:44:32,275 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 200, loss[loss=0.5109, simple_loss=0.4186, pruned_loss=0.5023, audio_tagging_loss=0.01771, over 15663.00 frames. ], tot_loss[loss=0.5716, simple_loss=0.443, pruned_loss=0.4808, audio_tagging_loss=0.06876, over 1941332.75 frames. ], batch size: 57, lr: 3.15e-02, grad_scale: 4.0
2023-11-18 01:44:33,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.394e+01 4.484e+01 5.110e+01 6.274e+01 1.485e+02, threshold=1.022e+02, percent-clipped=0.0
2023-11-18 01:44:37,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=44.02 vs. limit=4.266666666666667
2023-11-18 01:44:45,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=96.38 vs. limit=8.0
2023-11-18 01:44:46,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=207.43 vs. limit=8.55
2023-11-18 01:44:47,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=280.94 vs. limit=8.025
2023-11-18 01:44:49,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=46.18 vs. limit=8.025
2023-11-18 01:44:52,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=208.87 vs. limit=8.025
2023-11-18 01:45:00,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=205.25 vs. limit=8.05
2023-11-18 01:45:00,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=1466.6666666666667, ans=5.366666666666667
2023-11-18 01:45:05,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=267.08 vs. limit=8.6
2023-11-18 01:45:11,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=122.80 vs. limit=8.6
2023-11-18 01:45:13,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=196.38 vs. limit=8.075
2023-11-18 01:45:17,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=1533.3333333333333, ans=5.958333333333333
2023-11-18 01:45:22,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=10.99 vs. limit=5.383333333333333
2023-11-18 01:45:23,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=72.20 vs. limit=5.766666666666667
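How the four numbers inside each loss[...] entry combine is not shown in the log, but they are consistent with icefall's usual warm-up weighting for pruned-transducer training: the cheap 'simple' lattice loss starts at weight 1.0 and decays toward 'simple_loss_scale': 0.5 over 'warm_step': 2000 batches, the pruned RNN-T loss ramps from 0.1 to 1.0 over the same window, and the audio-tagging loss is added with 'audio_tagging_loss_scale': 1.0. A sketch under that assumption:

```python
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  batch_idx_train, warm_step=2000,
                  simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
    # Warm-up: lean on the simple lattice loss first, then shift weight
    # onto the pruned RNN-T loss as training stabilizes.
    if batch_idx_train >= warm_step:
        s, p = simple_loss_scale, 1.0
    else:
        frac = batch_idx_train / warm_step
        s = 1.0 - frac * (1.0 - simple_loss_scale)
        p = 0.1 + 0.9 * frac
    return s * simple_loss + p * pruned_loss + audio_tagging_loss_scale * audio_tagging_loss

# Reproduces the batch-250 entry below to logging precision:
print(combined_loss(0.228, 0.2656, 0.02298, 250))  # -> ~0.2932
```

The same formula also reproduces the batch-0 entry (1.0*2.093 + 0.1*2.052 + 1.311 = 3.609), which is why the hedged reading above seems safe.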
2023-11-18 01:45:30,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=141.16 vs. limit=8.1
2023-11-18 01:45:35,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=12.63 vs. limit=5.4
2023-11-18 01:45:40,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=90.03 vs. limit=8.125
2023-11-18 01:45:40,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.88 vs. limit=4.666666666666667
2023-11-18 01:45:41,048 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 250, loss[loss=0.2932, simple_loss=0.228, pruned_loss=0.2656, audio_tagging_loss=0.02298, over 14917.00 frames. ], tot_loss[loss=0.5169, simple_loss=0.404, pruned_loss=0.4482, audio_tagging_loss=0.0537, over 2187851.37 frames. ], batch size: 56, lr: 3.38e-02, grad_scale: 4.0
2023-11-18 01:45:41,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=85.63 vs. limit=8.125
2023-11-18 01:45:42,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=1666.6666666666667, ans=0.1375
2023-11-18 01:45:51,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=61.25 vs. limit=8.125
2023-11-18 01:46:01,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=4.693333333333333
2023-11-18 01:46:14,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=54.24 vs. limit=8.85
2023-11-18 01:46:22,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=1866.6666666666667, ans=0.04416666666666667
2023-11-18 01:46:23,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=1866.6666666666667, ans=0.23133333333333334
2023-11-18 01:46:25,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=89.68 vs. limit=8.9
2023-11-18 01:46:31,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.96 vs. limit=8.9
2023-11-18 01:46:33,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.00 vs. limit=4.773333333333333
2023-11-18 01:46:37,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.59 vs. limit=4.773333333333333
2023-11-18 01:46:43,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=1933.3333333333333, ans=0.409375
2023-11-18 01:46:46,670 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 300, loss[loss=0.3567, simple_loss=0.2832, pruned_loss=0.3314, audio_tagging_loss=0.01669, over 16019.00 frames. ], tot_loss[loss=0.4751, simple_loss=0.3726, pruned_loss=0.4185, audio_tagging_loss=0.04392, over 2381561.11 frames. ], batch size: 59, lr: 3.60e-02, grad_scale: 8.0
2023-11-18 01:46:47,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=58.28 vs. limit=8.25
2023-11-18 01:46:47,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=32.65 vs. limit=6.0
2023-11-18 01:46:47,928 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.566e+01 4.754e+01 5.461e+01 6.771e+01 2.069e+02, threshold=1.092e+02, percent-clipped=3.0
2023-11-18 01:46:48,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=43.88 vs. limit=8.25
2023-11-18 01:46:49,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=2000.0, ans=0.125
2023-11-18 01:46:49,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=56.03 vs. limit=8.25
2023-11-18 01:47:01,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=141.27 vs. limit=8.275
2023-11-18 01:47:05,337 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=5.52 vs. limit=4.413333333333333
2023-11-18 01:47:19,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=35.32 vs. limit=9.1
2023-11-18 01:47:19,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.58 vs. limit=5.533333333333333
2023-11-18 01:47:22,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=2133.3333333333335, ans=0.052
2023-11-18 01:47:25,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=150.43 vs. limit=9.15
2023-11-18 01:47:27,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2200.0, ans=0.08625000000000001
2023-11-18 01:47:32,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=2200.0, ans=0.050499999999999996
2023-11-18 01:47:36,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=2200.0, ans=0.396875
2023-11-18 01:47:37,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=55.77 vs. limit=8.35
2023-11-18 01:47:37,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=90.11 vs. limit=8.35
2023-11-18 01:47:38,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=44.91 vs. limit=8.35
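In the optim.py lines, the five numbers after 'grad-norm quartiles' are the min/25%/median/75%/max of recently observed gradient norms, and the threshold is Clipping_scale times the median (e.g. 2.0 x 5.461e+01 ~ 1.092e+02 in the entry above); percent-clipped reports how often recent batches hit that threshold. A rough sketch of median-based clipping in that spirit (a paraphrase of the behaviour, not icefall's ScaledAdam code):

```python
import torch

def clip_by_median(params, history, clipping_scale=2.0):
    """Clip gradients to clipping_scale x the median of recent gradient norms,
    mirroring the 'Clipping_scale=2.0, grad-norm quartiles ... threshold ...'
    lines above. `history` is a list of past norms kept by the caller."""
    grads = [p.grad for p in params if p.grad is not None]
    total_norm = torch.norm(torch.stack([g.norm() for g in grads]))
    history.append(total_norm.item())
    threshold = clipping_scale * float(torch.tensor(history).median())
    if total_norm > threshold:           # scale all grads down in place
        for g in grads:
            g.mul_(threshold / total_norm)
    # torch.quantile(torch.tensor(history), torch.tensor([0., .25, .5, .75, 1.]))
    # would reproduce the five logged quartile values.
    return total_norm.item(), threshold
```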
2023-11-18 01:47:44,454 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=61.96 vs. limit=6.133333333333333
2023-11-18 01:47:51,176 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 350, loss[loss=0.3973, simple_loss=0.3115, pruned_loss=0.3573, audio_tagging_loss=0.02099, over 15425.00 frames. ], tot_loss[loss=0.4495, simple_loss=0.353, pruned_loss=0.3998, audio_tagging_loss=0.03713, over 2530488.25 frames. ], batch size: 58, lr: 3.83e-02, grad_scale: 8.0
2023-11-18 01:47:51,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=2333.3333333333335, ans=0.035
2023-11-18 01:47:53,027 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=49.60 vs. limit=9.25
2023-11-18 01:48:13,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=2400.0, ans=0.085
2023-11-18 01:48:13,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=2400.0, ans=0.046
2023-11-18 01:48:15,941 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=43.23 vs. limit=6.2
2023-11-18 01:48:17,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=32.05 vs. limit=9.35
2023-11-18 01:48:18,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=93.86 vs. limit=8.425
2023-11-18 01:48:34,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=47.29 vs. limit=8.45
2023-11-18 01:48:42,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=121.21 vs. limit=8.475
2023-11-18 01:48:53,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=2600.0, ans=0.08375
2023-11-18 01:48:53,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.61 vs. limit=5.65
2023-11-18 01:48:57,487 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 400, loss[loss=0.3184, simple_loss=0.2429, pruned_loss=0.2792, audio_tagging_loss=0.02158, over 14090.00 frames. ], tot_loss[loss=0.4271, simple_loss=0.3346, pruned_loss=0.381, audio_tagging_loss=0.03246, over 2644033.35 frames. ], batch size: 54, lr: 4.05e-02, grad_scale: 16.0
2023-11-18 01:48:58,707 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.943e+01 5.270e+01 6.183e+01 8.354e+01 3.927e+02, threshold=1.237e+02, percent-clipped=8.0
2023-11-18 01:49:04,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=49.62 vs. limit=8.5
2023-11-18 01:49:05,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=54.46 vs. limit=9.5
2023-11-18 01:49:10,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=2733.3333333333335, ans=0.7773333333333333
2023-11-18 01:49:12,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=9.55
2023-11-18 01:49:19,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.62 vs. limit=5.683333333333334
2023-11-18 01:49:32,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.55 vs. limit=9.6
2023-11-18 01:49:39,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=3.43
2023-11-18 01:49:41,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=33.58 vs. limit=9.65
2023-11-18 01:49:58,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=2933.3333333333335, ans=6.833333333333334
2023-11-18 01:50:00,775 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 450, loss[loss=0.4696, simple_loss=0.367, pruned_loss=0.4289, audio_tagging_loss=0.01404, over 15224.00 frames. ], tot_loss[loss=0.4135, simple_loss=0.3232, pruned_loss=0.3687, audio_tagging_loss=0.02884, over 2733339.10 frames. ], batch size: 56, lr: 4.28e-02, grad_scale: 16.0
2023-11-18 01:50:06,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.87 vs. limit=8.625
2023-11-18 01:50:08,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=3000.0, ans=0.03249999999999999
2023-11-18 01:50:12,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=3066.6666666666665, ans=0.031
2023-11-18 01:50:15,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=24.79 vs. limit=8.65
2023-11-18 01:50:26,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=35.68 vs. limit=9.85
2023-11-18 01:50:26,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=5.783333333333333
2023-11-18 01:50:27,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=42.78 vs. limit=9.85
2023-11-18 01:50:28,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3133.3333333333335, ans=0.26866666666666666
2023-11-18 01:50:40,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=6.50 vs. limit=5.28
2023-11-18 01:50:41,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=61.33 vs. limit=8.7
2023-11-18 01:50:43,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=3200.0, ans=0.027999999999999997
2023-11-18 01:50:49,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=3200.0, ans=5.8
2023-11-18 01:50:54,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3266.6666666666665, ans=0.2673333333333333
2023-11-18 01:50:56,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=3266.6666666666665, ans=0.249
2023-11-18 01:51:02,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=3266.6666666666665, ans=8.725
2023-11-18 01:51:05,766 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 500, loss[loss=0.3519, simple_loss=0.2704, pruned_loss=0.3066, audio_tagging_loss=0.01553, over 15712.00 frames. ], tot_loss[loss=0.399, simple_loss=0.3105, pruned_loss=0.3541, audio_tagging_loss=0.02621, over 2796759.65 frames. ], batch size: 58, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:51:06,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=3333.3333333333335, ans=7.083333333333334
2023-11-18 01:51:06,950 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 3.968e+01 4.922e+01 5.274e+01 6.306e+01 1.338e+02, threshold=1.055e+02, percent-clipped=1.0
2023-11-18 01:51:07,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=53.84 vs. limit=8.75
2023-11-18 01:51:10,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=77.01 vs. limit=8.75
2023-11-18 01:51:11,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=72.83 vs. limit=8.75
2023-11-18 01:51:14,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=24.88 vs. limit=8.75
2023-11-18 01:51:21,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=3400.0, ans=0.07250000000000001
2023-11-18 01:51:38,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=3466.6666666666665, ans=0.2653333333333333
2023-11-18 01:51:44,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.02 vs. limit=8.825
2023-11-18 01:51:51,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.93 vs. limit=6.766666666666667
2023-11-18 01:51:55,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=3600.0, ans=0.07750000000000001
2023-11-18 01:52:05,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=31.19 vs. limit=8.85
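The lr column rises linearly from 2.25e-02 at batch 0 to the configured base_lr of 0.045 (logged as 4.49e-02) around batch 500 and then holds for the rest of this excerpt. That trajectory matches icefall's Eden scheduler (with 'lr_batches': 7500 and 'lr_epochs': 3.5 from the config), assuming a 500-batch linear warm-up from a factor of 0.5 and an epoch counter that starts at 0 and is stepped at epoch boundaries:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5,
            warmup_batches=500.0):
    # Eden: inverse-fourth-root decay in both batch and epoch count,
    # times a linear warm-up from 0.5 to 1.0 over warmup_batches.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    warmup = min(1.0, 0.5 * (1.0 + batch / warmup_batches))
    return base_lr * batch_factor * epoch_factor * warmup

print(eden_lr(0.045, 0, 0))    # -> 0.0225, the 'lr: 2.25e-02' at batch 0
print(eden_lr(0.045, 500, 0))  # -> ~0.0449, the 'lr: 4.49e-02' at batch 500
```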
2023-11-18 01:52:06,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=25.63 vs. limit=8.85
2023-11-18 01:52:09,207 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 550, loss[loss=0.4159, simple_loss=0.3197, pruned_loss=0.3522, audio_tagging_loss=0.01761, over 14422.00 frames. ], tot_loss[loss=0.3932, simple_loss=0.3048, pruned_loss=0.3465, audio_tagging_loss=0.02408, over 2856647.67 frames. ], batch size: 55, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:52:10,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=3666.6666666666665, ans=0.328125
2023-11-18 01:52:10,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=42.57 vs. limit=8.875
2023-11-18 01:52:13,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=3666.6666666666665, ans=0.2633333333333333
2023-11-18 01:52:13,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.29 vs. limit=10.25
2023-11-18 01:52:13,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.19 vs. limit=10.25
2023-11-18 01:52:29,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=3733.3333333333335, ans=5.933333333333334
2023-11-18 01:52:30,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.83 vs. limit=5.933333333333334
2023-11-18 01:52:32,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=20.60 vs. limit=8.9
2023-11-18 01:52:33,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=3800.0, ans=0.057499999999999996
2023-11-18 01:52:34,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.94 vs. limit=8.925
2023-11-18 01:52:37,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=3800.0, ans=0.057499999999999996
2023-11-18 01:52:50,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.49 vs. limit=8.95
2023-11-18 01:52:53,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=48.68 vs. limit=8.95
2023-11-18 01:53:03,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=3933.3333333333335, ans=0.315625
2023-11-18 01:53:12,524 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 600, loss[loss=0.2837, simple_loss=0.2135, pruned_loss=0.2305, audio_tagging_loss=0.01687, over 14419.00 frames. ], tot_loss[loss=0.3852, simple_loss=0.2971, pruned_loss=0.3358, audio_tagging_loss=0.02268, over 2900603.16 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:53:13,652 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 4.102e+01 5.824e+01 6.784e+01 8.267e+01 3.333e+02, threshold=1.357e+02, percent-clipped=4.0
2023-11-18 01:53:15,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=4000.0, ans=0.009999999999999995
2023-11-18 01:53:15,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=4000.0, ans=0.04999999999999999
2023-11-18 01:53:16,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=4000.0, ans=0.3125
2023-11-18 01:53:21,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=4000.0, ans=0.3125
2023-11-18 01:53:29,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.04 vs. limit=10.55
2023-11-18 01:53:30,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=10.55
2023-11-18 01:53:46,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=17.71 vs. limit=9.05
2023-11-18 01:53:50,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=4200.0, ans=0.303125
2023-11-18 01:53:50,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=4200.0, ans=0.258
2023-11-18 01:53:51,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=29.51 vs. limit=9.075
2023-11-18 01:53:52,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.70 vs. limit=10.65
2023-11-18 01:53:56,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=43.07 vs. limit=9.075
2023-11-18 01:53:56,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.49 vs. limit=10.65
2023-11-18 01:54:08,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.77 vs. limit=9.1
2023-11-18 01:54:14,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=4266.666666666667, ans=0.3
2023-11-18 01:54:16,779 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 650, loss[loss=0.4885, simple_loss=0.3742, pruned_loss=0.4025, audio_tagging_loss=0.01709, over 15801.00 frames. ], tot_loss[loss=0.3837, simple_loss=0.2947, pruned_loss=0.3302, audio_tagging_loss=0.02152, over 2932676.47 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:54:18,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=30.45 vs. limit=9.125
2023-11-18 01:54:20,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=4333.333333333333, ans=0.04861111111111111
2023-11-18 01:54:24,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=4333.333333333333, ans=0.04861111111111111
2023-11-18 01:54:32,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=4400.0, ans=0.29375
2023-11-18 01:54:35,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=4400.0, ans=0.29375
2023-11-18 01:54:56,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.98 vs. limit=10.9
2023-11-18 01:54:59,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=4533.333333333333, ans=0.04777777777777778
2023-11-18 01:55:04,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=7.266666666666667
2023-11-18 01:55:07,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=4600.0, ans=0.284375
2023-11-18 01:55:11,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=4600.0, ans=0.254
2023-11-18 01:55:14,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=4600.0, ans=0.284375
2023-11-18 01:55:19,014 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 700, loss[loss=0.3245, simple_loss=0.2436, pruned_loss=0.2506, audio_tagging_loss=0.01955, over 15356.00 frames. ], tot_loss[loss=0.3792, simple_loss=0.29, pruned_loss=0.3216, audio_tagging_loss=0.02071, over 2961533.83 frames. ], batch size: 57, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:55:20,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.747e+01 8.200e+01 9.584e+01 1.192e+02 3.813e+02, threshold=1.917e+02, percent-clipped=10.0
2023-11-18 01:55:20,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=33.98 vs. limit=9.25
2023-11-18 01:55:24,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=4666.666666666667, ans=0.009855072463768115
2023-11-18 01:55:38,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=9.275
2023-11-18 01:55:41,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.27 vs. limit=9.275
2023-11-18 01:55:52,636 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.406e+00
2023-11-18 01:55:53,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=9.3
2023-11-18 01:55:55,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.25 vs. limit=9.3
2023-11-18 01:55:57,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=4866.666666666667, ans=0.271875
2023-11-18 01:56:00,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=42.26 vs. limit=9.325
2023-11-18 01:56:00,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=19.43 vs. limit=9.325
2023-11-18 01:56:01,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.50 vs. limit=9.325
2023-11-18 01:56:02,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=9.325
2023-11-18 01:56:07,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.06 vs. limit=5.946666666666667
2023-11-18 01:56:13,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.97 vs. limit=11.2
2023-11-18 01:56:14,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=4933.333333333333, ans=0.0
2023-11-18 01:56:18,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=4933.333333333333, ans=0.25066666666666665
2023-11-18 01:56:20,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=19.99 vs. limit=11.25
2023-11-18 01:56:21,538 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 750, loss[loss=0.3564, simple_loss=0.2656, pruned_loss=0.2777, audio_tagging_loss=0.0191, over 15074.00 frames. ], tot_loss[loss=0.3799, simple_loss=0.2895, pruned_loss=0.3171, audio_tagging_loss=0.02004, over 2985381.22 frames. ], batch size: 56, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:56:28,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=5000.0, ans=0.04583333333333334
2023-11-18 01:56:32,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=5000.0, ans=0.265625
2023-11-18 01:56:35,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.74 vs. limit=9.4
2023-11-18 01:56:49,884 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=7.50 vs. limit=6.053333333333333
2023-11-18 01:56:59,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=5200.0, ans=0.25625
2023-11-18 01:56:59,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=9.45
2023-11-18 01:57:05,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=5200.0, ans=0.03375
2023-11-18 01:57:09,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=7.47 vs. limit=6.08
2023-11-18 01:57:10,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=5266.666666666667, ans=0.0
2023-11-18 01:57:21,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.82 vs. limit=11.45
2023-11-18 01:57:22,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=5266.666666666667, ans=0.24733333333333332
2023-11-18 01:57:22,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=22.11 vs. limit=9.475
2023-11-18 01:57:24,948 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 800, loss[loss=0.3812, simple_loss=0.2897, pruned_loss=0.2936, audio_tagging_loss=0.01434, over 15026.00 frames. ], tot_loss[loss=0.3746, simple_loss=0.2847, pruned_loss=0.3073, audio_tagging_loss=0.01942, over 3000671.04 frames. ], batch size: 58, lr: 4.49e-02, grad_scale: 16.0
2023-11-18 01:57:27,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.496e+01 8.780e+01 1.132e+02 1.440e+02 3.329e+02, threshold=2.265e+02, percent-clipped=7.0
2023-11-18 01:57:27,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.46 vs. limit=6.333333333333333
2023-11-18 01:57:29,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.34 vs. limit=11.5
2023-11-18 01:57:38,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=5400.0, ans=0.281
2023-11-18 01:57:48,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=9.55
2023-11-18 01:57:49,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=9.55
2023-11-18 01:57:53,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.64 vs. limit=9.55
2023-11-18 01:57:58,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=5466.666666666667, ans=0.7086666666666667
2023-11-18 01:57:59,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=5533.333333333333, ans=9.575
2023-11-18 01:58:25,501 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 850, loss[loss=0.3753, simple_loss=0.2838, pruned_loss=0.276, audio_tagging_loss=0.01861, over 15162.00 frames. ], tot_loss[loss=0.369, simple_loss=0.28, pruned_loss=0.2966, audio_tagging_loss=0.01915, over 3010903.03 frames. ], batch size: 58, lr: 4.49e-02, grad_scale: 16.0
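With 'use_fp16': True, the grad_scale column is the dynamic loss scale of mixed-precision training: it starts at 2.0, drops to 1.0 after an early overflow, then doubles over time (4.0, 8.0, 16.0 by batch 800) while steps stay finite. A generic sketch with torch.cuda.amp (the tiny model and data below are placeholders for the real recipe):

```python
import torch

model = torch.nn.Linear(80, 500).cuda()             # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.045)
scaler = torch.cuda.amp.GradScaler(init_scale=2.0)  # log starts at grad_scale 2.0

for _ in range(10):
    x = torch.randn(16, 80, device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(x).square().mean()
    scaler.scale(loss).backward()   # scale the loss so fp16 grads stay finite
    scaler.step(optimizer)          # the step is skipped if grads overflowed
    scaler.update()                 # halve the scale on overflow, else grow it
    print(scaler.get_scale())       # the value reported as grad_scale above
```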
2023-11-18 01:58:33,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=15.90 vs. limit=9.625
2023-11-18 01:58:36,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=11.75
2023-11-18 01:58:45,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. limit=9.65
2023-11-18 01:58:50,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=9.675
2023-11-18 01:58:55,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.61 vs. limit=9.675
2023-11-18 01:59:03,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.53 vs. limit=11.9
2023-11-18 01:59:05,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=11.9
2023-11-18 01:59:07,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=5866.666666666667, ans=0.009594202898550725
2023-11-18 01:59:08,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=5866.666666666667, ans=0.22499999999999998
2023-11-18 01:59:15,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=9.725
2023-11-18 01:59:19,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=15.61 vs. limit=9.725
2023-11-18 01:59:23,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=5933.333333333333, ans=0.6923333333333334
2023-11-18 01:59:26,566 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 900, loss[loss=0.3676, simple_loss=0.285, pruned_loss=0.259, audio_tagging_loss=0.01584, over 13843.00 frames. ], tot_loss[loss=0.3595, simple_loss=0.2729, pruned_loss=0.2821, audio_tagging_loss=0.01877, over 3016392.70 frames. ], batch size: 53, lr: 4.48e-02, grad_scale: 16.0
2023-11-18 01:59:28,875 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.579e+01 7.921e+01 9.785e+01 1.252e+02 2.736e+02, threshold=1.957e+02, percent-clipped=4.0
2023-11-18 01:59:43,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=6066.666666666667, ans=0.04138888888888889
2023-11-18 01:59:48,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=17.12 vs. limit=12.05
2023-11-18 02:00:09,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=6200.0, ans=0.04083333333333333
2023-11-18 02:00:23,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.21 vs. limit=8.133333333333333
limit=8.133333333333333 2023-11-18 02:00:26,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=7.15 vs. limit=6.506666666666667 2023-11-18 02:00:28,095 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 950, loss[loss=0.3536, simple_loss=0.2811, pruned_loss=0.2387, audio_tagging_loss=0.01332, over 15522.00 frames. ], tot_loss[loss=0.352, simple_loss=0.2685, pruned_loss=0.2692, audio_tagging_loss=0.01813, over 3027043.24 frames. ], batch size: 57, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:00:40,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6400.0, ans=0.236 2023-11-18 02:00:47,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=6400.0, ans=6.6 2023-11-18 02:00:48,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=6400.0, ans=0.009478260869565217 2023-11-18 02:00:52,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.72 vs. limit=6.616666666666667 2023-11-18 02:00:53,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=6466.666666666667, ans=0.23533333333333334 2023-11-18 02:01:05,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=6533.333333333333, ans=0.19374999999999998 2023-11-18 02:01:17,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.15 vs. limit=9.975 2023-11-18 02:01:17,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. limit=9.975 2023-11-18 02:01:27,320 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1000, loss[loss=0.2987, simple_loss=0.238, pruned_loss=0.19, audio_tagging_loss=0.01563, over 14785.00 frames. ], tot_loss[loss=0.3434, simple_loss=0.2637, pruned_loss=0.2554, audio_tagging_loss=0.0174, over 3030578.00 frames. ], batch size: 54, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:01:27,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=6666.666666666667, ans=0.0 2023-11-18 02:01:30,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.563e+01 9.019e+01 1.486e+02 2.475e+02 7.919e+02, threshold=2.973e+02, percent-clipped=36.0 2023-11-18 02:01:41,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=6733.333333333333, ans=0.23266666666666666 2023-11-18 02:01:42,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.37 vs. limit=10.025 2023-11-18 02:01:43,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.24 vs. limit=6.693333333333333 2023-11-18 02:01:45,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.20 vs. 
limit=12.55 2023-11-18 02:01:53,964 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:01:57,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=6800.0, ans=0.18125000000000002 2023-11-18 02:02:00,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.84 vs. limit=6.7 2023-11-18 02:02:02,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=10.075 2023-11-18 02:02:08,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=6866.666666666667, ans=0.0093768115942029 2023-11-18 02:02:14,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=6933.333333333333, ans=0.175 2023-11-18 02:02:26,185 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1050, loss[loss=0.2723, simple_loss=0.2138, pruned_loss=0.1683, audio_tagging_loss=0.01821, over 16266.00 frames. ], tot_loss[loss=0.3307, simple_loss=0.2555, pruned_loss=0.2391, audio_tagging_loss=0.01694, over 3036768.47 frames. ], batch size: 61, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:02:33,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=12.81 vs. limit=12.75 2023-11-18 02:02:37,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.07 vs. limit=8.5 2023-11-18 02:02:39,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=7066.666666666667, ans=0.037222222222222226 2023-11-18 02:02:46,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=7066.666666666667, ans=0.16875 2023-11-18 02:02:57,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.76 vs. limit=10.175 2023-11-18 02:03:02,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=9.93 vs. limit=10.2 2023-11-18 02:03:04,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.70 vs. limit=12.9 2023-11-18 02:03:07,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.84 vs. 
limit=10.2 2023-11-18 02:03:11,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=7200.0, ans=0.009304347826086957 2023-11-18 02:03:20,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=7266.666666666667, ans=0.159375 2023-11-18 02:03:25,430 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1100, loss[loss=0.2117, simple_loss=0.1635, pruned_loss=0.1236, audio_tagging_loss=0.01963, over 15073.00 frames. ], tot_loss[loss=0.3216, simple_loss=0.2503, pruned_loss=0.2258, audio_tagging_loss=0.01655, over 3040709.74 frames. ], batch size: 57, lr: 4.48e-02, grad_scale: 8.0 2023-11-18 02:03:25,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=7333.333333333333, ans=0.00927536231884058 2023-11-18 02:03:26,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=7333.333333333333, ans=0.15625 2023-11-18 02:03:28,770 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.535e+01 1.098e+02 1.778e+02 2.963e+02 6.822e+02, threshold=3.557e+02, percent-clipped=25.0 2023-11-18 02:03:28,841 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:03:38,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7400.0, ans=0.226 2023-11-18 02:03:43,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=7400.0, ans=0.641 2023-11-18 02:03:50,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=7466.666666666667, ans=0.035555555555555556 2023-11-18 02:03:52,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=7466.666666666667, ans=0.15000000000000002 2023-11-18 02:03:54,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=7466.666666666667, ans=0.22533333333333333 2023-11-18 02:03:59,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.53 vs. limit=13.15 2023-11-18 02:04:01,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=7533.333333333333, ans=0.03527777777777778 2023-11-18 02:04:07,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=6.90 vs. limit=7.013333333333334 2023-11-18 02:04:12,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=10.93 vs. 
limit=10.35 2023-11-18 02:04:13,251 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.29 vs. limit=8.8 2023-11-18 02:04:21,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=7666.666666666667, ans=0.140625 2023-11-18 02:04:22,641 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1150, loss[loss=0.2748, simple_loss=0.2211, pruned_loss=0.1646, audio_tagging_loss=0.0156, over 15274.00 frames. ], tot_loss[loss=0.3119, simple_loss=0.2445, pruned_loss=0.2129, audio_tagging_loss=0.01626, over 3040412.79 frames. ], batch size: 57, lr: 4.47e-02, grad_scale: 8.0 2023-11-18 02:04:22,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=7666.666666666667, ans=0.034722222222222224 2023-11-18 02:04:33,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=10.4 2023-11-18 02:04:53,713 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.89 vs. limit=10.425 2023-11-18 02:05:09,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=7933.333333333333, ans=0.6223333333333334 2023-11-18 02:05:11,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=13.45 2023-11-18 02:05:14,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=7933.333333333333, ans=0.128125 2023-11-18 02:05:20,093 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1200, loss[loss=0.3035, simple_loss=0.2509, pruned_loss=0.1709, audio_tagging_loss=0.01842, over 16223.00 frames. ], tot_loss[loss=0.3034, simple_loss=0.2398, pruned_loss=0.2011, audio_tagging_loss=0.01605, over 3048793.04 frames. ], batch size: 58, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:05:23,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.165e+01 1.072e+02 1.842e+02 2.807e+02 8.662e+02, threshold=3.683e+02, percent-clipped=14.0 2023-11-18 02:05:37,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.33 vs. limit=7.226666666666667 2023-11-18 02:06:08,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.98 vs. limit=9.133333333333333 2023-11-18 02:06:14,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.15 vs. limit=7.066666666666666 2023-11-18 02:06:17,038 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1250, loss[loss=0.2967, simple_loss=0.2466, pruned_loss=0.1714, audio_tagging_loss=0.01356, over 15790.00 frames. ], tot_loss[loss=0.2927, simple_loss=0.2327, pruned_loss=0.1889, audio_tagging_loss=0.016, over 3045555.50 frames. 
], batch size: 55, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:06:19,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=8333.333333333334, ans=0.03194444444444444 2023-11-18 02:06:26,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=8333.333333333334, ans=0.125 2023-11-18 02:06:26,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=8400.0, ans=0.125 2023-11-18 02:06:32,657 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.97 vs. limit=10.65 2023-11-18 02:06:35,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=4.08 vs. limit=4.26 2023-11-18 02:06:40,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.01 vs. limit=13.85 2023-11-18 02:06:44,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=13.68 vs. limit=13.85 2023-11-18 02:07:00,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=8533.333333333334, ans=0.6013333333333334 2023-11-18 02:07:05,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=8600.0, ans=0.125 2023-11-18 02:07:09,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=8600.0, ans=0.125 2023-11-18 02:07:13,971 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1300, loss[loss=0.2615, simple_loss=0.2197, pruned_loss=0.1412, audio_tagging_loss=0.01654, over 15027.00 frames. ], tot_loss[loss=0.2808, simple_loss=0.2246, pruned_loss=0.1766, audio_tagging_loss=0.01595, over 3047010.80 frames. ], batch size: 56, lr: 4.47e-02, grad_scale: 16.0 2023-11-18 02:07:17,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.700e+01 1.001e+02 1.539e+02 2.707e+02 8.460e+02, threshold=3.079e+02, percent-clipped=10.0 2023-11-18 02:07:29,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=8733.333333333334, ans=10.0 2023-11-18 02:07:34,489 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=1.270e-01 2023-11-18 02:07:37,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=8800.0, ans=0.008956521739130436 2023-11-18 02:07:41,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.51 vs. limit=10.8 2023-11-18 02:07:44,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.94 vs. 
limit=10.8 2023-11-18 02:07:46,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=8800.0, ans=0.212 2023-11-18 02:07:47,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=8866.666666666666, ans=0.04949747468305833 2023-11-18 02:08:00,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=8933.333333333334, ans=0.125 2023-11-18 02:08:10,218 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1350, loss[loss=0.1888, simple_loss=0.1518, pruned_loss=0.1006, audio_tagging_loss=0.01704, over 14327.00 frames. ], tot_loss[loss=0.2707, simple_loss=0.2175, pruned_loss=0.1663, audio_tagging_loss=0.01588, over 3046924.60 frames. ], batch size: 54, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:08:26,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=9066.666666666666, ans=0.125 2023-11-18 02:08:28,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.47 vs. limit=10.9 2023-11-18 02:08:40,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=9133.333333333334, ans=0.02861111111111111 2023-11-18 02:08:40,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.88 vs. limit=14.35 2023-11-18 02:08:41,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=10.925 2023-11-18 02:08:47,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=9200.0, ans=0.028333333333333335 2023-11-18 02:08:49,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=9200.0, ans=0.028333333333333335 2023-11-18 02:08:52,802 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:08:57,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=9266.666666666666, ans=0.5756666666666668 2023-11-18 02:08:59,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.58 vs. limit=10.975 2023-11-18 02:09:02,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=9266.666666666666, ans=0.008855072463768116 2023-11-18 02:09:09,621 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1400, loss[loss=0.2847, simple_loss=0.2411, pruned_loss=0.1554, audio_tagging_loss=0.01452, over 16226.00 frames. ], tot_loss[loss=0.2624, simple_loss=0.2123, pruned_loss=0.1574, audio_tagging_loss=0.01569, over 3050289.70 frames. 
], batch size: 61, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:09:11,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.17 vs. limit=11.0 2023-11-18 02:09:12,849 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.567e+01 1.322e+02 1.809e+02 2.689e+02 4.159e+02, threshold=3.617e+02, percent-clipped=14.0 2023-11-18 02:09:16,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=9333.333333333334, ans=0.125 2023-11-18 02:09:16,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=9333.333333333334, ans=0.125 2023-11-18 02:09:18,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.30 vs. limit=14.5 2023-11-18 02:09:39,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=9466.666666666666, ans=0.0 2023-11-18 02:09:43,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=14.61 vs. limit=14.65 2023-11-18 02:10:01,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=9600.0, ans=0.02666666666666667 2023-11-18 02:10:05,820 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1450, loss[loss=0.2213, simple_loss=0.1825, pruned_loss=0.112, audio_tagging_loss=0.0207, over 14091.00 frames. ], tot_loss[loss=0.2573, simple_loss=0.2098, pruned_loss=0.1508, audio_tagging_loss=0.01557, over 3047218.81 frames. ], batch size: 56, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:10:21,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=9733.333333333334, ans=0.20266666666666666 2023-11-18 02:10:28,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=9800.0, ans=0.125 2023-11-18 02:10:39,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=9866.666666666666, ans=0.5546666666666666 2023-11-18 02:10:42,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=9866.666666666666, ans=0.125 2023-11-18 02:10:49,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=9866.666666666666, ans=0.125 2023-11-18 02:10:56,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.25 vs. limit=14.95 2023-11-18 02:11:01,706 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1500, loss[loss=0.2609, simple_loss=0.2332, pruned_loss=0.1353, audio_tagging_loss=0.01035, over 14723.00 frames. ], tot_loss[loss=0.2519, simple_loss=0.2069, pruned_loss=0.1447, audio_tagging_loss=0.01542, over 3046451.35 frames. ], batch size: 54, lr: 4.46e-02, grad_scale: 16.0 2023-11-18 02:11:03,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.53 vs. 
limit=10.0 2023-11-18 02:11:04,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.893e+01 1.138e+02 1.532e+02 2.102e+02 5.614e+02, threshold=3.064e+02, percent-clipped=6.0 2023-11-18 02:11:11,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=10000.0, ans=0.025 2023-11-18 02:11:24,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=10133.333333333334, ans=0.125 2023-11-18 02:11:30,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=10133.333333333334, ans=0.19866666666666666 2023-11-18 02:11:34,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10133.333333333334, ans=0.0 2023-11-18 02:11:41,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=10200.0, ans=0.0 2023-11-18 02:11:46,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=11.07 vs. limit=11.35 2023-11-18 02:11:54,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=10266.666666666666, ans=0.125 2023-11-18 02:11:59,194 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1550, loss[loss=0.2289, simple_loss=0.1892, pruned_loss=0.1213, audio_tagging_loss=0.01625, over 15759.00 frames. ], tot_loss[loss=0.2447, simple_loss=0.2021, pruned_loss=0.1377, audio_tagging_loss=0.01547, over 3044386.54 frames. ], batch size: 58, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:12:03,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.99 vs. limit=15.25 2023-11-18 02:12:06,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=10333.333333333334, ans=0.125 2023-11-18 02:12:21,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=10466.666666666666, ans=0.008594202898550726 2023-11-18 02:12:31,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=7.633333333333334 2023-11-18 02:12:33,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=10533.333333333334, ans=0.125 2023-11-18 02:12:35,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=10533.333333333334, ans=0.0 2023-11-18 02:12:39,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=10533.333333333334, ans=0.125 2023-11-18 02:12:48,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=10600.0, ans=0.008565217391304348 2023-11-18 02:12:55,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.94 vs. 
limit=11.5 2023-11-18 02:12:56,145 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1600, loss[loss=0.1974, simple_loss=0.1709, pruned_loss=0.09666, audio_tagging_loss=0.01561, over 14774.00 frames. ], tot_loss[loss=0.2396, simple_loss=0.199, pruned_loss=0.1324, audio_tagging_loss=0.01555, over 3043115.70 frames. ], batch size: 57, lr: 4.45e-02, grad_scale: 32.0 2023-11-18 02:12:59,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 1.048e+02 1.443e+02 2.212e+02 4.225e+02, threshold=2.886e+02, percent-clipped=6.0 2023-11-18 02:13:06,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=10733.333333333334, ans=0.125 2023-11-18 02:13:35,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.98 vs. limit=11.575 2023-11-18 02:13:40,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.80 vs. limit=15.7 2023-11-18 02:13:42,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=10933.333333333334, ans=0.125 2023-11-18 02:13:51,856 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1650, loss[loss=0.2389, simple_loss=0.2055, pruned_loss=0.1189, audio_tagging_loss=0.01804, over 14670.00 frames. ], tot_loss[loss=0.234, simple_loss=0.1955, pruned_loss=0.127, audio_tagging_loss=0.01561, over 3045728.18 frames. ], batch size: 55, lr: 4.45e-02, grad_scale: 16.0 2023-11-18 02:13:58,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=11000.0, ans=11.625 2023-11-18 02:14:12,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=11066.666666666666, ans=0.02055555555555556 2023-11-18 02:14:17,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=16.28 vs. limit=15.85 2023-11-18 02:14:21,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.76 vs. limit=15.85 2023-11-18 02:14:23,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.77 vs. limit=8.453333333333333 2023-11-18 02:14:48,712 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1700, loss[loss=0.178, simple_loss=0.1451, pruned_loss=0.08786, audio_tagging_loss=0.0185, over 17048.00 frames. ], tot_loss[loss=0.2301, simple_loss=0.1934, pruned_loss=0.123, audio_tagging_loss=0.01552, over 3046454.03 frames. ], batch size: 65, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:14:49,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=15.76 vs. 
limit=16.0 2023-11-18 02:14:53,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.231e+02 1.950e+02 2.730e+02 7.528e+02, threshold=3.901e+02, percent-clipped=22.0 2023-11-18 02:14:59,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=11400.0, ans=0.5010000000000001 2023-11-18 02:15:30,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=11533.333333333334, ans=0.125 2023-11-18 02:15:30,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=11533.333333333334, ans=0.00836231884057971 2023-11-18 02:15:44,893 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1750, loss[loss=0.2481, simple_loss=0.2283, pruned_loss=0.1218, audio_tagging_loss=0.01154, over 15685.00 frames. ], tot_loss[loss=0.2268, simple_loss=0.1917, pruned_loss=0.1197, audio_tagging_loss=0.01536, over 3042298.95 frames. ], batch size: 57, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:15:45,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=11666.666666666666, ans=0.125 2023-11-18 02:15:58,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=11733.333333333334, ans=0.48933333333333334 2023-11-18 02:16:08,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=11800.0, ans=0.182 2023-11-18 02:16:16,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.28 vs. limit=11.925 2023-11-18 02:16:29,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.34 vs. limit=16.45 2023-11-18 02:16:41,140 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1800, loss[loss=0.2098, simple_loss=0.1859, pruned_loss=0.1042, audio_tagging_loss=0.0127, over 15001.00 frames. ], tot_loss[loss=0.2221, simple_loss=0.189, pruned_loss=0.1158, audio_tagging_loss=0.01513, over 3040209.63 frames. ], batch size: 56, lr: 4.44e-02, grad_scale: 16.0 2023-11-18 02:16:45,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.752e+01 1.122e+02 1.379e+02 2.095e+02 9.381e+02, threshold=2.759e+02, percent-clipped=5.0 2023-11-18 02:16:51,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=12000.0, ans=0.00826086956521739 2023-11-18 02:17:14,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.03 vs. limit=16.65 2023-11-18 02:17:26,429 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=12.1 2023-11-18 02:17:27,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=18.30 vs. limit=16.7 2023-11-18 02:17:37,658 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1850, loss[loss=0.1733, simple_loss=0.1491, pruned_loss=0.0839, audio_tagging_loss=0.01492, over 14888.00 frames. ], tot_loss[loss=0.2185, simple_loss=0.1869, pruned_loss=0.1125, audio_tagging_loss=0.01507, over 3040737.06 frames. 
], batch size: 56, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:17:39,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=21.47 vs. limit=11.166666666666668 2023-11-18 02:17:59,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=12466.666666666666, ans=0.125 2023-11-18 02:18:02,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=12466.666666666666, ans=0.125 2023-11-18 02:18:04,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=18.05 vs. limit=16.85 2023-11-18 02:18:06,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=12466.666666666666, ans=0.125 2023-11-18 02:18:19,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=12533.333333333334, ans=0.17466666666666666 2023-11-18 02:18:20,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=12533.333333333334, ans=0.125 2023-11-18 02:18:21,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=12600.0, ans=0.125 2023-11-18 02:18:22,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=12600.0, ans=0.07 2023-11-18 02:18:28,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=12600.0, ans=0.125 2023-11-18 02:18:29,371 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:18:33,425 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1900, loss[loss=0.2387, simple_loss=0.2164, pruned_loss=0.1165, audio_tagging_loss=0.01386, over 14982.00 frames. ], tot_loss[loss=0.2162, simple_loss=0.1862, pruned_loss=0.1103, audio_tagging_loss=0.01483, over 3041562.46 frames. ], batch size: 55, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:18:36,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=12666.666666666666, ans=0.013888888888888895 2023-11-18 02:18:37,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.647e+01 1.124e+02 1.503e+02 2.193e+02 6.798e+02, threshold=3.006e+02, percent-clipped=14.0 2023-11-18 02:18:54,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.93 vs. limit=12.275 2023-11-18 02:19:00,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.87 vs. 
limit=17.1 2023-11-18 02:19:06,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=12866.666666666666, ans=0.125 2023-11-18 02:19:23,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=12933.333333333334, ans=0.4473333333333333 2023-11-18 02:19:26,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=12933.333333333334, ans=0.008057971014492753 2023-11-18 02:19:29,593 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 1950, loss[loss=0.1955, simple_loss=0.1779, pruned_loss=0.09421, audio_tagging_loss=0.01224, over 15038.00 frames. ], tot_loss[loss=0.2092, simple_loss=0.1813, pruned_loss=0.1054, audio_tagging_loss=0.01473, over 3039287.77 frames. ], batch size: 57, lr: 4.43e-02, grad_scale: 16.0 2023-11-18 02:19:35,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.21 vs. limit=12.375 2023-11-18 02:19:38,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=13000.0, ans=0.125 2023-11-18 02:19:41,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.36 vs. limit=12.4 2023-11-18 02:19:48,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=13066.666666666666, ans=0.125 2023-11-18 02:19:52,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=13133.333333333334, ans=0.125 2023-11-18 02:19:58,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=13133.333333333334, ans=0.125 2023-11-18 02:20:21,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=13266.666666666666, ans=0.4356666666666667 2023-11-18 02:20:26,155 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2000, loss[loss=0.2195, simple_loss=0.1974, pruned_loss=0.1074, audio_tagging_loss=0.01347, over 15598.00 frames. ], tot_loss[loss=0.2068, simple_loss=0.1798, pruned_loss=0.1033, audio_tagging_loss=0.01475, over 3041611.46 frames. ], batch size: 56, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:20:30,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 1.116e+02 1.535e+02 2.034e+02 3.808e+02, threshold=3.071e+02, percent-clipped=5.0 2023-11-18 02:20:38,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.32 vs. 
limit=12.525 2023-11-18 02:20:42,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten.whitening_limit, batch_count=13400.0, ans=12.525 2023-11-18 02:20:46,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=13466.666666666666, ans=0.16533333333333333 2023-11-18 02:20:54,847 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:21:21,371 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2050, loss[loss=0.1946, simple_loss=0.1726, pruned_loss=0.09336, audio_tagging_loss=0.01494, over 15199.00 frames. ], tot_loss[loss=0.2049, simple_loss=0.1794, pruned_loss=0.1015, audio_tagging_loss=0.01466, over 3035992.78 frames. ], batch size: 57, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:21:24,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=13666.666666666666, ans=0.07 2023-11-18 02:21:25,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=13666.666666666666, ans=0.125 2023-11-18 02:21:51,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13800.0, ans=0.0 2023-11-18 02:21:51,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=13800.0, ans=0.007869565217391305 2023-11-18 02:21:56,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=13866.666666666666, ans=0.007855072463768115 2023-11-18 02:22:07,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=13933.333333333334, ans=0.0 2023-11-18 02:22:17,375 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2100, loss[loss=0.2119, simple_loss=0.2024, pruned_loss=0.09953, audio_tagging_loss=0.01115, over 15860.00 frames. ], tot_loss[loss=0.2023, simple_loss=0.1779, pruned_loss=0.09935, audio_tagging_loss=0.01472, over 3035514.62 frames. ], batch size: 59, lr: 4.42e-02, grad_scale: 32.0 2023-11-18 02:22:21,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.391e+01 1.118e+02 1.317e+02 1.653e+02 4.106e+02, threshold=2.634e+02, percent-clipped=4.0 2023-11-18 02:22:42,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14133.333333333334, ans=0.125 2023-11-18 02:22:43,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14133.333333333334, ans=0.15866666666666665 2023-11-18 02:22:54,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.09 vs. 
limit=12.1 2023-11-18 02:23:00,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=14200.0, ans=0.025 2023-11-18 02:23:04,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=14266.666666666666, ans=0.125 2023-11-18 02:23:08,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=14266.666666666666, ans=0.15733333333333333 2023-11-18 02:23:09,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=14266.666666666666, ans=0.125 2023-11-18 02:23:13,716 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2150, loss[loss=0.2583, simple_loss=0.2352, pruned_loss=0.1307, audio_tagging_loss=0.009974, over 15426.00 frames. ], tot_loss[loss=0.2001, simple_loss=0.1768, pruned_loss=0.09762, audio_tagging_loss=0.0146, over 3029819.00 frames. ], batch size: 56, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:23:25,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=14400.0, ans=0.006666666666666668 2023-11-18 02:23:42,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=14466.666666666666, ans=0.125 2023-11-18 02:23:42,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=14466.666666666666, ans=0.125 2023-11-18 02:23:45,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=14533.333333333334, ans=0.007710144927536232 2023-11-18 02:23:47,753 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:23:54,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=14533.333333333334, ans=0.125 2023-11-18 02:23:58,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=14600.0, ans=0.005833333333333336 2023-11-18 02:24:10,328 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2200, loss[loss=0.1852, simple_loss=0.1796, pruned_loss=0.08199, audio_tagging_loss=0.01346, over 15141.00 frames. ], tot_loss[loss=0.1971, simple_loss=0.1752, pruned_loss=0.09533, audio_tagging_loss=0.0146, over 3029650.86 frames. 
], batch size: 55, lr: 4.41e-02, grad_scale: 32.0 2023-11-18 02:24:14,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.117e+02 1.377e+02 2.009e+02 5.109e+02, threshold=2.755e+02, percent-clipped=7.0 2023-11-18 02:24:17,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=14666.666666666666, ans=0.125 2023-11-18 02:24:59,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=14933.333333333334, ans=0.15066666666666667 2023-11-18 02:25:01,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=14933.333333333334, ans=0.125 2023-11-18 02:25:02,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=14933.333333333334, ans=0.3773333333333333 2023-11-18 02:25:07,608 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2250, loss[loss=0.1793, simple_loss=0.155, pruned_loss=0.08429, audio_tagging_loss=0.01746, over 13396.00 frames. ], tot_loss[loss=0.1962, simple_loss=0.1751, pruned_loss=0.09434, audio_tagging_loss=0.01462, over 3027464.26 frames. ], batch size: 54, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:25:16,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.08 vs. limit=18.75 2023-11-18 02:25:43,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.32 vs. limit=12.6 2023-11-18 02:25:50,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=15200.0, ans=0.003333333333333334 2023-11-18 02:26:05,404 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2300, loss[loss=0.1298, simple_loss=0.1017, pruned_loss=0.06038, audio_tagging_loss=0.01859, over 14343.00 frames. ], tot_loss[loss=0.1948, simple_loss=0.1749, pruned_loss=0.09297, audio_tagging_loss=0.01467, over 3032998.54 frames. ], batch size: 56, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:26:06,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=15333.333333333334, ans=0.125 2023-11-18 02:26:09,710 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.661e+01 1.107e+02 1.429e+02 1.999e+02 3.636e+02, threshold=2.858e+02, percent-clipped=5.0 2023-11-18 02:26:09,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=15333.333333333334, ans=0.002777777777777775 2023-11-18 02:26:17,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.26 vs. limit=19.05 2023-11-18 02:26:31,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=15466.666666666666, ans=0.125 2023-11-18 02:26:39,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=15533.333333333334, ans=0.007492753623188406 2023-11-18 02:26:56,029 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:27:01,486 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2350, loss[loss=0.1786, simple_loss=0.167, pruned_loss=0.08116, audio_tagging_loss=0.01394, over 14956.00 frames. ], tot_loss[loss=0.1925, simple_loss=0.1733, pruned_loss=0.09126, audio_tagging_loss=0.01474, over 3039629.17 frames. ], batch size: 56, lr: 4.40e-02, grad_scale: 32.0 2023-11-18 02:27:09,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.21 vs. limit=19.25 2023-11-18 02:27:32,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=15800.0, ans=0.0008333333333333387 2023-11-18 02:27:39,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=23.33 vs. limit=19.4 2023-11-18 02:27:50,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=15933.333333333334, ans=0.00027777777777777263 2023-11-18 02:27:57,909 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2400, loss[loss=0.1932, simple_loss=0.1775, pruned_loss=0.09188, audio_tagging_loss=0.01253, over 16479.00 frames. ], tot_loss[loss=0.1889, simple_loss=0.1705, pruned_loss=0.08894, audio_tagging_loss=0.01485, over 3043702.54 frames. ], batch size: 63, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:28:02,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.230e+01 1.240e+02 1.395e+02 1.790e+02 3.155e+02, threshold=2.791e+02, percent-clipped=5.0 2023-11-18 02:28:17,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16066.666666666666, ans=0.13933333333333334 2023-11-18 02:28:26,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16133.333333333334, ans=0.13866666666666666 2023-11-18 02:28:31,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=16200.0, ans=0.007347826086956522 2023-11-18 02:28:49,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=16266.666666666666, ans=0.0 2023-11-18 02:28:54,743 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2450, loss[loss=0.1646, simple_loss=0.1498, pruned_loss=0.07454, audio_tagging_loss=0.01517, over 14676.00 frames. ], tot_loss[loss=0.1889, simple_loss=0.171, pruned_loss=0.08855, audio_tagging_loss=0.01502, over 3047531.59 frames. ], batch size: 55, lr: 4.39e-02, grad_scale: 32.0 2023-11-18 02:29:09,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=16400.0, ans=0.0 2023-11-18 02:29:14,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=17.76 vs. 
limit=19.8 2023-11-18 02:29:24,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=16466.666666666668, ans=0.32366666666666677 2023-11-18 02:29:25,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=16466.666666666668, ans=0.1353333333333333 2023-11-18 02:29:39,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=16600.0, ans=10.0 2023-11-18 02:29:44,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=16600.0, ans=0.0 2023-11-18 02:29:49,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=16666.666666666668, ans=0.125 2023-11-18 02:29:49,868 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2500, loss[loss=0.2263, simple_loss=0.208, pruned_loss=0.1075, audio_tagging_loss=0.01488, over 15678.00 frames. ], tot_loss[loss=0.1887, simple_loss=0.1715, pruned_loss=0.08815, audio_tagging_loss=0.0149, over 3051046.96 frames. ], batch size: 59, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:29:54,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.242e+01 1.096e+02 1.316e+02 1.723e+02 3.236e+02, threshold=2.632e+02, percent-clipped=4.0 2023-11-18 02:29:55,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.95 vs. limit=13.75 2023-11-18 02:30:10,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=16800.0, ans=0.132 2023-11-18 02:30:10,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.90 vs. limit=13.8 2023-11-18 02:30:38,294 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=15.12 vs. limit=13.85 2023-11-18 02:30:42,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=16933.333333333332, ans=0.00718840579710145 2023-11-18 02:30:45,338 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2550, loss[loss=0.2104, simple_loss=0.2099, pruned_loss=0.0948, audio_tagging_loss=0.01067, over 15879.00 frames. ], tot_loss[loss=0.1866, simple_loss=0.1705, pruned_loss=0.08685, audio_tagging_loss=0.01458, over 3044864.54 frames. ], batch size: 58, lr: 4.38e-02, grad_scale: 32.0 2023-11-18 02:30:58,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=17066.666666666668, ans=0.125 2023-11-18 02:31:26,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=17200.0, ans=0.29800000000000004 2023-11-18 02:31:35,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.73 vs. limit=9.316666666666666 2023-11-18 02:31:43,162 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2600, loss[loss=0.1798, simple_loss=0.1676, pruned_loss=0.08452, audio_tagging_loss=0.01148, over 15278.00 frames. 
], tot_loss[loss=0.1834, simple_loss=0.1679, pruned_loss=0.08511, audio_tagging_loss=0.01441, over 3042351.10 frames. ], batch size: 57, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:31:44,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=17333.333333333332, ans=0.125 2023-11-18 02:31:44,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=17333.333333333332, ans=0.125 2023-11-18 02:31:47,405 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.043e+01 1.250e+02 1.620e+02 2.059e+02 4.953e+02, threshold=3.240e+02, percent-clipped=12.0 2023-11-18 02:31:57,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=17400.0, ans=0.461 2023-11-18 02:32:03,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=17400.0, ans=0.125 2023-11-18 02:32:07,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=17466.666666666668, ans=0.125 2023-11-18 02:32:08,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=17466.666666666668, ans=0.0 2023-11-18 02:32:09,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=17466.666666666668, ans=0.05 2023-11-18 02:32:22,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=17533.333333333332, ans=0.0 2023-11-18 02:32:31,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=17600.0, ans=0.007043478260869565 2023-11-18 02:32:34,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=17600.0, ans=0.0 2023-11-18 02:32:39,192 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2650, loss[loss=0.126, simple_loss=0.1115, pruned_loss=0.05677, audio_tagging_loss=0.01344, over 14477.00 frames. ], tot_loss[loss=0.1826, simple_loss=0.1677, pruned_loss=0.08451, audio_tagging_loss=0.01426, over 3044433.56 frames. 
], batch size: 58, lr: 4.37e-02, grad_scale: 32.0 2023-11-18 02:32:43,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=17666.666666666668, ans=0.125 2023-11-18 02:32:43,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=17666.666666666668, ans=0.12333333333333332 2023-11-18 02:32:45,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=17666.666666666668, ans=0.125 2023-11-18 02:32:45,910 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:32:47,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=17666.666666666668, ans=0.0733333333333333 2023-11-18 02:32:50,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=17733.333333333332, ans=0.007014492753623189 2023-11-18 02:32:57,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=20.8 2023-11-18 02:33:24,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.79 vs. limit=13.966666666666665 2023-11-18 02:33:26,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=17933.333333333332, ans=0.125 2023-11-18 02:33:27,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=17933.333333333332, ans=0.125 2023-11-18 02:33:34,668 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2700, loss[loss=0.1665, simple_loss=0.1668, pruned_loss=0.06932, audio_tagging_loss=0.01378, over 14810.00 frames. ], tot_loss[loss=0.1823, simple_loss=0.168, pruned_loss=0.08406, audio_tagging_loss=0.01431, over 3047945.93 frames. ], batch size: 54, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:33:35,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.60 vs. limit=14.25 2023-11-18 02:33:38,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.101e+02 1.289e+02 1.771e+02 2.746e+02, threshold=2.578e+02, percent-clipped=0.0 2023-11-18 02:33:42,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=18000.0, ans=0.0 2023-11-18 02:33:46,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=18066.666666666668, ans=0.006942028985507246 2023-11-18 02:33:47,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=18066.666666666668, ans=0.05 2023-11-18 02:33:58,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=18133.333333333332, ans=0.125 2023-11-18 02:34:14,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.88 vs. 
limit=14.325 2023-11-18 02:34:19,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=16.30 vs. limit=14.35 2023-11-18 02:34:23,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.52 vs. limit=14.35 2023-11-18 02:34:31,186 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2750, loss[loss=0.1851, simple_loss=0.17, pruned_loss=0.0863, audio_tagging_loss=0.01377, over 14821.00 frames. ], tot_loss[loss=0.1801, simple_loss=0.1662, pruned_loss=0.08271, audio_tagging_loss=0.01434, over 3044641.77 frames. ], batch size: 57, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:34:44,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=18400.0, ans=0.125 2023-11-18 02:34:53,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=18466.666666666668, ans=0.125 2023-11-18 02:35:00,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=18466.666666666668, ans=0.47700000000000004 2023-11-18 02:35:02,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=14.425 2023-11-18 02:35:10,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.51 vs. limit=14.45 2023-11-18 02:35:20,316 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:35:27,839 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2800, loss[loss=0.1192, simple_loss=0.1054, pruned_loss=0.05226, audio_tagging_loss=0.01423, over 13763.00 frames. ], tot_loss[loss=0.1788, simple_loss=0.1652, pruned_loss=0.08187, audio_tagging_loss=0.01433, over 3040852.67 frames. ], batch size: 55, lr: 4.36e-02, grad_scale: 32.0 2023-11-18 02:35:32,076 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.686e+01 1.129e+02 1.327e+02 1.684e+02 3.032e+02, threshold=2.655e+02, percent-clipped=2.0 2023-11-18 02:35:33,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=18666.666666666668, ans=10.0 2023-11-18 02:35:35,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=11.14 vs. 
limit=11.466666666666667 2023-11-18 02:35:38,835 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:35:59,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=18800.0, ans=0.11200000000000002 2023-11-18 02:36:05,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=18866.666666666668, ans=0.11133333333333331 2023-11-18 02:36:18,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=18933.333333333332, ans=0.125 2023-11-18 02:36:23,516 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2850, loss[loss=0.1265, simple_loss=0.1141, pruned_loss=0.05518, audio_tagging_loss=0.01427, over 15764.00 frames. ], tot_loss[loss=0.1784, simple_loss=0.1654, pruned_loss=0.08138, audio_tagging_loss=0.01428, over 3038838.34 frames. ], batch size: 63, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:36:39,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=19066.666666666668, ans=0.2326666666666667 2023-11-18 02:36:44,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.24 vs. limit=14.65 2023-11-18 02:36:44,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=19066.666666666668, ans=0.125 2023-11-18 02:36:44,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=19066.666666666668, ans=0.9406666666666667 2023-11-18 02:36:46,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=19133.333333333332, ans=0.006710144927536233 2023-11-18 02:36:48,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. 
limit=14.675 2023-11-18 02:36:52,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=19133.333333333332, ans=0.125 2023-11-18 02:36:54,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=19133.333333333332, ans=0.125 2023-11-18 02:36:55,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=19133.333333333332, ans=0.2303333333333334 2023-11-18 02:36:56,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=19133.333333333332, ans=0.125 2023-11-18 02:37:13,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=19266.666666666668, ans=0.025 2023-11-18 02:37:16,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=19266.666666666668, ans=0.22566666666666668 2023-11-18 02:37:18,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=19266.666666666668, ans=0.07 2023-11-18 02:37:18,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.63 vs. limit=5.890000000000001 2023-11-18 02:37:21,221 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2900, loss[loss=0.2132, simple_loss=0.1923, pruned_loss=0.104, audio_tagging_loss=0.01305, over 14687.00 frames. ], tot_loss[loss=0.1774, simple_loss=0.1647, pruned_loss=0.08082, audio_tagging_loss=0.01423, over 3038675.38 frames. ], batch size: 53, lr: 4.35e-02, grad_scale: 32.0 2023-11-18 02:37:21,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=19333.333333333332, ans=0.006666666666666667 2023-11-18 02:37:23,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.37 vs. limit=14.75 2023-11-18 02:37:25,457 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.019e+02 1.241e+02 1.587e+02 2.643e+02, threshold=2.482e+02, percent-clipped=0.0 2023-11-18 02:37:41,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=19400.0, ans=0.0 2023-11-18 02:37:43,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=19466.666666666668, ans=0.125 2023-11-18 02:37:48,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.33 vs. limit=22.1 2023-11-18 02:37:49,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.81 vs. limit=14.8 2023-11-18 02:38:04,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.00 vs. limit=22.15 2023-11-18 02:38:17,197 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 2950, loss[loss=0.1418, simple_loss=0.129, pruned_loss=0.05922, audio_tagging_loss=0.01812, over 14978.00 frames. 
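The lr column in these train_asr.py lines decays slowly (4.38e-02 at batch 2500 down to 4.35e-02 around batch 2900), which is consistent with icefall's Eden schedule applied to base_lr = 0.045 with lr_batches = 7500, assuming the epoch term is still at its start-of-training value during this first epoch. A sketch that reproduces the logged figures:

    # Eden-style learning-rate rule as used by icefall's optim.py. Holding the
    # epoch factor at its epoch-0 value is an assumption that happens to match
    # the logged lr column here.
    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    print(f"{eden_lr(0.045, 2500, 0):.2e}")  # 4.38e-02, as logged at batch 2500
    print(f"{eden_lr(0.045, 2900, 0):.2e}")  # 4.35e-02, as logged at batch 2900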
], tot_loss[loss=0.1777, simple_loss=0.1653, pruned_loss=0.0809, audio_tagging_loss=0.01419, over 3044916.08 frames. ], batch size: 58, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:38:19,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.27 vs. limit=14.875 2023-11-18 02:38:21,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=19666.666666666668, ans=0.006594202898550724 2023-11-18 02:38:33,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=19733.333333333332, ans=0.125 2023-11-18 02:38:37,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=19733.333333333332, ans=0.125 2023-11-18 02:38:54,938 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:39:00,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=22.91 vs. limit=14.933333333333334 2023-11-18 02:39:06,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=19933.333333333332, ans=0.20233333333333337 2023-11-18 02:39:13,993 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3000, loss[loss=0.1268, simple_loss=0.1162, pruned_loss=0.0564, audio_tagging_loss=0.01224, over 14709.00 frames. ], tot_loss[loss=0.1767, simple_loss=0.1645, pruned_loss=0.08018, audio_tagging_loss=0.01426, over 3049171.20 frames. ], batch size: 55, lr: 4.34e-02, grad_scale: 32.0 2023-11-18 02:39:13,994 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 02:39:34,495 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4386, 4.0699, 4.4264, 4.3711], device='cuda:0') 2023-11-18 02:39:41,870 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([4.4337, 4.0156, 4.4417, 4.3620], device='cuda:0') 2023-11-18 02:39:47,915 INFO [train_asr.py:1147] (0/4) Epoch 1, validation: loss=0.1123, simple_loss=0.08353, pruned_loss=0.02777, audio_tagging_loss=0.04274, over 4681554.00 frames. 2023-11-18 02:39:47,915 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 02:39:48,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.80 vs. limit=15.0 2023-11-18 02:39:51,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.85 vs. limit=15.0 2023-11-18 02:39:52,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.173e+01 1.112e+02 1.246e+02 1.564e+02 3.954e+02, threshold=2.493e+02, percent-clipped=6.0 2023-11-18 02:39:54,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=20000.0, ans=0.125 2023-11-18 02:39:54,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.15 vs. 
limit=10.0 2023-11-18 02:39:55,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=20000.0, ans=0.006521739130434783 2023-11-18 02:40:26,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20200.0, ans=0.125 2023-11-18 02:40:34,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=20266.666666666668, ans=0.125 2023-11-18 02:40:35,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=20266.666666666668, ans=10.0 2023-11-18 02:40:37,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.50 vs. limit=22.5 2023-11-18 02:40:42,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.71 vs. limit=22.5 2023-11-18 02:40:43,621 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3050, loss[loss=0.1844, simple_loss=0.1776, pruned_loss=0.08132, audio_tagging_loss=0.01423, over 14625.00 frames. ], tot_loss[loss=0.1769, simple_loss=0.1654, pruned_loss=0.08006, audio_tagging_loss=0.01417, over 3056832.40 frames. ], batch size: 53, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:41:00,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=20400.0, ans=0.125 2023-11-18 02:41:04,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=20466.666666666668, ans=0.1 2023-11-18 02:41:04,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:05,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.01 vs. limit=15.0 2023-11-18 02:41:06,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=20466.666666666668, ans=0.125 2023-11-18 02:41:15,385 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.47 vs. limit=6.0 2023-11-18 02:41:17,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=20533.333333333332, ans=0.125 2023-11-18 02:41:18,249 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 02:41:23,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=20533.333333333332, ans=0.1 2023-11-18 02:41:40,407 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3100, loss[loss=0.2195, simple_loss=0.203, pruned_loss=0.1048, audio_tagging_loss=0.01322, over 15669.00 frames. ], tot_loss[loss=0.1773, simple_loss=0.166, pruned_loss=0.0802, audio_tagging_loss=0.01414, over 3062666.67 frames. ], batch size: 58, lr: 4.33e-02, grad_scale: 32.0 2023-11-18 02:41:40,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=20666.666666666668, ans=0.006376811594202898 2023-11-18 02:41:44,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.846e+01 1.051e+02 1.308e+02 1.673e+02 2.696e+02, threshold=2.616e+02, percent-clipped=3.0 2023-11-18 02:41:53,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=20733.333333333332, ans=0.0063623188405797105 2023-11-18 02:41:56,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=20733.333333333332, ans=0.0063623188405797105 2023-11-18 02:42:12,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=20800.0, ans=0.125 2023-11-18 02:42:28,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=9.75 vs. limit=10.0 2023-11-18 02:42:31,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=20933.333333333332, ans=0.125 2023-11-18 02:42:37,778 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3150, loss[loss=0.1896, simple_loss=0.1917, pruned_loss=0.08367, audio_tagging_loss=0.01007, over 14942.00 frames. ], tot_loss[loss=0.1775, simple_loss=0.1666, pruned_loss=0.08002, audio_tagging_loss=0.01419, over 3056635.04 frames. ], batch size: 54, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:42:42,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=21000.0, ans=0.125 2023-11-18 02:42:59,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=21133.333333333332, ans=0.125 2023-11-18 02:43:00,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=21133.333333333332, ans=0.2 2023-11-18 02:43:34,055 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3200, loss[loss=0.2051, simple_loss=0.1891, pruned_loss=0.09588, audio_tagging_loss=0.01465, over 14849.00 frames. ], tot_loss[loss=0.175, simple_loss=0.1641, pruned_loss=0.07833, audio_tagging_loss=0.01457, over 3054480.74 frames. 
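The recurring WARNING lines drop AudioSet clips whose encoder output would be too short for their dummy transcript: 100 input frames survive as only 23 frames after the front-end's roughly 4x subsampling, and 23 frames cannot align with 24 BPE tokens under the transducer topology. The arithmetic below reproduces the numbers in the warnings; treating "at least one encoder frame per token" as the keep condition is an assumption about train_asr.py's filter.

    # Sketch of the validity check behind "Exclude cut ...". The subsampling
    # formula is the usual zipformer front-end arithmetic and maps 100 -> 23.
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed keep condition: encoder frames must cover the target tokens.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the excluded 1-second dummy-text cuts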
], batch size: 58, lr: 4.32e-02, grad_scale: 32.0 2023-11-18 02:43:38,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.453e+01 1.064e+02 1.244e+02 1.490e+02 2.410e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 02:43:40,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=21333.333333333332, ans=0.125 2023-11-18 02:43:48,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=21400.0, ans=0.2 2023-11-18 02:44:06,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=21466.666666666668, ans=0.125 2023-11-18 02:44:14,349 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 02:44:30,115 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3250, loss[loss=0.2192, simple_loss=0.2042, pruned_loss=0.1025, audio_tagging_loss=0.01457, over 15060.00 frames. ], tot_loss[loss=0.175, simple_loss=0.1643, pruned_loss=0.07819, audio_tagging_loss=0.01468, over 3058654.52 frames. ], batch size: 57, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:44:34,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=21666.666666666668, ans=0.1 2023-11-18 02:44:48,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=21733.333333333332, ans=0.0 2023-11-18 02:44:50,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=19.17 vs. limit=22.5 2023-11-18 02:44:53,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=15.0 2023-11-18 02:45:00,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=21800.0, ans=0.125 2023-11-18 02:45:08,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=21866.666666666668, ans=0.125 2023-11-18 02:45:14,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=21933.333333333332, ans=0.00610144927536232 2023-11-18 02:45:14,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=12.43 vs. limit=12.0 2023-11-18 02:45:18,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.55 vs. limit=15.0 2023-11-18 02:45:27,668 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3300, loss[loss=0.2522, simple_loss=0.2442, pruned_loss=0.1176, audio_tagging_loss=0.01248, over 15776.00 frames. ], tot_loss[loss=0.1742, simple_loss=0.1639, pruned_loss=0.07761, audio_tagging_loss=0.01461, over 3060273.85 frames. 
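The loss fields in these lines combine the multi-task objectives as loss = simple_loss_scale x simple_loss + pruned_loss + audio_tagging_loss_scale x audio_tagging_loss, with simple_loss_scale = 0.5 and audio_tagging_loss_scale = 1.0 from the run configuration. The batch-3300 line above checks out: 0.5 x 0.2442 + 0.1176 + 0.01248 is approximately 0.2522. A small sketch (icefall also ramps these scales during the warm-up steps, which is omitted here):

    # How the logged loss columns combine; verified against the batch-3300
    # numbers above. The warm-up ramp applied early in training is omitted.
    def combined_loss(simple_loss: float, pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_loss_scale: float = 0.5,
                      audio_tagging_loss_scale: float = 1.0) -> float:
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    print(combined_loss(0.2442, 0.1176, 0.01248))  # ~0.2522, as logged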
], batch size: 56, lr: 4.31e-02, grad_scale: 32.0 2023-11-18 02:45:32,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.259e+01 1.069e+02 1.225e+02 1.477e+02 2.736e+02, threshold=2.451e+02, percent-clipped=1.0 2023-11-18 02:45:33,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=22000.0, ans=0.125 2023-11-18 02:45:47,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=22066.666666666668, ans=0.09899494936611666 2023-11-18 02:45:55,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.19 vs. limit=22.5 2023-11-18 02:45:58,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=22133.333333333332, ans=0.125 2023-11-18 02:46:12,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=22266.666666666668, ans=0.125 2023-11-18 02:46:24,730 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3350, loss[loss=0.1985, simple_loss=0.1862, pruned_loss=0.09379, audio_tagging_loss=0.01159, over 16074.00 frames. ], tot_loss[loss=0.1748, simple_loss=0.1651, pruned_loss=0.0779, audio_tagging_loss=0.01435, over 3055747.07 frames. ], batch size: 58, lr: 4.30e-02, grad_scale: 32.0 2023-11-18 02:46:55,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.58 vs. limit=22.5 2023-11-18 02:47:11,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=22600.0, ans=0.005956521739130435 2023-11-18 02:47:14,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=22600.0, ans=0.1 2023-11-18 02:47:15,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=22600.0, ans=0.125 2023-11-18 02:47:16,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=22600.0, ans=0.005956521739130435 2023-11-18 02:47:21,312 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3400, loss[loss=0.1348, simple_loss=0.125, pruned_loss=0.05878, audio_tagging_loss=0.01353, over 14891.00 frames. ], tot_loss[loss=0.1741, simple_loss=0.1647, pruned_loss=0.07767, audio_tagging_loss=0.01403, over 3059747.51 frames. 
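The Whitening lines fire when a Whiten module's measured statistic exceeds its scheduled limit (for example metric=24.58 vs. limit=22.5 above), at which point a small corrective gradient nudges the activations back toward a whitened distribution. The metric is scale-invariant: 1.0 when the feature covariance is proportional to the identity, larger as energy concentrates in a few directions. A rough sketch of one such statistic, the ratio of mean squared eigenvalue to squared mean eigenvalue; the exact expression in scaling.py may differ:

    # Rough whitening statistic: 1.0 for identity-like covariance, larger when
    # a few directions dominate. An approximation of scaling.py's metric, not
    # a copy of it.
    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        num_frames, num_channels = x.shape
        x = x.reshape(num_frames, num_groups, num_channels // num_groups)
        x = x.transpose(0, 1)                      # (groups, frames, chans)
        covar = torch.matmul(x.transpose(1, 2), x) / num_frames
        eigs = torch.linalg.eigvalsh(covar)        # per-group eigenvalues
        metric = (eigs ** 2).mean(dim=1) / eigs.mean(dim=1).clamp(min=1e-20) ** 2
        return metric.mean().item()

    print(whitening_metric(torch.randn(4000, 256)))  # close to 1 for white noise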
], batch size: 56, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:47:25,579 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.820e+01 1.014e+02 1.234e+02 1.515e+02 3.091e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:47:28,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=22666.666666666668, ans=0.125 2023-11-18 02:48:11,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=22933.333333333332, ans=0.0058840579710144935 2023-11-18 02:48:11,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=22933.333333333332, ans=15.0 2023-11-18 02:48:16,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=22933.333333333332, ans=0.125 2023-11-18 02:48:18,064 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3450, loss[loss=0.1696, simple_loss=0.1523, pruned_loss=0.07987, audio_tagging_loss=0.01361, over 12990.00 frames. ], tot_loss[loss=0.1726, simple_loss=0.1635, pruned_loss=0.07701, audio_tagging_loss=0.01383, over 3053528.70 frames. ], batch size: 54, lr: 4.29e-02, grad_scale: 32.0 2023-11-18 02:48:22,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=23000.0, ans=0.125 2023-11-18 02:48:40,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=23133.333333333332, ans=0.0 2023-11-18 02:48:50,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=23200.0, ans=0.125 2023-11-18 02:48:55,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=23200.0, ans=0.00582608695652174 2023-11-18 02:49:15,058 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3500, loss[loss=0.1972, simple_loss=0.19, pruned_loss=0.08836, audio_tagging_loss=0.01385, over 14814.00 frames. ], tot_loss[loss=0.1719, simple_loss=0.1633, pruned_loss=0.07646, audio_tagging_loss=0.01377, over 3047151.10 frames. ], batch size: 55, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:49:19,473 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.153e+01 1.129e+02 1.309e+02 1.633e+02 2.948e+02, threshold=2.617e+02, percent-clipped=2.0 2023-11-18 02:49:26,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=23400.0, ans=0.0 2023-11-18 02:49:29,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=23400.0, ans=0.5 2023-11-18 02:49:39,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-11-18 02:49:44,223 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 02:49:49,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=23533.333333333332, ans=0.0 2023-11-18 02:49:50,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.67 vs. limit=15.0 2023-11-18 02:49:53,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=23533.333333333332, ans=0.125 2023-11-18 02:50:10,786 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3550, loss[loss=0.1654, simple_loss=0.1619, pruned_loss=0.07423, audio_tagging_loss=0.01018, over 15623.00 frames. ], tot_loss[loss=0.1713, simple_loss=0.163, pruned_loss=0.07605, audio_tagging_loss=0.01373, over 3050189.40 frames. ], batch size: 60, lr: 4.28e-02, grad_scale: 32.0 2023-11-18 02:50:43,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=23800.0, ans=0.125 2023-11-18 02:50:43,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=23800.0, ans=0.0 2023-11-18 02:50:44,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=23866.666666666668, ans=0.1 2023-11-18 02:50:51,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=23866.666666666668, ans=0.2 2023-11-18 02:50:58,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=23933.333333333332, ans=0.125 2023-11-18 02:51:01,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.73 vs. limit=15.0 2023-11-18 02:51:07,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.56 vs. limit=10.0 2023-11-18 02:51:08,217 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3600, loss[loss=0.1365, simple_loss=0.1301, pruned_loss=0.05894, audio_tagging_loss=0.01248, over 14973.00 frames. ], tot_loss[loss=0.1701, simple_loss=0.1622, pruned_loss=0.07536, audio_tagging_loss=0.01365, over 3044329.84 frames. ], batch size: 57, lr: 4.27e-02, grad_scale: 32.0 2023-11-18 02:51:11,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=24000.0, ans=0.125 2023-11-18 02:51:13,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.609e+01 1.015e+02 1.156e+02 1.393e+02 2.534e+02, threshold=2.312e+02, percent-clipped=0.0 2023-11-18 02:51:23,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24066.666666666668, ans=0.1 2023-11-18 02:51:38,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=24133.333333333332, ans=0.125 2023-11-18 02:51:58,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=24266.666666666668, ans=0.0 2023-11-18 02:52:02,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.42 vs. 
limit=6.0 2023-11-18 02:52:05,638 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3650, loss[loss=0.1857, simple_loss=0.1746, pruned_loss=0.08427, audio_tagging_loss=0.01411, over 15230.00 frames. ], tot_loss[loss=0.17, simple_loss=0.1619, pruned_loss=0.07528, audio_tagging_loss=0.0138, over 3040234.44 frames. ], batch size: 56, lr: 4.27e-02, grad_scale: 64.0 2023-11-18 02:52:51,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=24600.0, ans=0.005521739130434783 2023-11-18 02:52:55,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=24600.0, ans=0.2 2023-11-18 02:53:00,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=24666.666666666668, ans=0.0 2023-11-18 02:53:01,445 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3700, loss[loss=0.1582, simple_loss=0.1511, pruned_loss=0.06911, audio_tagging_loss=0.01355, over 15261.00 frames. ], tot_loss[loss=0.1688, simple_loss=0.1608, pruned_loss=0.07445, audio_tagging_loss=0.01396, over 3036643.90 frames. ], batch size: 58, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:53:02,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=24666.666666666668, ans=0.0 2023-11-18 02:53:03,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=24666.666666666668, ans=0.1 2023-11-18 02:53:03,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=24666.666666666668, ans=0.125 2023-11-18 02:53:03,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=24666.666666666668, ans=0.125 2023-11-18 02:53:05,626 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 1.068e+02 1.322e+02 1.624e+02 2.925e+02, threshold=2.645e+02, percent-clipped=5.0 2023-11-18 02:53:16,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=24733.333333333332, ans=0.125 2023-11-18 02:53:22,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=24733.333333333332, ans=0.0 2023-11-18 02:53:22,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=24733.333333333332, ans=0.1 2023-11-18 02:53:29,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=24800.0, ans=0.125 2023-11-18 02:53:41,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.82 vs. limit=10.0 2023-11-18 02:53:44,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=24866.666666666668, ans=0.0 2023-11-18 02:53:46,344 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. 
limit=6.0 2023-11-18 02:53:53,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=24933.333333333332, ans=0.125 2023-11-18 02:53:58,196 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3750, loss[loss=0.1652, simple_loss=0.1597, pruned_loss=0.06992, audio_tagging_loss=0.0155, over 14647.00 frames. ], tot_loss[loss=0.1697, simple_loss=0.1619, pruned_loss=0.0748, audio_tagging_loss=0.01392, over 3041835.23 frames. ], batch size: 57, lr: 4.26e-02, grad_scale: 64.0 2023-11-18 02:54:04,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=25000.0, ans=0.125 2023-11-18 02:54:16,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=25066.666666666668, ans=0.125 2023-11-18 02:54:37,584 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 02:54:53,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=25266.666666666668, ans=0.2 2023-11-18 02:54:56,659 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3800, loss[loss=0.151, simple_loss=0.1427, pruned_loss=0.06433, audio_tagging_loss=0.01529, over 15027.00 frames. ], tot_loss[loss=0.1695, simple_loss=0.1617, pruned_loss=0.07463, audio_tagging_loss=0.014, over 3048479.57 frames. ], batch size: 57, lr: 4.25e-02, grad_scale: 64.0 2023-11-18 02:54:58,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25333.333333333332, ans=0.1 2023-11-18 02:54:59,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.52 vs. limit=12.0 2023-11-18 02:55:01,041 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.087e+01 1.058e+02 1.234e+02 1.426e+02 2.558e+02, threshold=2.469e+02, percent-clipped=0.0 2023-11-18 02:55:05,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=25333.333333333332, ans=0.125 2023-11-18 02:55:05,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=25333.333333333332, ans=0.125 2023-11-18 02:55:10,460 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.81 vs. 
limit=10.0 2023-11-18 02:55:30,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=25533.333333333332, ans=0.125 2023-11-18 02:55:36,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=25533.333333333332, ans=0.005318840579710145 2023-11-18 02:55:38,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=25533.333333333332, ans=0.2 2023-11-18 02:55:38,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=25533.333333333332, ans=0.1 2023-11-18 02:55:42,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=25600.0, ans=0.005304347826086957 2023-11-18 02:55:46,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=16.39 vs. limit=15.0 2023-11-18 02:55:53,512 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3850, loss[loss=0.1604, simple_loss=0.1505, pruned_loss=0.07202, audio_tagging_loss=0.01318, over 15964.00 frames. ], tot_loss[loss=0.1694, simple_loss=0.1618, pruned_loss=0.07454, audio_tagging_loss=0.01399, over 3052311.69 frames. ], batch size: 59, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:55:57,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=25666.666666666668, ans=0.0 2023-11-18 02:55:58,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=25666.666666666668, ans=0.1 2023-11-18 02:56:00,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=6.07 vs. limit=6.0 2023-11-18 02:56:09,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.39 vs. limit=10.0 2023-11-18 02:56:19,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=25800.0, ans=0.125 2023-11-18 02:56:20,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=25800.0, ans=0.1 2023-11-18 02:56:23,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=25800.0, ans=0.125 2023-11-18 02:56:29,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=25866.666666666668, ans=0.0 2023-11-18 02:56:40,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=25933.333333333332, ans=0.0 2023-11-18 02:56:49,652 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3900, loss[loss=0.1839, simple_loss=0.1682, pruned_loss=0.08202, audio_tagging_loss=0.01774, over 15241.00 frames. ], tot_loss[loss=0.1698, simple_loss=0.1623, pruned_loss=0.07457, audio_tagging_loss=0.01405, over 3053713.52 frames. 
], batch size: 57, lr: 4.24e-02, grad_scale: 64.0 2023-11-18 02:56:52,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=26000.0, ans=0.0 2023-11-18 02:56:53,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=26000.0, ans=0.1 2023-11-18 02:56:54,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.882e+01 1.063e+02 1.269e+02 1.447e+02 2.279e+02, threshold=2.539e+02, percent-clipped=0.0 2023-11-18 02:56:56,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=12.0 2023-11-18 02:57:29,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26200.0, ans=0.1 2023-11-18 02:57:43,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=26266.666666666668, ans=0.125 2023-11-18 02:57:47,474 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 3950, loss[loss=0.1426, simple_loss=0.1357, pruned_loss=0.06293, audio_tagging_loss=0.01181, over 14801.00 frames. ], tot_loss[loss=0.169, simple_loss=0.1614, pruned_loss=0.0741, audio_tagging_loss=0.01418, over 3052594.81 frames. ], batch size: 56, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:57:47,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=26333.333333333332, ans=0.005144927536231885 2023-11-18 02:57:54,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=26333.333333333332, ans=0.125 2023-11-18 02:58:01,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26400.0, ans=0.1 2023-11-18 02:58:06,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.66 vs. limit=15.0 2023-11-18 02:58:21,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=26533.333333333332, ans=0.025 2023-11-18 02:58:25,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=26533.333333333332, ans=0.125 2023-11-18 02:58:35,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=26600.0, ans=0.125 2023-11-18 02:58:39,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.19 vs. limit=15.0 2023-11-18 02:58:40,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.31 vs. limit=15.0 2023-11-18 02:58:43,600 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-4000.pt 2023-11-18 02:58:46,895 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4000, loss[loss=0.211, simple_loss=0.2008, pruned_loss=0.097, audio_tagging_loss=0.01363, over 15282.00 frames. ], tot_loss[loss=0.1687, simple_loss=0.1612, pruned_loss=0.07386, audio_tagging_loss=0.01426, over 3058309.43 frames. 
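The checkpoint.py line above writes checkpoint-4000.pt because batch 4000 hit the save_every_n = 4000 setting; these batch-indexed checkpoints are written alongside the per-epoch ones, with only the newest keep_last_k = 30 retained. A minimal sketch of the pattern; the function name and state-dict layout here are illustrative, not checkpoint.py's actual API:

    # Illustrative periodic checkpointing; the helper and state layout are
    # assumptions, not icefall's actual checkpoint.py interface.
    from pathlib import Path
    import torch

    def maybe_save(exp_dir: Path, batch_idx_train: int, model, optimizer,
                   save_every_n: int = 4000, keep_last_k: int = 30) -> None:
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            path,
        )
        # Prune older batch-indexed checkpoints, keeping the newest keep_last_k.
        ckpts = sorted(exp_dir.glob("checkpoint-*.pt"),
                       key=lambda p: int(p.stem.split("-")[1]))
        for old in ckpts[:-keep_last_k]:
            old.unlink()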
], batch size: 57, lr: 4.23e-02, grad_scale: 64.0 2023-11-18 02:58:48,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=26666.666666666668, ans=0.2 2023-11-18 02:58:49,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=17.23 vs. limit=15.0 2023-11-18 02:58:51,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.575e+01 1.092e+02 1.270e+02 1.504e+02 2.237e+02, threshold=2.540e+02, percent-clipped=0.0 2023-11-18 02:58:57,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=26733.333333333332, ans=0.0 2023-11-18 02:59:04,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=26733.333333333332, ans=0.125 2023-11-18 02:59:20,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=26866.666666666668, ans=0.125 2023-11-18 02:59:23,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.96 vs. limit=15.0 2023-11-18 02:59:25,081 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.279e+01 2023-11-18 02:59:30,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=26866.666666666668, ans=0.1 2023-11-18 02:59:36,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=18.10 vs. limit=15.0 2023-11-18 02:59:36,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=26933.333333333332, ans=0.1 2023-11-18 02:59:43,007 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4050, loss[loss=0.1159, simple_loss=0.0993, pruned_loss=0.04914, audio_tagging_loss=0.01714, over 14725.00 frames. ], tot_loss[loss=0.1671, simple_loss=0.1596, pruned_loss=0.07297, audio_tagging_loss=0.01437, over 3057451.01 frames. ], batch size: 61, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 02:59:43,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=27000.0, ans=0.125 2023-11-18 02:59:46,341 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:00:33,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=27266.666666666668, ans=0.004942028985507246 2023-11-18 03:00:41,273 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4100, loss[loss=0.1721, simple_loss=0.1839, pruned_loss=0.07112, audio_tagging_loss=0.009068, over 15256.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.1609, pruned_loss=0.07328, audio_tagging_loss=0.01412, over 3054963.18 frames. 
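The "WithLoss: ... loss-sum=" lines report auxiliary penalties that scaling.py attaches to activations (for example the self_attn_weights penalty of 1.279e+01 above). One way to implement such an attachment, shown here as an illustrative sketch rather than the module's actual code, is a custom autograd function that leaves the forward value untouched while injecting a gradient of ones into the penalty term, which is equivalent to adding penalty.sum() to the training loss:

    # Illustrative auxiliary-loss attachment; not a copy of scaling.py.
    import torch

    class WithLoss(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
            ctx.y_shape = y.shape
            return x                      # forward value is unchanged

        @staticmethod
        def backward(ctx, ans_grad: torch.Tensor):
            # A gradient of ones w.r.t. y is equivalent to adding y.sum()
            # to the loss being minimized.
            return ans_grad, torch.ones(ctx.y_shape, dtype=ans_grad.dtype,
                                        device=ans_grad.device)

    x = torch.randn(4, requires_grad=True)
    penalty = x.relu()                    # hypothetical penalty term
    out = WithLoss.apply(x, penalty)
    out.sum().backward()                  # x.grad now includes the penalty's gradient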
], batch size: 56, lr: 4.22e-02, grad_scale: 64.0 2023-11-18 03:00:45,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.632e+01 1.139e+02 1.299e+02 1.567e+02 2.247e+02, threshold=2.597e+02, percent-clipped=0.0 2023-11-18 03:00:54,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=27400.0, ans=0.0 2023-11-18 03:00:59,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=27400.0, ans=0.00491304347826087 2023-11-18 03:01:06,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=12.0 2023-11-18 03:01:21,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=27533.333333333332, ans=0.125 2023-11-18 03:01:23,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=27533.333333333332, ans=0.125 2023-11-18 03:01:27,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=27600.0, ans=0.125 2023-11-18 03:01:29,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=27600.0, ans=0.0 2023-11-18 03:01:30,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=27600.0, ans=0.2 2023-11-18 03:01:38,132 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4150, loss[loss=0.166, simple_loss=0.1495, pruned_loss=0.07821, audio_tagging_loss=0.01305, over 13956.00 frames. ], tot_loss[loss=0.1669, simple_loss=0.1604, pruned_loss=0.07279, audio_tagging_loss=0.01392, over 3051084.90 frames. ], batch size: 52, lr: 4.21e-02, grad_scale: 64.0 2023-11-18 03:01:39,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=27666.666666666668, ans=0.0 2023-11-18 03:01:43,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=27666.666666666668, ans=0.125 2023-11-18 03:01:53,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=27733.333333333332, ans=0.125 2023-11-18 03:02:19,681 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:02:19,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=27866.666666666668, ans=0.125 2023-11-18 03:02:34,615 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4200, loss[loss=0.1812, simple_loss=0.1772, pruned_loss=0.07914, audio_tagging_loss=0.0135, over 15525.00 frames. ], tot_loss[loss=0.166, simple_loss=0.1599, pruned_loss=0.07236, audio_tagging_loss=0.01365, over 3053254.24 frames. 
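The "batch size" field fluctuates (52 to 63 across these lines) because batches are packed by total audio duration rather than by a fixed count: with max_duration = 1000 seconds and the SimpleCutSampler noted at startup, the sampler keeps adding cuts until the next one would overflow the duration budget, so the utterance count varies with utterance length. A toy sketch of that packing rule; the real sampler also handles shuffling, epochs, and distributed ranks:

    # Toy duration-based batching in the spirit of lhotse's SimpleCutSampler.
    from typing import Iterable, Iterator, List

    def duration_batches(cuts: Iterable, max_duration: float = 1000.0) -> Iterator[List]:
        batch, total = [], 0.0
        for cut in cuts:
            if batch and total + cut.duration > max_duration:
                yield batch
                batch, total = [], 0.0
            batch.append(cut)
            total += cut.duration
        if batch:
            yield batch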
], batch size: 56, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:02:38,899 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.536e+01 1.064e+02 1.276e+02 1.442e+02 2.964e+02, threshold=2.551e+02, percent-clipped=1.0 2023-11-18 03:02:40,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=28000.0, ans=0.05 2023-11-18 03:02:51,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-11-18 03:02:52,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=28066.666666666668, ans=0.125 2023-11-18 03:02:55,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=28066.666666666668, ans=0.125 2023-11-18 03:03:03,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=28133.333333333332, ans=0.125 2023-11-18 03:03:08,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=28200.0, ans=0.125 2023-11-18 03:03:14,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=28200.0, ans=0.07 2023-11-18 03:03:26,751 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.377e+00 2023-11-18 03:03:31,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=28333.333333333332, ans=22.5 2023-11-18 03:03:32,466 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4250, loss[loss=0.1868, simple_loss=0.1819, pruned_loss=0.08063, audio_tagging_loss=0.0152, over 15715.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.1598, pruned_loss=0.07221, audio_tagging_loss=0.01355, over 3051682.18 frames. ], batch size: 57, lr: 4.20e-02, grad_scale: 64.0 2023-11-18 03:03:50,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=28400.0, ans=0.0 2023-11-18 03:03:57,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=28466.666666666668, ans=0.125 2023-11-18 03:04:00,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=28466.666666666668, ans=0.125 2023-11-18 03:04:00,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=28466.666666666668, ans=0.125 2023-11-18 03:04:17,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=28600.0, ans=0.125 2023-11-18 03:04:22,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=28600.0, ans=0.0 2023-11-18 03:04:25,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-11-18 03:04:28,449 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4300, loss[loss=0.2217, simple_loss=0.2149, pruned_loss=0.1043, audio_tagging_loss=0.009882, over 16381.00 frames. 
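In each train_asr.py line, loss[...] describes the current batch while tot_loss[...] is a smoothed view. The fractional frame counts (for example "over 3054982.93 frames") suggest an exponentially decayed accumulator: frame-weighted sums are multiplied by (1 - 1/reset_interval) before each new batch is added, so with roughly 15k frames per batch and reset_interval = 200 the effective window settles near the 3M frames seen in the log. A sketch with that decay placement assumed:

    # Sketch of the decayed accumulator behind the tot_loss[...] column. The
    # decay factor and its placement are inferred from the fractional frame
    # counts in the log, not copied from icefall's MetricsTracker.
    class DecayedTracker:
        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.sums = {}          # metric name -> decayed weighted sum
            self.frames = 0.0       # decayed frame count

        def update(self, num_frames: int, **batch_metrics: float) -> None:
            self.frames = self.frames * self.decay + num_frames
            for name, value in batch_metrics.items():
                prev = self.sums.get(name, 0.0) * self.decay
                self.sums[name] = prev + value * num_frames

        def averages(self) -> dict:
            return {k: v / self.frames for k, v in self.sums.items()}

    tracker = DecayedTracker()
    tracker.update(15282, loss=0.211, simple_loss=0.2008)
    print(tracker.averages())   # frame-weighted view like tot_loss[...]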
], tot_loss[loss=0.1669, simple_loss=0.1612, pruned_loss=0.07289, audio_tagging_loss=0.0134, over 3054982.93 frames. ], batch size: 59, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:04:29,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.06 vs. limit=15.0 2023-11-18 03:04:30,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=15.0 2023-11-18 03:04:32,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 1.089e+02 1.255e+02 1.443e+02 2.387e+02, threshold=2.510e+02, percent-clipped=0.0 2023-11-18 03:04:34,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=28666.666666666668, ans=0.125 2023-11-18 03:04:40,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=28733.333333333332, ans=0.2 2023-11-18 03:04:42,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=28733.333333333332, ans=0.125 2023-11-18 03:04:44,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.13 vs. limit=15.0 2023-11-18 03:04:59,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=28800.0, ans=0.02 2023-11-18 03:05:25,335 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4350, loss[loss=0.1822, simple_loss=0.1737, pruned_loss=0.08187, audio_tagging_loss=0.01345, over 14946.00 frames. ], tot_loss[loss=0.1679, simple_loss=0.1624, pruned_loss=0.07325, audio_tagging_loss=0.0134, over 3050859.20 frames. ], batch size: 58, lr: 4.19e-02, grad_scale: 64.0 2023-11-18 03:05:34,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=29000.0, ans=0.0 2023-11-18 03:05:38,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=29066.666666666668, ans=0.125 2023-11-18 03:05:42,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=29066.666666666668, ans=0.0045507246376811595 2023-11-18 03:05:47,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=29133.333333333332, ans=0.025 2023-11-18 03:05:51,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.28 vs. limit=15.0 2023-11-18 03:05:52,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=29133.333333333332, ans=0.125 2023-11-18 03:06:01,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=29200.0, ans=0.0 2023-11-18 03:06:13,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=29266.666666666668, ans=0.125 2023-11-18 03:06:22,953 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4400, loss[loss=0.1747, simple_loss=0.1648, pruned_loss=0.07707, audio_tagging_loss=0.01524, over 14301.00 frames. 
], tot_loss[loss=0.167, simple_loss=0.1618, pruned_loss=0.07272, audio_tagging_loss=0.01333, over 3051391.11 frames. ], batch size: 54, lr: 4.18e-02, grad_scale: 64.0 2023-11-18 03:06:27,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.425e+01 1.162e+02 1.302e+02 1.640e+02 3.175e+02, threshold=2.603e+02, percent-clipped=6.0 2023-11-18 03:06:30,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=29333.333333333332, ans=0.125 2023-11-18 03:06:34,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.11 vs. limit=15.0 2023-11-18 03:06:39,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.29 vs. limit=15.0 2023-11-18 03:06:57,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=29533.333333333332, ans=0.125 2023-11-18 03:07:09,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=29600.0, ans=0.2 2023-11-18 03:07:19,289 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4450, loss[loss=0.2084, simple_loss=0.2061, pruned_loss=0.09385, audio_tagging_loss=0.01157, over 16939.00 frames. ], tot_loss[loss=0.1665, simple_loss=0.1613, pruned_loss=0.07245, audio_tagging_loss=0.01338, over 3047836.34 frames. ], batch size: 59, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:07:34,479 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-11-18 03:07:41,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=29800.0, ans=0.125 2023-11-18 03:07:44,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=29800.0, ans=0.95 2023-11-18 03:07:51,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=29800.0, ans=0.2 2023-11-18 03:08:00,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=29866.666666666668, ans=0.004376811594202898 2023-11-18 03:08:09,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.26 vs. limit=22.5 2023-11-18 03:08:15,493 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4500, loss[loss=0.1515, simple_loss=0.1468, pruned_loss=0.06177, audio_tagging_loss=0.01627, over 14796.00 frames. ], tot_loss[loss=0.1663, simple_loss=0.1609, pruned_loss=0.07233, audio_tagging_loss=0.01351, over 3056730.35 frames. 
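Most of the scaling.py ScheduledFloat records above are module hyper-parameters (skip rates, balancer probabilities, bypass minimums) whose current value, printed as "ans", is a function of batch_count; the conv_skip_rate entries, for instance, have already decayed to ans=0.0 by this point. A sketch of the general mechanism, assuming a simple piecewise-linear schedule; the real ScheduledFloat in icefall's scaling.py carries more machinery:

```python
# Hedged sketch of a batch-count schedule like the ScheduledFloat values above:
# piecewise-linear interpolation over (batch_count, value) breakpoints, clamped
# at both ends. The breakpoints here are illustrative, not from the recipe.
def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
    return points[-1][1]

# e.g. a skip rate decaying from 0.5 to 0.0 over the first 20k batches:
print(scheduled_float(29333.0, [(0.0, 0.5), (20000.0, 0.0)]))  # 0.0
```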
], batch size: 56, lr: 4.17e-02, grad_scale: 64.0 2023-11-18 03:08:20,342 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 1.091e+02 1.301e+02 1.544e+02 2.749e+02, threshold=2.602e+02, percent-clipped=1.0 2023-11-18 03:08:25,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=30000.0, ans=0.0 2023-11-18 03:08:31,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=30066.666666666668, ans=0.125 2023-11-18 03:08:45,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.17 vs. limit=12.0 2023-11-18 03:09:06,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=30266.666666666668, ans=0.125 2023-11-18 03:09:06,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=30266.666666666668, ans=0.2 2023-11-18 03:09:09,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=30266.666666666668, ans=0.1 2023-11-18 03:09:13,099 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4550, loss[loss=0.109, simple_loss=0.1103, pruned_loss=0.03915, audio_tagging_loss=0.01467, over 15781.00 frames. ], tot_loss[loss=0.1661, simple_loss=0.1606, pruned_loss=0.07218, audio_tagging_loss=0.01355, over 3058146.60 frames. ], batch size: 63, lr: 4.16e-02, grad_scale: 64.0 2023-11-18 03:09:23,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=30400.0, ans=0.0 2023-11-18 03:09:27,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.97 vs. limit=15.0 2023-11-18 03:09:37,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=30466.666666666668, ans=0.125 2023-11-18 03:09:40,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=30466.666666666668, ans=0.125 2023-11-18 03:09:43,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=30466.666666666668, ans=15.0 2023-11-18 03:09:45,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.86 vs. limit=22.5 2023-11-18 03:09:55,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=30533.333333333332, ans=0.125 2023-11-18 03:09:57,980 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 03:10:04,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=30600.0, ans=0.2 2023-11-18 03:10:07,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.74 vs. limit=15.0 2023-11-18 03:10:10,257 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4600, loss[loss=0.1384, simple_loss=0.1285, pruned_loss=0.05937, audio_tagging_loss=0.01478, over 14185.00 frames. ], tot_loss[loss=0.1657, simple_loss=0.1601, pruned_loss=0.07189, audio_tagging_loss=0.01381, over 3055783.18 frames. ], batch size: 54, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:10:14,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.773e+01 1.069e+02 1.267e+02 1.546e+02 2.795e+02, threshold=2.534e+02, percent-clipped=1.0 2023-11-18 03:10:31,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=30800.0, ans=0.0 2023-11-18 03:10:44,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.45 vs. limit=6.0 2023-11-18 03:11:06,018 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4650, loss[loss=0.1669, simple_loss=0.1568, pruned_loss=0.07426, audio_tagging_loss=0.01425, over 15345.00 frames. ], tot_loss[loss=0.1638, simple_loss=0.158, pruned_loss=0.07082, audio_tagging_loss=0.01397, over 3053615.29 frames. ], batch size: 57, lr: 4.15e-02, grad_scale: 64.0 2023-11-18 03:11:07,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.05 vs. limit=22.5 2023-11-18 03:11:18,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=31066.666666666668, ans=0.125 2023-11-18 03:11:31,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=31133.333333333332, ans=0.125 2023-11-18 03:11:55,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=31266.666666666668, ans=0.125 2023-11-18 03:12:02,404 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4700, loss[loss=0.1777, simple_loss=0.1745, pruned_loss=0.07741, audio_tagging_loss=0.01303, over 15440.00 frames. ], tot_loss[loss=0.164, simple_loss=0.1583, pruned_loss=0.07095, audio_tagging_loss=0.01392, over 3051094.67 frames. ], batch size: 56, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:12:05,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.01 vs. 
limit=15.0 2023-11-18 03:12:07,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.482e+01 1.109e+02 1.199e+02 1.387e+02 2.796e+02, threshold=2.398e+02, percent-clipped=1.0 2023-11-18 03:12:15,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=31400.0, ans=0.07 2023-11-18 03:12:22,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=31400.0, ans=0.1 2023-11-18 03:12:58,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.93 vs. limit=15.0 2023-11-18 03:12:58,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=31666.666666666668, ans=0.1 2023-11-18 03:12:59,575 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4750, loss[loss=0.1686, simple_loss=0.1582, pruned_loss=0.07703, audio_tagging_loss=0.01246, over 14955.00 frames. ], tot_loss[loss=0.162, simple_loss=0.1558, pruned_loss=0.06995, audio_tagging_loss=0.01412, over 3057540.08 frames. ], batch size: 56, lr: 4.14e-02, grad_scale: 64.0 2023-11-18 03:13:02,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2023-11-18 03:13:20,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.32 vs. limit=15.0 2023-11-18 03:13:27,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=31800.0, ans=0.003956521739130435 2023-11-18 03:13:29,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=31800.0, ans=0.1 2023-11-18 03:13:50,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=31933.333333333332, ans=0.95 2023-11-18 03:13:55,722 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4800, loss[loss=0.1541, simple_loss=0.1439, pruned_loss=0.06482, audio_tagging_loss=0.01733, over 15459.00 frames. ], tot_loss[loss=0.1617, simple_loss=0.1558, pruned_loss=0.06951, audio_tagging_loss=0.01431, over 3061159.40 frames. ], batch size: 59, lr: 4.13e-02, grad_scale: 64.0 2023-11-18 03:13:56,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=32000.0, ans=0.5 2023-11-18 03:13:59,888 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.623e+01 1.059e+02 1.265e+02 1.558e+02 2.176e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:14:12,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=32066.666666666668, ans=0.0 2023-11-18 03:14:47,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=32266.666666666668, ans=0.0 2023-11-18 03:14:51,911 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4850, loss[loss=0.1293, simple_loss=0.1269, pruned_loss=0.05137, audio_tagging_loss=0.01448, over 14878.00 frames. ], tot_loss[loss=0.1604, simple_loss=0.1545, pruned_loss=0.0687, audio_tagging_loss=0.01441, over 3052400.83 frames. 
], batch size: 57, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:22,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=32466.666666666668, ans=22.5 2023-11-18 03:15:28,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.80 vs. limit=22.5 2023-11-18 03:15:31,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.21 vs. limit=22.5 2023-11-18 03:15:35,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=32600.0, ans=0.2 2023-11-18 03:15:43,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=32600.0, ans=0.0 2023-11-18 03:15:48,648 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4900, loss[loss=0.1822, simple_loss=0.1735, pruned_loss=0.08128, audio_tagging_loss=0.01421, over 14864.00 frames. ], tot_loss[loss=0.161, simple_loss=0.1555, pruned_loss=0.06911, audio_tagging_loss=0.01413, over 3047376.12 frames. ], batch size: 55, lr: 4.12e-02, grad_scale: 64.0 2023-11-18 03:15:52,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.066e+01 1.045e+02 1.197e+02 1.386e+02 2.012e+02, threshold=2.394e+02, percent-clipped=0.0 2023-11-18 03:15:59,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.84 vs. limit=10.0 2023-11-18 03:16:04,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.56 vs. limit=22.5 2023-11-18 03:16:24,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=32866.666666666664, ans=0.1 2023-11-18 03:16:25,911 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:16:27,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=32866.666666666664, ans=0.5 2023-11-18 03:16:34,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2023-11-18 03:16:35,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=32933.333333333336, ans=0.0037101449275362313 2023-11-18 03:16:43,795 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 4950, loss[loss=0.1497, simple_loss=0.1431, pruned_loss=0.06362, audio_tagging_loss=0.01453, over 16387.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.1546, pruned_loss=0.06856, audio_tagging_loss=0.01391, over 3051714.47 frames. 
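The Whitening records fire when a module's measured metric exceeds its limit (e.g. metric=22.80 vs. limit=22.5 above), i.e. when activations drift away from a well-conditioned, decorrelated distribution. A hedged sketch of one common way such a metric can be computed; the exact formula in icefall's scaling.py may differ in detail:

```python
import torch

# Hedged sketch of a whitening metric: 1.0 when the feature covariance is a
# multiple of the identity, and larger the more unbalanced its eigenvalues are.
def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)            # real, ascending eigenvalues
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 512) * torch.linspace(0.1, 3.0, 512)  # deliberately non-white
print(whitening_metric(x))  # well above 1.0, the regime these records report
```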
], batch size: 65, lr: 4.11e-02, grad_scale: 64.0 2023-11-18 03:16:44,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=33000.0, ans=0.0 2023-11-18 03:16:47,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=33000.0, ans=0.125 2023-11-18 03:17:12,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=33133.333333333336, ans=0.0 2023-11-18 03:17:30,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=33266.666666666664, ans=0.0 2023-11-18 03:17:40,581 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5000, loss[loss=0.1787, simple_loss=0.184, pruned_loss=0.0745, audio_tagging_loss=0.0122, over 15252.00 frames. ], tot_loss[loss=0.1582, simple_loss=0.1537, pruned_loss=0.0676, audio_tagging_loss=0.01372, over 3051919.30 frames. ], batch size: 54, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:17:41,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.21 vs. limit=10.0 2023-11-18 03:17:45,399 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 1.071e+02 1.252e+02 1.412e+02 1.907e+02, threshold=2.505e+02, percent-clipped=0.0 2023-11-18 03:17:47,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.41 vs. limit=22.5 2023-11-18 03:18:11,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=33466.666666666664, ans=0.0 2023-11-18 03:18:30,775 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.983e+00 2023-11-18 03:18:37,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=33666.666666666664, ans=0.1 2023-11-18 03:18:38,045 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5050, loss[loss=0.1738, simple_loss=0.1679, pruned_loss=0.07436, audio_tagging_loss=0.01552, over 14581.00 frames. ], tot_loss[loss=0.1587, simple_loss=0.1542, pruned_loss=0.06789, audio_tagging_loss=0.01369, over 3052056.02 frames. ], batch size: 54, lr: 4.10e-02, grad_scale: 64.0 2023-11-18 03:18:51,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=33733.333333333336, ans=0.09899494936611666 2023-11-18 03:18:58,585 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=1.318e-02 2023-11-18 03:19:01,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.00 vs. limit=15.0 2023-11-18 03:19:24,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=33933.333333333336, ans=0.2 2023-11-18 03:19:32,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=34000.0, ans=0.125 2023-11-18 03:19:33,397 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5100, loss[loss=0.1703, simple_loss=0.1694, pruned_loss=0.07297, audio_tagging_loss=0.01267, over 15463.00 frames. 
], tot_loss[loss=0.159, simple_loss=0.1551, pruned_loss=0.06804, audio_tagging_loss=0.01344, over 3050673.54 frames. ], batch size: 56, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:19:37,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 1.065e+02 1.271e+02 1.460e+02 2.434e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 03:19:42,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=34000.0, ans=0.0 2023-11-18 03:19:42,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=34000.0, ans=0.125 2023-11-18 03:19:53,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34066.666666666664, ans=0.1 2023-11-18 03:20:11,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.42 vs. limit=15.0 2023-11-18 03:20:29,381 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5150, loss[loss=0.1513, simple_loss=0.1532, pruned_loss=0.06366, audio_tagging_loss=0.01099, over 15998.00 frames. ], tot_loss[loss=0.1582, simple_loss=0.1542, pruned_loss=0.06767, audio_tagging_loss=0.0135, over 3049098.73 frames. ], batch size: 57, lr: 4.09e-02, grad_scale: 64.0 2023-11-18 03:20:29,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=34333.333333333336, ans=0.0 2023-11-18 03:20:30,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34333.333333333336, ans=0.125 2023-11-18 03:20:34,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.13 vs. limit=15.0 2023-11-18 03:20:35,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.40 vs. limit=12.0 2023-11-18 03:20:44,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=34400.0, ans=0.125 2023-11-18 03:20:55,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=34466.666666666664, ans=0.09899494936611666 2023-11-18 03:20:58,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=34466.666666666664, ans=0.125 2023-11-18 03:21:12,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=34533.333333333336, ans=0.0033623188405797096 2023-11-18 03:21:21,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=34600.0, ans=0.0 2023-11-18 03:21:26,264 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5200, loss[loss=0.1764, simple_loss=0.1742, pruned_loss=0.07648, audio_tagging_loss=0.01279, over 16115.00 frames. ], tot_loss[loss=0.1579, simple_loss=0.1541, pruned_loss=0.06737, audio_tagging_loss=0.01342, over 3048327.82 frames. 
], batch size: 59, lr: 4.08e-02, grad_scale: 64.0 2023-11-18 03:21:30,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.892e+01 1.044e+02 1.171e+02 1.375e+02 2.529e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 03:21:46,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-11-18 03:22:11,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=17.34 vs. limit=15.0 2023-11-18 03:22:12,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=34933.333333333336, ans=0.1 2023-11-18 03:22:18,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-11-18 03:22:22,082 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5250, loss[loss=0.1901, simple_loss=0.1864, pruned_loss=0.08384, audio_tagging_loss=0.01301, over 16423.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.1556, pruned_loss=0.06801, audio_tagging_loss=0.01336, over 3048517.87 frames. ], batch size: 64, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:22:38,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=35066.666666666664, ans=0.1 2023-11-18 03:22:39,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=35066.666666666664, ans=0.125 2023-11-18 03:22:43,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=35133.333333333336, ans=0.09899494936611666 2023-11-18 03:22:53,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=35133.333333333336, ans=0.0032318840579710142 2023-11-18 03:22:59,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=35200.0, ans=0.125 2023-11-18 03:23:06,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=35266.666666666664, ans=0.0 2023-11-18 03:23:10,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=19.72 vs. limit=22.5 2023-11-18 03:23:16,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=35333.333333333336, ans=0.125 2023-11-18 03:23:18,041 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5300, loss[loss=0.1455, simple_loss=0.1416, pruned_loss=0.06242, audio_tagging_loss=0.01231, over 15097.00 frames. ], tot_loss[loss=0.1592, simple_loss=0.1558, pruned_loss=0.06796, audio_tagging_loss=0.01335, over 3046655.21 frames. 
], batch size: 60, lr: 4.07e-02, grad_scale: 64.0 2023-11-18 03:23:22,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.054e+02 1.180e+02 1.432e+02 2.621e+02, threshold=2.360e+02, percent-clipped=2.0 2023-11-18 03:23:27,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=35333.333333333336, ans=0.125 2023-11-18 03:23:39,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=35466.666666666664, ans=0.125 2023-11-18 03:23:57,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=35533.333333333336, ans=0.2 2023-11-18 03:24:14,619 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5350, loss[loss=0.116, simple_loss=0.1029, pruned_loss=0.04532, audio_tagging_loss=0.01929, over 13957.00 frames. ], tot_loss[loss=0.1587, simple_loss=0.1556, pruned_loss=0.06748, audio_tagging_loss=0.01338, over 3042794.17 frames. ], batch size: 55, lr: 4.06e-02, grad_scale: 64.0 2023-11-18 03:24:35,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=35800.0, ans=0.04949747468305833 2023-11-18 03:24:40,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.92 vs. limit=22.5 2023-11-18 03:24:49,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.86 vs. limit=15.0 2023-11-18 03:25:10,843 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5400, loss[loss=0.2136, simple_loss=0.2157, pruned_loss=0.09598, audio_tagging_loss=0.009764, over 15377.00 frames. ], tot_loss[loss=0.1577, simple_loss=0.1539, pruned_loss=0.06713, audio_tagging_loss=0.01363, over 3044275.19 frames. ], batch size: 55, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:25:12,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=36000.0, ans=0.95 2023-11-18 03:25:15,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.377e+01 1.086e+02 1.314e+02 1.571e+02 2.162e+02, threshold=2.627e+02, percent-clipped=0.0 2023-11-18 03:25:27,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36066.666666666664, ans=0.1 2023-11-18 03:25:35,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=36133.333333333336, ans=0.003014492753623188 2023-11-18 03:25:35,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=36133.333333333336, ans=0.0 2023-11-18 03:25:38,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.48 vs. 
limit=15.0 2023-11-18 03:25:44,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=36200.0, ans=0.0 2023-11-18 03:25:46,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=36200.0, ans=0.2 2023-11-18 03:25:47,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=36200.0, ans=0.2 2023-11-18 03:26:01,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=36266.666666666664, ans=0.1 2023-11-18 03:26:03,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=36266.666666666664, ans=0.2 2023-11-18 03:26:06,203 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5450, loss[loss=0.1485, simple_loss=0.1419, pruned_loss=0.06302, audio_tagging_loss=0.01451, over 14549.00 frames. ], tot_loss[loss=0.1583, simple_loss=0.1549, pruned_loss=0.06728, audio_tagging_loss=0.0136, over 3040283.44 frames. ], batch size: 56, lr: 4.05e-02, grad_scale: 64.0 2023-11-18 03:26:10,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=36333.333333333336, ans=0.1 2023-11-18 03:26:15,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.88 vs. limit=22.5 2023-11-18 03:26:36,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.34 vs. limit=15.0 2023-11-18 03:26:39,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=36533.333333333336, ans=0.125 2023-11-18 03:26:49,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=36533.333333333336, ans=0.125 2023-11-18 03:26:52,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.63 vs. limit=12.0 2023-11-18 03:26:57,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.36 vs. limit=22.5 2023-11-18 03:27:03,291 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5500, loss[loss=0.1682, simple_loss=0.1626, pruned_loss=0.07357, audio_tagging_loss=0.01336, over 15781.00 frames. ], tot_loss[loss=0.1598, simple_loss=0.1567, pruned_loss=0.06795, audio_tagging_loss=0.01349, over 3041091.51 frames. 
], batch size: 58, lr: 4.04e-02, grad_scale: 64.0 2023-11-18 03:27:03,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=36666.666666666664, ans=0.2 2023-11-18 03:27:07,490 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.157e+01 1.024e+02 1.184e+02 1.343e+02 1.900e+02, threshold=2.368e+02, percent-clipped=0.0 2023-11-18 03:27:11,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36666.666666666664, ans=0.1 2023-11-18 03:27:44,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=36866.666666666664, ans=0.1 2023-11-18 03:27:52,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=2.53 vs. limit=15.0 2023-11-18 03:27:58,588 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5550, loss[loss=0.1889, simple_loss=0.188, pruned_loss=0.08048, audio_tagging_loss=0.0144, over 15872.00 frames. ], tot_loss[loss=0.1597, simple_loss=0.1567, pruned_loss=0.06777, audio_tagging_loss=0.01359, over 3049049.04 frames. ], batch size: 57, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:02,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.00 vs. limit=12.0 2023-11-18 03:28:06,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=37000.0, ans=0.0 2023-11-18 03:28:25,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=37133.333333333336, ans=0.5 2023-11-18 03:28:54,735 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5600, loss[loss=0.1711, simple_loss=0.1689, pruned_loss=0.07384, audio_tagging_loss=0.01278, over 15050.00 frames. ], tot_loss[loss=0.1591, simple_loss=0.1563, pruned_loss=0.06726, audio_tagging_loss=0.01367, over 3047388.05 frames. ], batch size: 59, lr: 4.03e-02, grad_scale: 64.0 2023-11-18 03:28:59,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.818e+01 1.034e+02 1.195e+02 1.444e+02 2.133e+02, threshold=2.390e+02, percent-clipped=0.0 2023-11-18 03:29:06,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=37400.0, ans=0.1 2023-11-18 03:29:16,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=37466.666666666664, ans=0.0027246376811594216 2023-11-18 03:29:16,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=37466.666666666664, ans=0.125 2023-11-18 03:29:28,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=37533.333333333336, ans=0.95 2023-11-18 03:29:35,261 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 03:29:42,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=37600.0, ans=0.125 2023-11-18 03:29:45,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=37600.0, ans=0.125 2023-11-18 03:29:47,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=37600.0, ans=0.0 2023-11-18 03:29:48,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.88 vs. limit=10.0 2023-11-18 03:29:51,787 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5650, loss[loss=0.1512, simple_loss=0.1587, pruned_loss=0.05837, audio_tagging_loss=0.01343, over 16154.00 frames. ], tot_loss[loss=0.1586, simple_loss=0.1552, pruned_loss=0.0672, audio_tagging_loss=0.01382, over 3051716.40 frames. ], batch size: 57, lr: 4.02e-02, grad_scale: 128.0 2023-11-18 03:29:53,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=37666.666666666664, ans=0.1 2023-11-18 03:30:01,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=37733.333333333336, ans=0.125 2023-11-18 03:30:20,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0 2023-11-18 03:30:30,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=37866.666666666664, ans=0.0026376811594202906 2023-11-18 03:30:47,189 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5700, loss[loss=0.1877, simple_loss=0.1934, pruned_loss=0.08208, audio_tagging_loss=0.008907, over 15576.00 frames. ], tot_loss[loss=0.1579, simple_loss=0.1545, pruned_loss=0.06686, audio_tagging_loss=0.01375, over 3052867.12 frames. ], batch size: 57, lr: 4.02e-02, grad_scale: 64.0 2023-11-18 03:30:47,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=38000.0, ans=0.125 2023-11-18 03:30:50,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=38000.0, ans=0.0026086956521739132 2023-11-18 03:30:52,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.432e+01 1.093e+02 1.259e+02 1.491e+02 2.385e+02, threshold=2.519e+02, percent-clipped=0.0 2023-11-18 03:30:58,205 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.41 vs. 
limit=15.0 2023-11-18 03:31:04,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=38066.666666666664, ans=0.125 2023-11-18 03:31:05,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=38066.666666666664, ans=0.0025942028985507255 2023-11-18 03:31:13,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=38133.333333333336, ans=0.002579710144927536 2023-11-18 03:31:17,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=38133.333333333336, ans=0.2 2023-11-18 03:31:25,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=38200.0, ans=0.07 2023-11-18 03:31:36,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38266.666666666664, ans=0.125 2023-11-18 03:31:40,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=38266.666666666664, ans=0.2 2023-11-18 03:31:42,345 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5750, loss[loss=0.1056, simple_loss=0.09029, pruned_loss=0.04268, audio_tagging_loss=0.01777, over 15133.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1523, pruned_loss=0.06596, audio_tagging_loss=0.01365, over 3045172.61 frames. ], batch size: 59, lr: 4.01e-02, grad_scale: 32.0 2023-11-18 03:31:46,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=38333.333333333336, ans=0.125 2023-11-18 03:31:50,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=38333.333333333336, ans=0.1 2023-11-18 03:32:04,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=38400.0, ans=0.0 2023-11-18 03:32:04,137 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:32:11,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-11-18 03:32:15,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.01 vs. limit=10.0 2023-11-18 03:32:26,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=38600.0, ans=0.2 2023-11-18 03:32:30,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=38600.0, ans=0.125 2023-11-18 03:32:36,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.46 vs. limit=22.5 2023-11-18 03:32:39,949 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5800, loss[loss=0.1627, simple_loss=0.1713, pruned_loss=0.06544, audio_tagging_loss=0.01161, over 15412.00 frames. ], tot_loss[loss=0.156, simple_loss=0.1525, pruned_loss=0.06625, audio_tagging_loss=0.01355, over 3047505.33 frames. 
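The grad_scale column is worth watching in this stretch: it sits at 64.0, reaches 128.0 at batch 5650, and is back to 64.0 by batch 5700 and 32.0 by batch 5750, the signature of dynamic loss scaling for mixed-precision training, where the scale grows after a run of finite gradients and is halved whenever an overflow is detected. A toy sketch of that behaviour, with the growth and backoff constants assumed rather than read from the code:

```python
# Toy sketch of dynamic loss scaling consistent with the logged grad_scale
# sequence 64 -> 128 -> 64 -> 32; growth interval and factors are assumptions.
class ToyGradScaler:
    def __init__(self, scale: float = 64.0, growth_factor: float = 2.0,
                 backoff_factor: float = 0.5, growth_interval: int = 1000):
        self.scale = scale
        self.growth_factor, self.backoff_factor = growth_factor, backoff_factor
        self.growth_interval, self._good_steps = growth_interval, 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale *= self.backoff_factor   # 128.0 -> 64.0 -> 32.0 here
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor  # 64.0 -> 128.0 at batch 5650
                self._good_steps = 0
```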
], batch size: 59, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:32:46,933 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.270e+01 1.065e+02 1.200e+02 1.362e+02 2.023e+02, threshold=2.399e+02, percent-clipped=0.0 2023-11-18 03:32:53,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=38733.333333333336, ans=0.125 2023-11-18 03:32:55,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=38733.333333333336, ans=0.09899494936611666 2023-11-18 03:33:01,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-11-18 03:33:06,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.77 vs. limit=22.5 2023-11-18 03:33:17,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.88 vs. limit=12.0 2023-11-18 03:33:17,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=38866.666666666664, ans=0.1 2023-11-18 03:33:25,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=38933.333333333336, ans=0.0024057971014492746 2023-11-18 03:33:29,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff2.min_abs, batch_count=38933.333333333336, ans=0.1 2023-11-18 03:33:35,988 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5850, loss[loss=0.1506, simple_loss=0.1419, pruned_loss=0.06108, audio_tagging_loss=0.0186, over 14798.00 frames. ], tot_loss[loss=0.1559, simple_loss=0.1522, pruned_loss=0.06621, audio_tagging_loss=0.01357, over 3045046.06 frames. ], batch size: 54, lr: 4.00e-02, grad_scale: 32.0 2023-11-18 03:33:36,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=39000.0, ans=0.025 2023-11-18 03:33:46,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.52 vs. limit=10.0 2023-11-18 03:33:48,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=39066.666666666664, ans=0.2 2023-11-18 03:34:13,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=39200.0, ans=0.125 2023-11-18 03:34:20,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.53 vs. limit=15.0 2023-11-18 03:34:21,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=39266.666666666664, ans=0.025 2023-11-18 03:34:26,037 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=17.30 vs. limit=15.0 2023-11-18 03:34:26,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.36 vs. 
limit=22.5 2023-11-18 03:34:28,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=39266.666666666664, ans=0.125 2023-11-18 03:34:31,919 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5900, loss[loss=0.1636, simple_loss=0.1625, pruned_loss=0.06964, audio_tagging_loss=0.01271, over 17003.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1521, pruned_loss=0.066, audio_tagging_loss=0.01364, over 3050170.79 frames. ], batch size: 66, lr: 3.99e-02, grad_scale: 32.0 2023-11-18 03:34:38,797 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.189e+01 1.114e+02 1.332e+02 1.512e+02 2.705e+02, threshold=2.665e+02, percent-clipped=2.0 2023-11-18 03:34:39,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=17.32 vs. limit=15.0 2023-11-18 03:34:52,411 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.696e+00 2023-11-18 03:34:55,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=39466.666666666664, ans=0.125 2023-11-18 03:35:00,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=39466.666666666664, ans=0.0 2023-11-18 03:35:07,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=39533.333333333336, ans=0.125 2023-11-18 03:35:28,910 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 5950, loss[loss=0.1814, simple_loss=0.1833, pruned_loss=0.07678, audio_tagging_loss=0.01297, over 15498.00 frames. ], tot_loss[loss=0.1565, simple_loss=0.1534, pruned_loss=0.0663, audio_tagging_loss=0.01354, over 3053490.96 frames. ], batch size: 57, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:35:34,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=39666.666666666664, ans=0.125 2023-11-18 03:35:35,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=39666.666666666664, ans=0.125 2023-11-18 03:35:41,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=39733.333333333336, ans=0.0022318840579710142 2023-11-18 03:35:52,388 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.96 vs. limit=22.5 2023-11-18 03:36:11,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=39866.666666666664, ans=0.0 2023-11-18 03:36:19,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=39933.333333333336, ans=0.125 2023-11-18 03:36:24,716 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6000, loss[loss=0.2085, simple_loss=0.2147, pruned_loss=0.09192, audio_tagging_loss=0.009275, over 16100.00 frames. ], tot_loss[loss=0.1554, simple_loss=0.1522, pruned_loss=0.06573, audio_tagging_loss=0.01358, over 3055680.47 frames. 
], batch size: 58, lr: 3.98e-02, grad_scale: 32.0 2023-11-18 03:36:24,720 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 03:36:58,785 INFO [train_asr.py:1147] (0/4) Epoch 1, validation: loss=0.1009, simple_loss=0.07718, pruned_loss=0.02169, audio_tagging_loss=0.04066, over 4681554.00 frames. 2023-11-18 03:36:58,786 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 03:37:05,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 1.087e+02 1.275e+02 1.499e+02 2.354e+02, threshold=2.549e+02, percent-clipped=0.0 2023-11-18 03:37:15,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=40066.666666666664, ans=0.125 2023-11-18 03:37:32,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=40200.0, ans=0.125 2023-11-18 03:37:40,587 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 03:37:42,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=40266.666666666664, ans=0.2 2023-11-18 03:37:55,888 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6050, loss[loss=0.1333, simple_loss=0.1256, pruned_loss=0.05367, audio_tagging_loss=0.01685, over 15958.00 frames. ], tot_loss[loss=0.1562, simple_loss=0.153, pruned_loss=0.06614, audio_tagging_loss=0.01351, over 3062891.30 frames. ], batch size: 61, lr: 3.97e-02, grad_scale: 32.0 2023-11-18 03:38:00,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=40333.333333333336, ans=0.002101449275362318 2023-11-18 03:38:02,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=40333.333333333336, ans=0.002101449275362318 2023-11-18 03:38:05,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.93 vs. limit=15.0 2023-11-18 03:38:06,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=40400.0, ans=0.0020869565217391303 2023-11-18 03:38:07,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.20 vs. limit=22.5 2023-11-18 03:38:18,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=40466.666666666664, ans=0.125 2023-11-18 03:38:48,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=40600.0, ans=0.0 2023-11-18 03:38:52,469 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6100, loss[loss=0.1212, simple_loss=0.1176, pruned_loss=0.05024, audio_tagging_loss=0.01214, over 15246.00 frames. ], tot_loss[loss=0.1556, simple_loss=0.1524, pruned_loss=0.06578, audio_tagging_loss=0.01358, over 3061784.84 frames. 
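The WARNING just above is the same one that appears several times earlier in this stretch: 1-second AudioSet cuts carrying the placeholder transcript end up with 23 frames after subsampling but 24 BPE tokens, and the recipe treats a cut whose tokens outnumber its frames as untrainable, so train_asr.py excludes it. A minimal sketch of that admissibility check; the actual guard in the recipe may test further conditions (duration bounds, empty text, etc.):

```python
# Minimal sketch of the T >= U check behind these WARNINGs: the alignment used
# here needs at least one encoder frame per output token, as the log implies.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

# The excluded AudioSet cuts: 100 raw frames -> 23 after subsampling, 24 tokens.
print(keep_cut(23, 24))  # False -> "Exclude cut ... from training."
```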
], batch size: 56, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:38:54,273 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.09 vs. limit=12.0 2023-11-18 03:38:58,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.303e+01 1.093e+02 1.234e+02 1.511e+02 2.648e+02, threshold=2.468e+02, percent-clipped=3.0 2023-11-18 03:39:08,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=40733.333333333336, ans=0.1 2023-11-18 03:39:11,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=40733.333333333336, ans=0.125 2023-11-18 03:39:14,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.89 vs. limit=15.0 2023-11-18 03:39:33,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=40866.666666666664, ans=0.125 2023-11-18 03:39:35,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=40866.666666666664, ans=0.1 2023-11-18 03:39:38,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=40933.333333333336, ans=0.07 2023-11-18 03:39:40,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=40933.333333333336, ans=0.05 2023-11-18 03:39:48,161 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6150, loss[loss=0.1815, simple_loss=0.1853, pruned_loss=0.07607, audio_tagging_loss=0.01278, over 15997.00 frames. ], tot_loss[loss=0.157, simple_loss=0.1538, pruned_loss=0.0665, audio_tagging_loss=0.01363, over 3059264.47 frames. ], batch size: 59, lr: 3.96e-02, grad_scale: 32.0 2023-11-18 03:39:52,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=41000.0, ans=0.025 2023-11-18 03:40:36,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41266.666666666664, ans=0.1 2023-11-18 03:40:42,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=41266.666666666664, ans=0.0 2023-11-18 03:40:42,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.47 vs. limit=15.0 2023-11-18 03:40:44,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=41333.333333333336, ans=0.125 2023-11-18 03:40:45,694 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6200, loss[loss=0.1457, simple_loss=0.1416, pruned_loss=0.06216, audio_tagging_loss=0.01274, over 14928.00 frames. ], tot_loss[loss=0.1557, simple_loss=0.1526, pruned_loss=0.06566, audio_tagging_loss=0.01371, over 3048697.80 frames. 
], batch size: 58, lr: 3.95e-02, grad_scale: 32.0 2023-11-18 03:40:46,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=41333.333333333336, ans=0.2 2023-11-18 03:40:53,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.054e+01 1.077e+02 1.264e+02 1.430e+02 2.412e+02, threshold=2.529e+02, percent-clipped=0.0 2023-11-18 03:41:42,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=41666.666666666664, ans=0.0 2023-11-18 03:41:43,086 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6250, loss[loss=0.1234, simple_loss=0.1268, pruned_loss=0.0484, audio_tagging_loss=0.01163, over 14818.00 frames. ], tot_loss[loss=0.1524, simple_loss=0.1492, pruned_loss=0.06392, audio_tagging_loss=0.01383, over 3046462.09 frames. ], batch size: 56, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:41:45,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=41666.666666666664, ans=0.125 2023-11-18 03:41:46,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=41666.666666666664, ans=0.0018115942028985518 2023-11-18 03:41:56,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=3.19 vs. limit=15.0 2023-11-18 03:41:57,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=15.0 2023-11-18 03:42:05,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=41800.0, ans=0.2 2023-11-18 03:42:06,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.70 vs. limit=15.0 2023-11-18 03:42:17,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=41866.666666666664, ans=0.1 2023-11-18 03:42:29,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=41933.333333333336, ans=0.125 2023-11-18 03:42:31,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=41933.333333333336, ans=0.0 2023-11-18 03:42:37,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2023-11-18 03:42:38,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=42000.0, ans=0.2 2023-11-18 03:42:39,083 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6300, loss[loss=0.2023, simple_loss=0.2063, pruned_loss=0.08882, audio_tagging_loss=0.01028, over 15186.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1493, pruned_loss=0.06363, audio_tagging_loss=0.01395, over 3048918.91 frames. 
], batch size: 55, lr: 3.94e-02, grad_scale: 32.0 2023-11-18 03:42:46,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.073e+01 1.061e+02 1.176e+02 1.388e+02 2.867e+02, threshold=2.352e+02, percent-clipped=1.0 2023-11-18 03:42:52,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=42066.666666666664, ans=0.125 2023-11-18 03:42:59,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=42066.666666666664, ans=0.125 2023-11-18 03:43:26,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=42266.666666666664, ans=0.125 2023-11-18 03:43:36,577 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6350, loss[loss=0.1605, simple_loss=0.16, pruned_loss=0.06341, audio_tagging_loss=0.01705, over 15687.00 frames. ], tot_loss[loss=0.1525, simple_loss=0.1498, pruned_loss=0.06372, audio_tagging_loss=0.01388, over 3053395.51 frames. ], batch size: 58, lr: 3.93e-02, grad_scale: 32.0 2023-11-18 03:43:42,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=42333.333333333336, ans=0.1 2023-11-18 03:43:47,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=16.74 vs. limit=22.5 2023-11-18 03:43:49,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=42400.0, ans=0.1 2023-11-18 03:44:16,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-11-18 03:44:17,883 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:44:19,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=42533.333333333336, ans=0.125 2023-11-18 03:44:34,103 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6400, loss[loss=0.1659, simple_loss=0.1568, pruned_loss=0.07437, audio_tagging_loss=0.01317, over 14465.00 frames. ], tot_loss[loss=0.1532, simple_loss=0.1503, pruned_loss=0.06403, audio_tagging_loss=0.01396, over 3047551.19 frames. ], batch size: 55, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:44:40,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.367e+01 1.120e+02 1.287e+02 1.674e+02 2.598e+02, threshold=2.575e+02, percent-clipped=2.0 2023-11-18 03:44:40,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=42666.666666666664, ans=0.125 2023-11-18 03:45:15,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=42866.666666666664, ans=0.0 2023-11-18 03:45:21,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.78 vs. limit=15.0 2023-11-18 03:45:29,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.73 vs. 
limit=12.0 2023-11-18 03:45:30,274 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6450, loss[loss=0.0947, simple_loss=0.08384, pruned_loss=0.03684, audio_tagging_loss=0.01595, over 15851.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.1505, pruned_loss=0.06407, audio_tagging_loss=0.01412, over 3052112.37 frames. ], batch size: 61, lr: 3.92e-02, grad_scale: 32.0 2023-11-18 03:45:51,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=43066.666666666664, ans=0.2 2023-11-18 03:45:56,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.45 vs. limit=15.0 2023-11-18 03:46:00,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.51 vs. limit=22.5 2023-11-18 03:46:21,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=43266.666666666664, ans=0.125 2023-11-18 03:46:27,408 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6500, loss[loss=0.123, simple_loss=0.1271, pruned_loss=0.04594, audio_tagging_loss=0.01356, over 15308.00 frames. ], tot_loss[loss=0.154, simple_loss=0.1513, pruned_loss=0.06439, audio_tagging_loss=0.01392, over 3051298.40 frames. ], batch size: 57, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:46:30,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=43333.333333333336, ans=0.125 2023-11-18 03:46:33,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.09 vs. limit=22.5 2023-11-18 03:46:34,392 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.242e+01 1.070e+02 1.244e+02 1.503e+02 2.306e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 03:46:34,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=43333.333333333336, ans=0.0 2023-11-18 03:46:37,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=43400.0, ans=0.125 2023-11-18 03:47:12,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=43600.0, ans=0.2 2023-11-18 03:47:24,062 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6550, loss[loss=0.1241, simple_loss=0.1118, pruned_loss=0.04993, audio_tagging_loss=0.01832, over 15237.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.1493, pruned_loss=0.06371, audio_tagging_loss=0.01383, over 3049064.71 frames. ], batch size: 57, lr: 3.91e-02, grad_scale: 32.0 2023-11-18 03:47:33,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=43666.666666666664, ans=0.125 2023-11-18 03:47:37,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=43733.333333333336, ans=0.125 2023-11-18 03:47:43,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=43733.333333333336, ans=0.025 2023-11-18 03:48:02,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.20 vs. 
limit=15.0 2023-11-18 03:48:08,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-11-18 03:48:10,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.72 vs. limit=15.0 2023-11-18 03:48:16,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=43933.333333333336, ans=0.125 2023-11-18 03:48:20,311 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=22.21 vs. limit=22.5 2023-11-18 03:48:20,979 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6600, loss[loss=0.1235, simple_loss=0.1139, pruned_loss=0.05298, audio_tagging_loss=0.01363, over 15025.00 frames. ], tot_loss[loss=0.1507, simple_loss=0.1478, pruned_loss=0.06299, audio_tagging_loss=0.01381, over 3045288.36 frames. ], batch size: 58, lr: 3.90e-02, grad_scale: 32.0 2023-11-18 03:48:21,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=44000.0, ans=0.125 2023-11-18 03:48:28,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.463e+01 1.068e+02 1.215e+02 1.424e+02 2.055e+02, threshold=2.430e+02, percent-clipped=0.0 2023-11-18 03:48:37,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=44066.666666666664, ans=0.125 2023-11-18 03:48:44,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=44133.333333333336, ans=0.125 2023-11-18 03:48:49,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.56 vs. limit=15.0 2023-11-18 03:49:11,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=44266.666666666664, ans=0.125 2023-11-18 03:49:13,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=44266.666666666664, ans=0.125 2023-11-18 03:49:14,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=44266.666666666664, ans=0.0 2023-11-18 03:49:17,878 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6650, loss[loss=0.1135, simple_loss=0.1149, pruned_loss=0.0416, audio_tagging_loss=0.01442, over 14682.00 frames. ], tot_loss[loss=0.1511, simple_loss=0.1487, pruned_loss=0.06324, audio_tagging_loss=0.01355, over 3047921.16 frames. 
], batch size: 57, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:49:38,699 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:49:39,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=44466.666666666664, ans=0.07 2023-11-18 03:49:56,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=44533.333333333336, ans=0.125 2023-11-18 03:50:15,181 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6700, loss[loss=0.1051, simple_loss=0.0993, pruned_loss=0.03935, audio_tagging_loss=0.01609, over 14569.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.1492, pruned_loss=0.06335, audio_tagging_loss=0.01354, over 3044665.91 frames. ], batch size: 54, lr: 3.89e-02, grad_scale: 32.0 2023-11-18 03:50:21,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.129e+01 1.020e+02 1.157e+02 1.284e+02 2.181e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 03:50:54,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-11-18 03:51:04,462 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.11 vs. limit=15.0 2023-11-18 03:51:04,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=44933.333333333336, ans=0.0011014492753623189 2023-11-18 03:51:11,203 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6750, loss[loss=0.1857, simple_loss=0.1809, pruned_loss=0.08297, audio_tagging_loss=0.01227, over 15932.00 frames. ], tot_loss[loss=0.1509, simple_loss=0.1486, pruned_loss=0.06307, audio_tagging_loss=0.01354, over 3041988.05 frames. ], batch size: 58, lr: 3.88e-02, grad_scale: 32.0 2023-11-18 03:51:22,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45066.666666666664, ans=0.1 2023-11-18 03:51:31,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.92 vs. limit=15.0 2023-11-18 03:51:43,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=45133.333333333336, ans=0.125 2023-11-18 03:52:00,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=45266.666666666664, ans=0.125 2023-11-18 03:52:01,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=45266.666666666664, ans=0.0010289855072463782 2023-11-18 03:52:03,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45266.666666666664, ans=0.1 2023-11-18 03:52:08,436 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6800, loss[loss=0.1628, simple_loss=0.1591, pruned_loss=0.07123, audio_tagging_loss=0.012, over 15935.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.1494, pruned_loss=0.0634, audio_tagging_loss=0.01351, over 3046734.42 frames. 
], batch size: 61, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:52:12,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=45333.333333333336, ans=0.2 2023-11-18 03:52:15,494 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.237e+01 1.104e+02 1.256e+02 1.386e+02 2.512e+02, threshold=2.511e+02, percent-clipped=1.0 2023-11-18 03:52:24,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=15.0 2023-11-18 03:52:31,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=45466.666666666664, ans=0.000985507246376813 2023-11-18 03:52:37,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=45466.666666666664, ans=0.1 2023-11-18 03:52:37,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=45466.666666666664, ans=0.125 2023-11-18 03:52:52,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=45533.333333333336, ans=12.0 2023-11-18 03:52:57,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=45600.0, ans=0.0 2023-11-18 03:53:05,769 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6850, loss[loss=0.09418, simple_loss=0.08874, pruned_loss=0.03314, audio_tagging_loss=0.01667, over 16301.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.15, pruned_loss=0.06359, audio_tagging_loss=0.01358, over 3048719.47 frames. ], batch size: 63, lr: 3.87e-02, grad_scale: 32.0 2023-11-18 03:53:40,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.35 vs. limit=15.0 2023-11-18 03:53:52,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=45933.333333333336, ans=0.0008840579710144916 2023-11-18 03:53:56,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=45933.333333333336, ans=0.0 2023-11-18 03:54:01,997 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6900, loss[loss=0.1891, simple_loss=0.1881, pruned_loss=0.08202, audio_tagging_loss=0.01301, over 16491.00 frames. ], tot_loss[loss=0.1532, simple_loss=0.1512, pruned_loss=0.06388, audio_tagging_loss=0.01367, over 3047370.63 frames. ], batch size: 59, lr: 3.86e-02, grad_scale: 32.0 2023-11-18 03:54:08,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 1.075e+02 1.233e+02 1.509e+02 2.353e+02, threshold=2.467e+02, percent-clipped=0.0 2023-11-18 03:54:23,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=12.0 2023-11-18 03:54:45,373 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 03:54:55,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=46266.666666666664, ans=0.1 2023-11-18 03:54:58,462 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 6950, loss[loss=0.2001, simple_loss=0.2099, pruned_loss=0.08535, audio_tagging_loss=0.009831, over 14858.00 frames. ], tot_loss[loss=0.1542, simple_loss=0.1523, pruned_loss=0.06439, audio_tagging_loss=0.01365, over 3048355.22 frames. ], batch size: 52, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:55:17,276 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.65 vs. limit=22.5 2023-11-18 03:55:38,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=46533.333333333336, ans=0.2 2023-11-18 03:55:38,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=46533.333333333336, ans=0.125 2023-11-18 03:55:49,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=46600.0, ans=0.07 2023-11-18 03:55:55,781 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7000, loss[loss=0.1617, simple_loss=0.1563, pruned_loss=0.06908, audio_tagging_loss=0.0145, over 16485.00 frames. ], tot_loss[loss=0.1537, simple_loss=0.1516, pruned_loss=0.06423, audio_tagging_loss=0.01364, over 3046012.55 frames. ], batch size: 63, lr: 3.85e-02, grad_scale: 32.0 2023-11-18 03:56:00,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=46666.666666666664, ans=0.125 2023-11-18 03:56:02,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.771e+01 1.138e+02 1.312e+02 1.485e+02 2.708e+02, threshold=2.623e+02, percent-clipped=2.0 2023-11-18 03:56:16,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=46800.0, ans=0.025 2023-11-18 03:56:25,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=46800.0, ans=15.0 2023-11-18 03:56:51,713 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7050, loss[loss=0.1602, simple_loss=0.1652, pruned_loss=0.06273, audio_tagging_loss=0.01486, over 15900.00 frames. ], tot_loss[loss=0.1524, simple_loss=0.1498, pruned_loss=0.0636, audio_tagging_loss=0.01384, over 3043952.33 frames. ], batch size: 60, lr: 3.84e-02, grad_scale: 32.0 2023-11-18 03:56:52,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=47000.0, ans=0.05 2023-11-18 03:56:57,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.36 vs. 
limit=22.5 2023-11-18 03:57:10,156 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:57:14,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=47133.333333333336, ans=0.0 2023-11-18 03:57:17,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=47133.333333333336, ans=0.125 2023-11-18 03:57:29,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=47200.0, ans=0.0 2023-11-18 03:57:38,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=47266.666666666664, ans=0.125 2023-11-18 03:57:43,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=47266.666666666664, ans=0.0005942028985507254 2023-11-18 03:57:47,885 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7100, loss[loss=0.1698, simple_loss=0.1685, pruned_loss=0.06631, audio_tagging_loss=0.0192, over 15525.00 frames. ], tot_loss[loss=0.154, simple_loss=0.1514, pruned_loss=0.06441, audio_tagging_loss=0.01387, over 3049052.77 frames. ], batch size: 55, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:57:55,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.676e+01 1.058e+02 1.182e+02 1.391e+02 1.929e+02, threshold=2.364e+02, percent-clipped=0.0 2023-11-18 03:58:10,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=47466.666666666664, ans=0.125 2023-11-18 03:58:16,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.99 vs. limit=12.0 2023-11-18 03:58:19,149 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. limit=10.0 2023-11-18 03:58:21,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=47533.333333333336, ans=0.125 2023-11-18 03:58:21,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=47533.333333333336, ans=0.125 2023-11-18 03:58:30,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=47533.333333333336, ans=0.125 2023-11-18 03:58:42,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. limit=5.0 2023-11-18 03:58:45,124 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7150, loss[loss=0.1436, simple_loss=0.1451, pruned_loss=0.0599, audio_tagging_loss=0.01117, over 14634.00 frames. ], tot_loss[loss=0.153, simple_loss=0.1503, pruned_loss=0.0637, audio_tagging_loss=0.01413, over 3047866.51 frames. ], batch size: 56, lr: 3.83e-02, grad_scale: 32.0 2023-11-18 03:58:47,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=47666.666666666664, ans=0.0 2023-11-18 03:58:50,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.18 vs. 
limit=12.0 2023-11-18 03:58:51,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=47666.666666666664, ans=0.0 2023-11-18 03:59:16,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=47866.666666666664, ans=0.07 2023-11-18 03:59:20,491 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.71 vs. limit=22.5 2023-11-18 03:59:21,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=47866.666666666664, ans=0.125 2023-11-18 03:59:38,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=47933.333333333336, ans=0.0 2023-11-18 03:59:40,236 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 03:59:40,954 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7200, loss[loss=0.1538, simple_loss=0.1532, pruned_loss=0.06353, audio_tagging_loss=0.0137, over 15612.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.1504, pruned_loss=0.06345, audio_tagging_loss=0.014, over 3051592.70 frames. ], batch size: 59, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 03:59:47,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.394e+01 1.038e+02 1.215e+02 1.416e+02 1.908e+02, threshold=2.429e+02, percent-clipped=0.0 2023-11-18 03:59:49,756 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.994e+00 2023-11-18 03:59:56,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=48066.666666666664, ans=0.000420289855072465 2023-11-18 04:00:12,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.77 vs. limit=22.5 2023-11-18 04:00:14,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=48200.0, ans=0.2 2023-11-18 04:00:18,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=48200.0, ans=0.0 2023-11-18 04:00:31,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=48266.666666666664, ans=0.125 2023-11-18 04:00:37,476 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7250, loss[loss=0.1611, simple_loss=0.1561, pruned_loss=0.06969, audio_tagging_loss=0.0134, over 15166.00 frames. ], tot_loss[loss=0.1515, simple_loss=0.1489, pruned_loss=0.06295, audio_tagging_loss=0.01408, over 3045133.61 frames. ], batch size: 59, lr: 3.82e-02, grad_scale: 32.0 2023-11-18 04:00:52,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=48400.0, ans=0.1 2023-11-18 04:01:34,449 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7300, loss[loss=0.194, simple_loss=0.1799, pruned_loss=0.09041, audio_tagging_loss=0.01364, over 14572.00 frames. ], tot_loss[loss=0.1516, simple_loss=0.1493, pruned_loss=0.06323, audio_tagging_loss=0.01372, over 3038882.85 frames. 
], batch size: 56, lr: 3.81e-02, grad_scale: 32.0 2023-11-18 04:01:40,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.111e+01 1.121e+02 1.282e+02 1.467e+02 2.763e+02, threshold=2.564e+02, percent-clipped=2.0 2023-11-18 04:01:48,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=48733.333333333336, ans=22.5 2023-11-18 04:01:53,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=48733.333333333336, ans=22.5 2023-11-18 04:01:56,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=48800.0, ans=0.125 2023-11-18 04:02:11,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=48866.666666666664, ans=0.125 2023-11-18 04:02:11,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=48866.666666666664, ans=0.2 2023-11-18 04:02:30,021 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7350, loss[loss=0.2143, simple_loss=0.2221, pruned_loss=0.09351, audio_tagging_loss=0.009694, over 15554.00 frames. ], tot_loss[loss=0.1526, simple_loss=0.1507, pruned_loss=0.06362, audio_tagging_loss=0.01358, over 3048568.70 frames. ], batch size: 58, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:02:45,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=49066.666666666664, ans=0.0 2023-11-18 04:03:10,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=49200.0, ans=15.0 2023-11-18 04:03:26,828 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7400, loss[loss=0.1707, simple_loss=0.1688, pruned_loss=0.07313, audio_tagging_loss=0.0132, over 16469.00 frames. ], tot_loss[loss=0.1512, simple_loss=0.1494, pruned_loss=0.06295, audio_tagging_loss=0.01354, over 3047207.25 frames. ], batch size: 60, lr: 3.80e-02, grad_scale: 32.0 2023-11-18 04:03:33,830 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.201e+01 1.102e+02 1.229e+02 1.424e+02 2.293e+02, threshold=2.457e+02, percent-clipped=0.0 2023-11-18 04:03:34,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=49333.333333333336, ans=0.0 2023-11-18 04:03:47,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=49400.0, ans=0.125 2023-11-18 04:03:48,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=49466.666666666664, ans=0.1 2023-11-18 04:03:57,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=49466.666666666664, ans=0.125 2023-11-18 04:04:11,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.21 vs. 
limit=15.0 2023-11-18 04:04:11,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=49600.0, ans=0.125 2023-11-18 04:04:14,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=49600.0, ans=0.1 2023-11-18 04:04:23,581 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7450, loss[loss=0.1519, simple_loss=0.1429, pruned_loss=0.06376, audio_tagging_loss=0.01671, over 15522.00 frames. ], tot_loss[loss=0.1521, simple_loss=0.1506, pruned_loss=0.06346, audio_tagging_loss=0.01338, over 3049563.53 frames. ], batch size: 59, lr: 3.79e-02, grad_scale: 32.0 2023-11-18 04:04:26,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=49666.666666666664, ans=0.0 2023-11-18 04:04:47,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=49800.0, ans=0.125 2023-11-18 04:05:04,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=22.99 vs. limit=22.5 2023-11-18 04:05:09,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=49933.333333333336, ans=0.2 2023-11-18 04:05:09,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.85 vs. limit=22.5 2023-11-18 04:05:15,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=49933.333333333336, ans=0.125 2023-11-18 04:05:20,170 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7500, loss[loss=0.1617, simple_loss=0.1564, pruned_loss=0.06938, audio_tagging_loss=0.01412, over 14296.00 frames. ], tot_loss[loss=0.153, simple_loss=0.1514, pruned_loss=0.06388, audio_tagging_loss=0.01335, over 3048997.64 frames. ], batch size: 54, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:05:20,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=50000.0, ans=0.125 2023-11-18 04:05:23,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=6.19 vs. limit=15.0 2023-11-18 04:05:24,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=50000.0, ans=15.0 2023-11-18 04:05:26,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 1.063e+02 1.222e+02 1.436e+02 2.018e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:05:31,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=50066.666666666664, ans=0.0 2023-11-18 04:05:35,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.52 vs. limit=15.0 2023-11-18 04:05:41,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=50133.333333333336, ans=0.125 2023-11-18 04:06:04,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.94 vs. 
limit=15.0 2023-11-18 04:06:07,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.77 vs. limit=15.0 2023-11-18 04:06:12,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=50266.666666666664, ans=10.0 2023-11-18 04:06:15,870 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7550, loss[loss=0.1924, simple_loss=0.1902, pruned_loss=0.08751, audio_tagging_loss=0.009823, over 16554.00 frames. ], tot_loss[loss=0.1534, simple_loss=0.1521, pruned_loss=0.06405, audio_tagging_loss=0.01325, over 3043646.60 frames. ], batch size: 60, lr: 3.78e-02, grad_scale: 32.0 2023-11-18 04:06:16,045 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:06:18,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.84 vs. limit=15.0 2023-11-18 04:06:28,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=50400.0, ans=0.125 2023-11-18 04:06:31,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=50400.0, ans=0.0 2023-11-18 04:06:32,530 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.551e+00 2023-11-18 04:06:54,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=50533.333333333336, ans=0.0 2023-11-18 04:07:12,580 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7600, loss[loss=0.1507, simple_loss=0.1481, pruned_loss=0.0636, audio_tagging_loss=0.01304, over 15757.00 frames. ], tot_loss[loss=0.1522, simple_loss=0.151, pruned_loss=0.06342, audio_tagging_loss=0.01331, over 3040617.63 frames. ], batch size: 59, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:07:19,599 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.452e+01 1.053e+02 1.216e+02 1.364e+02 2.093e+02, threshold=2.431e+02, percent-clipped=0.0 2023-11-18 04:07:36,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.06 vs. limit=15.0 2023-11-18 04:07:36,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=50800.0, ans=0.1 2023-11-18 04:07:54,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=50866.666666666664, ans=0.0 2023-11-18 04:08:01,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=50933.333333333336, ans=0.125 2023-11-18 04:08:09,179 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7650, loss[loss=0.1501, simple_loss=0.1491, pruned_loss=0.06016, audio_tagging_loss=0.01537, over 15099.00 frames. ], tot_loss[loss=0.1526, simple_loss=0.1514, pruned_loss=0.06358, audio_tagging_loss=0.01335, over 3037433.56 frames. 
], batch size: 56, lr: 3.77e-02, grad_scale: 32.0 2023-11-18 04:08:09,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=51000.0, ans=0.125 2023-11-18 04:08:13,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=51000.0, ans=0.1 2023-11-18 04:08:16,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=51000.0, ans=0.0 2023-11-18 04:08:20,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2023-11-18 04:08:25,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=51066.666666666664, ans=0.0 2023-11-18 04:08:38,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.58 vs. limit=15.0 2023-11-18 04:08:42,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=51200.0, ans=0.2 2023-11-18 04:08:44,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=51200.0, ans=0.2 2023-11-18 04:08:51,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=51200.0, ans=0.0 2023-11-18 04:08:54,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=51266.666666666664, ans=0.1 2023-11-18 04:09:05,121 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7700, loss[loss=0.1435, simple_loss=0.1419, pruned_loss=0.05951, audio_tagging_loss=0.01304, over 16853.00 frames. ], tot_loss[loss=0.1525, simple_loss=0.1516, pruned_loss=0.06332, audio_tagging_loss=0.01337, over 3044927.82 frames. ], batch size: 64, lr: 3.76e-02, grad_scale: 32.0 2023-11-18 04:09:09,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=51333.333333333336, ans=0.2 2023-11-18 04:09:10,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=51333.333333333336, ans=0.125 2023-11-18 04:09:12,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.072e+02 1.285e+02 1.536e+02 2.038e+02, threshold=2.570e+02, percent-clipped=0.0 2023-11-18 04:09:19,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=51400.0, ans=0.2 2023-11-18 04:09:19,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=51400.0, ans=0.2 2023-11-18 04:09:19,871 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.96 vs. limit=6.0 2023-11-18 04:09:37,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.36 vs. 
limit=22.5 2023-11-18 04:09:42,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=51533.333333333336, ans=0.125 2023-11-18 04:10:01,756 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7750, loss[loss=0.1485, simple_loss=0.1568, pruned_loss=0.06181, audio_tagging_loss=0.008308, over 15545.00 frames. ], tot_loss[loss=0.1524, simple_loss=0.1515, pruned_loss=0.06327, audio_tagging_loss=0.01342, over 3046176.71 frames. ], batch size: 59, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:10:34,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=51866.666666666664, ans=0.125 2023-11-18 04:10:41,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=51866.666666666664, ans=0.125 2023-11-18 04:10:55,313 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:10:58,485 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7800, loss[loss=0.1765, simple_loss=0.1729, pruned_loss=0.07809, audio_tagging_loss=0.01198, over 16782.00 frames. ], tot_loss[loss=0.1527, simple_loss=0.1516, pruned_loss=0.06338, audio_tagging_loss=0.01353, over 3042024.46 frames. ], batch size: 62, lr: 3.75e-02, grad_scale: 64.0 2023-11-18 04:11:04,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.657e+01 1.122e+02 1.272e+02 1.519e+02 2.538e+02, threshold=2.545e+02, percent-clipped=0.0 2023-11-18 04:11:23,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=52133.333333333336, ans=0.125 2023-11-18 04:11:34,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=52200.0, ans=0.2 2023-11-18 04:11:51,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=52266.666666666664, ans=0.0 2023-11-18 04:11:54,973 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7850, loss[loss=0.1613, simple_loss=0.1576, pruned_loss=0.06806, audio_tagging_loss=0.01447, over 15476.00 frames. ], tot_loss[loss=0.1519, simple_loss=0.1507, pruned_loss=0.06299, audio_tagging_loss=0.01359, over 3047220.14 frames. ], batch size: 56, lr: 3.74e-02, grad_scale: 64.0 2023-11-18 04:11:57,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52333.333333333336, ans=0.1 2023-11-18 04:12:06,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=52400.0, ans=0.0 2023-11-18 04:12:11,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.89 vs. limit=22.5 2023-11-18 04:12:22,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=52466.666666666664, ans=0.1 2023-11-18 04:12:35,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.46 vs. 
limit=15.0 2023-11-18 04:12:37,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=52533.333333333336, ans=0.2 2023-11-18 04:12:51,439 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7900, loss[loss=0.117, simple_loss=0.1095, pruned_loss=0.04313, audio_tagging_loss=0.01911, over 15555.00 frames. ], tot_loss[loss=0.1503, simple_loss=0.149, pruned_loss=0.06203, audio_tagging_loss=0.01374, over 3050733.29 frames. ], batch size: 59, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:12:58,423 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.654e+01 1.083e+02 1.346e+02 1.574e+02 2.605e+02, threshold=2.691e+02, percent-clipped=2.0 2023-11-18 04:13:11,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=52733.333333333336, ans=0.125 2023-11-18 04:13:34,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=52866.666666666664, ans=0.0 2023-11-18 04:13:38,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=52933.333333333336, ans=0.0 2023-11-18 04:13:43,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=52933.333333333336, ans=0.1 2023-11-18 04:13:47,581 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 7950, loss[loss=0.1095, simple_loss=0.09993, pruned_loss=0.04515, audio_tagging_loss=0.01439, over 14009.00 frames. ], tot_loss[loss=0.1499, simple_loss=0.1487, pruned_loss=0.06176, audio_tagging_loss=0.01379, over 3054604.20 frames. ], batch size: 57, lr: 3.73e-02, grad_scale: 64.0 2023-11-18 04:13:48,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=53000.0, ans=0.125 2023-11-18 04:13:51,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=53000.0, ans=0.025 2023-11-18 04:14:01,632 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:14:05,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=53066.666666666664, ans=0.025 2023-11-18 04:14:25,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53200.0, ans=0.1 2023-11-18 04:14:26,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.00 vs. limit=22.5 2023-11-18 04:14:30,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.71 vs. 
limit=15.0 2023-11-18 04:14:41,968 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-8000.pt 2023-11-18 04:14:45,300 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8000, loss[loss=0.1527, simple_loss=0.1483, pruned_loss=0.06418, audio_tagging_loss=0.01433, over 15848.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.1478, pruned_loss=0.06157, audio_tagging_loss=0.01394, over 3050911.16 frames. ], batch size: 58, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:14:52,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.064e+02 1.192e+02 1.330e+02 2.518e+02, threshold=2.384e+02, percent-clipped=0.0 2023-11-18 04:14:53,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=53333.333333333336, ans=0.1 2023-11-18 04:15:24,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=53533.333333333336, ans=0.0 2023-11-18 04:15:27,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=53533.333333333336, ans=0.1 2023-11-18 04:15:30,854 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:15:41,085 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8050, loss[loss=0.1715, simple_loss=0.1733, pruned_loss=0.07496, audio_tagging_loss=0.009934, over 14970.00 frames. ], tot_loss[loss=0.1492, simple_loss=0.1478, pruned_loss=0.06142, audio_tagging_loss=0.01387, over 3049697.08 frames. ], batch size: 55, lr: 3.72e-02, grad_scale: 64.0 2023-11-18 04:15:48,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=53666.666666666664, ans=0.0 2023-11-18 04:15:59,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=53733.333333333336, ans=0.2 2023-11-18 04:16:08,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=53800.0, ans=0.0 2023-11-18 04:16:21,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=20.43 vs. limit=15.0 2023-11-18 04:16:27,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=53933.333333333336, ans=0.125 2023-11-18 04:16:37,487 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8100, loss[loss=0.1285, simple_loss=0.1257, pruned_loss=0.05292, audio_tagging_loss=0.01266, over 15211.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1466, pruned_loss=0.06111, audio_tagging_loss=0.01383, over 3038493.03 frames. ], batch size: 58, lr: 3.71e-02, grad_scale: 64.0 2023-11-18 04:16:40,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=54000.0, ans=0.2 2023-11-18 04:16:43,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 1.055e+02 1.175e+02 1.442e+02 1.996e+02, threshold=2.349e+02, percent-clipped=0.0 2023-11-18 04:16:59,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.62 vs. 
limit=22.5 2023-11-18 04:17:05,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=54133.333333333336, ans=0.1 2023-11-18 04:17:10,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=54200.0, ans=0.125 2023-11-18 04:17:14,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54200.0, ans=0.1 2023-11-18 04:17:26,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=54266.666666666664, ans=0.0 2023-11-18 04:17:30,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.43 vs. limit=10.0 2023-11-18 04:17:32,890 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8150, loss[loss=0.1307, simple_loss=0.1335, pruned_loss=0.05142, audio_tagging_loss=0.01251, over 16326.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.1476, pruned_loss=0.06133, audio_tagging_loss=0.01349, over 3041341.18 frames. ], batch size: 64, lr: 3.70e-02, grad_scale: 64.0 2023-11-18 04:17:37,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.84 vs. limit=22.5 2023-11-18 04:18:24,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=54600.0, ans=0.05 2023-11-18 04:18:29,083 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8200, loss[loss=0.166, simple_loss=0.1665, pruned_loss=0.06905, audio_tagging_loss=0.01376, over 15159.00 frames. ], tot_loss[loss=0.1485, simple_loss=0.1478, pruned_loss=0.06117, audio_tagging_loss=0.0134, over 3040808.19 frames. ], batch size: 55, lr: 3.70e-02, grad_scale: 32.0 2023-11-18 04:18:30,772 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 04:18:37,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 1.076e+02 1.233e+02 1.443e+02 5.591e+02, threshold=2.467e+02, percent-clipped=1.0 2023-11-18 04:18:39,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=54733.333333333336, ans=0.1 2023-11-18 04:18:43,228 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.285e+00 2023-11-18 04:19:06,195 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.160e+00 2023-11-18 04:19:07,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=54866.666666666664, ans=0.0 2023-11-18 04:19:14,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=54933.333333333336, ans=0.0 2023-11-18 04:19:21,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=54933.333333333336, ans=0.2 2023-11-18 04:19:25,765 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8250, loss[loss=0.1458, simple_loss=0.1418, pruned_loss=0.06044, audio_tagging_loss=0.01443, over 15175.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1474, pruned_loss=0.06092, audio_tagging_loss=0.01329, over 3037343.84 frames. ], batch size: 56, lr: 3.69e-02, grad_scale: 32.0 2023-11-18 04:19:28,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=55000.0, ans=0.125 2023-11-18 04:19:34,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=55000.0, ans=0.0 2023-11-18 04:19:34,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=55000.0, ans=0.0 2023-11-18 04:20:21,384 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8300, loss[loss=0.1333, simple_loss=0.1305, pruned_loss=0.05383, audio_tagging_loss=0.01417, over 15101.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.1481, pruned_loss=0.06089, audio_tagging_loss=0.01314, over 3047755.18 frames. ], batch size: 54, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:20:22,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.47 vs. 
limit=15.0 2023-11-18 04:20:23,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=55333.333333333336, ans=0.1 2023-11-18 04:20:28,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.605e+01 1.079e+02 1.222e+02 1.465e+02 2.413e+02, threshold=2.444e+02, percent-clipped=0.0 2023-11-18 04:20:28,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=55333.333333333336, ans=0.0 2023-11-18 04:20:30,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=55333.333333333336, ans=0.125 2023-11-18 04:20:31,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=55400.0, ans=0.2 2023-11-18 04:20:50,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=55466.666666666664, ans=0.2 2023-11-18 04:20:56,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.45 vs. limit=15.0 2023-11-18 04:21:04,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.94 vs. limit=15.0 2023-11-18 04:21:17,279 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8350, loss[loss=0.1227, simple_loss=0.1163, pruned_loss=0.04909, audio_tagging_loss=0.01543, over 14706.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.1475, pruned_loss=0.06065, audio_tagging_loss=0.01318, over 3052747.21 frames. ], batch size: 56, lr: 3.68e-02, grad_scale: 32.0 2023-11-18 04:21:19,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=55666.666666666664, ans=0.125 2023-11-18 04:21:22,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=55666.666666666664, ans=0.2 2023-11-18 04:21:30,841 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:21:46,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=55800.0, ans=0.09899494936611666 2023-11-18 04:21:54,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=55866.666666666664, ans=0.125 2023-11-18 04:22:07,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=32.49 vs. limit=22.5 2023-11-18 04:22:13,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=56000.0, ans=0.05 2023-11-18 04:22:14,450 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8400, loss[loss=0.1624, simple_loss=0.166, pruned_loss=0.0683, audio_tagging_loss=0.01106, over 15461.00 frames. ], tot_loss[loss=0.1472, simple_loss=0.1471, pruned_loss=0.06036, audio_tagging_loss=0.01331, over 3054706.71 frames. ], batch size: 59, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:22:20,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.39 vs. 
limit=15.0 2023-11-18 04:22:21,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.072e+02 1.183e+02 1.364e+02 2.045e+02, threshold=2.367e+02, percent-clipped=0.0 2023-11-18 04:22:24,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=20.13 vs. limit=22.5 2023-11-18 04:22:36,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=56133.333333333336, ans=0.1 2023-11-18 04:22:39,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=56133.333333333336, ans=0.0 2023-11-18 04:22:51,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56200.0, ans=0.1 2023-11-18 04:22:58,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.65 vs. limit=15.0 2023-11-18 04:22:59,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=56266.666666666664, ans=0.5 2023-11-18 04:23:04,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=56266.666666666664, ans=0.1 2023-11-18 04:23:09,891 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8450, loss[loss=0.1499, simple_loss=0.1574, pruned_loss=0.05889, audio_tagging_loss=0.01227, over 15349.00 frames. ], tot_loss[loss=0.1479, simple_loss=0.1477, pruned_loss=0.06067, audio_tagging_loss=0.01337, over 3051113.42 frames. ], batch size: 58, lr: 3.67e-02, grad_scale: 32.0 2023-11-18 04:23:16,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2023-11-18 04:23:31,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=56466.666666666664, ans=0.125 2023-11-18 04:23:33,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=56466.666666666664, ans=0.0 2023-11-18 04:23:52,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=56533.333333333336, ans=0.025 2023-11-18 04:24:03,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.67 vs. limit=15.0 2023-11-18 04:24:05,590 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8500, loss[loss=0.1243, simple_loss=0.1269, pruned_loss=0.04415, audio_tagging_loss=0.01667, over 15430.00 frames. ], tot_loss[loss=0.1476, simple_loss=0.1473, pruned_loss=0.06045, audio_tagging_loss=0.01346, over 3051098.25 frames. 
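The [optim.py:476] records above print a five-number summary (min, 25%, median, 75%, max) of recent gradient norms, and in every such record the logged threshold equals Clipping_scale times the median (e.g. 2.0 × 1.183e+02 ≈ 2.367e+02 here). Below is a minimal sketch of that bookkeeping for a plain PyTorch model; the class name `GradNormClipper` and the `window` parameter are illustrative, not icefall's actual implementation, which fuses this into its optimizer step.

```python
import torch

class GradNormClipper:
    """Sketch: track recent grad norms, clip at clipping_scale * median."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 128):
        self.clipping_scale = clipping_scale
        self.window = window          # how many recent norms to summarize
        self.norms: list[float] = []
        self.num_clipped = 0
        self.num_steps = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        if not params:
            return 0.0
        total_norm = torch.norm(
            torch.stack([p.grad.detach().norm(2) for p in params]), 2
        ).item()
        self.norms = (self.norms + [total_norm])[-self.window:]
        self.num_steps += 1

        median = torch.tensor(self.norms).median().item()
        threshold = self.clipping_scale * median
        if total_norm > threshold:
            self.num_clipped += 1
            for p in params:
                p.grad.mul_(threshold / total_norm)  # rescale in place
        return total_norm

    def summary(self) -> str:
        t = torch.tensor(self.norms)
        q = [torch.quantile(t, x).item() for x in (0.0, 0.25, 0.5, 0.75, 1.0)]
        pct = 100.0 * self.num_clipped / max(1, self.num_steps)
        return ("grad-norm quartiles " + " ".join(f"{v:.3e}" for v in q)
                + f", threshold={self.clipping_scale * q[2]:.3e}"
                + f", percent-clipped={pct:.1f}")
```

Clipping at a multiple of the running median (rather than a fixed constant) keeps the threshold meaningful as the overall gradient scale drifts during training, which is why the logged threshold moves with the quartiles.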
], batch size: 58, lr: 3.66e-02, grad_scale: 32.0 2023-11-18 04:24:13,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 1.081e+02 1.253e+02 1.521e+02 2.592e+02, threshold=2.506e+02, percent-clipped=2.0 2023-11-18 04:24:21,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=56733.333333333336, ans=10.0 2023-11-18 04:24:23,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=56733.333333333336, ans=0.125 2023-11-18 04:24:27,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.44 vs. limit=22.5 2023-11-18 04:24:52,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.62 vs. limit=22.5 2023-11-18 04:25:02,318 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8550, loss[loss=0.1052, simple_loss=0.09349, pruned_loss=0.04324, audio_tagging_loss=0.01519, over 15926.00 frames. ], tot_loss[loss=0.1466, simple_loss=0.1465, pruned_loss=0.05979, audio_tagging_loss=0.01352, over 3053137.73 frames. ], batch size: 62, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:25:56,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=57266.666666666664, ans=0.125 2023-11-18 04:25:58,796 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8600, loss[loss=0.1535, simple_loss=0.1488, pruned_loss=0.06502, audio_tagging_loss=0.01409, over 14688.00 frames. ], tot_loss[loss=0.1468, simple_loss=0.1465, pruned_loss=0.05998, audio_tagging_loss=0.01356, over 3049955.07 frames. ], batch size: 56, lr: 3.65e-02, grad_scale: 32.0 2023-11-18 04:26:01,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=57333.333333333336, ans=0.0 2023-11-18 04:26:06,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.067e+01 1.036e+02 1.166e+02 1.373e+02 2.331e+02, threshold=2.332e+02, percent-clipped=0.0 2023-11-18 04:26:29,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57466.666666666664, ans=0.1 2023-11-18 04:26:39,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=57533.333333333336, ans=0.05 2023-11-18 04:26:47,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=57600.0, ans=0.0 2023-11-18 04:26:52,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=8.703e+00 2023-11-18 04:26:55,048 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8650, loss[loss=0.2009, simple_loss=0.2212, pruned_loss=0.08128, audio_tagging_loss=0.009002, over 16925.00 frames. ], tot_loss[loss=0.1487, simple_loss=0.1487, pruned_loss=0.06097, audio_tagging_loss=0.01342, over 3059704.03 frames. 
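The many [scaling.py:213] ScheduledFloat records show hyperparameters (dropout_p, skip rates, balancer probabilities) that are deterministic functions of batch_count rather than fixed constants. A sketch of the idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the constructor signature here is an illustrative guess, not icefall's exact API:

```python
class ScheduledFloat:
    """A float-valued hyperparameter that is piecewise-linear in batch_count."""

    def __init__(self, *points: tuple[float, float], default: float = 0.0):
        # points: (batch_count, value) pairs; sorted by batch_count.
        self.points = sorted(points)
        self.default = default

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if not pts:
            return self.default
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)  # linear interpolation
        return self.default

# e.g. a dropout_p annealed from 0.3 to 0.1 over the first 20k updates;
# past the last breakpoint it stays at 0.1, as in the ans=0.1 records above.
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value_at(54200.0))  # -> 0.1
```

Logging the current value (`ans=...`) for randomly sampled module names, as these records do, is a cheap way to confirm every schedule has reached its intended operating point.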
], batch size: 60, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:27:17,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=57800.0, ans=0.0 2023-11-18 04:27:18,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=57800.0, ans=0.0 2023-11-18 04:27:21,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=57800.0, ans=0.1 2023-11-18 04:27:46,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=57933.333333333336, ans=0.125 2023-11-18 04:27:46,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-11-18 04:27:51,168 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8700, loss[loss=0.1388, simple_loss=0.1339, pruned_loss=0.05707, audio_tagging_loss=0.01479, over 16357.00 frames. ], tot_loss[loss=0.148, simple_loss=0.1476, pruned_loss=0.06062, audio_tagging_loss=0.01351, over 3056701.56 frames. ], batch size: 63, lr: 3.64e-02, grad_scale: 32.0 2023-11-18 04:27:56,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.81 vs. limit=15.0 2023-11-18 04:27:59,162 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 1.148e+02 1.309e+02 1.555e+02 2.620e+02, threshold=2.618e+02, percent-clipped=1.0 2023-11-18 04:28:01,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=58000.0, ans=0.125 2023-11-18 04:28:38,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=58266.666666666664, ans=0.2 2023-11-18 04:28:47,662 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8750, loss[loss=0.1541, simple_loss=0.1609, pruned_loss=0.06119, audio_tagging_loss=0.01247, over 14849.00 frames. ], tot_loss[loss=0.1477, simple_loss=0.1476, pruned_loss=0.06033, audio_tagging_loss=0.01357, over 3057230.04 frames. ], batch size: 56, lr: 3.63e-02, grad_scale: 32.0 2023-11-18 04:28:51,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.50 vs. limit=22.5 2023-11-18 04:29:03,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.06 vs. limit=15.0 2023-11-18 04:29:07,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=58400.0, ans=0.125 2023-11-18 04:29:09,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-18 04:29:32,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. 
limit=15.0 2023-11-18 04:29:33,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=58600.0, ans=0.0 2023-11-18 04:29:42,978 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8800, loss[loss=0.1465, simple_loss=0.1505, pruned_loss=0.05857, audio_tagging_loss=0.01264, over 15849.00 frames. ], tot_loss[loss=0.1478, simple_loss=0.1476, pruned_loss=0.06037, audio_tagging_loss=0.01363, over 3057570.58 frames. ], batch size: 61, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:29:47,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=58666.666666666664, ans=0.125 2023-11-18 04:29:50,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.076e+01 1.175e+02 1.354e+02 1.562e+02 2.721e+02, threshold=2.708e+02, percent-clipped=1.0 2023-11-18 04:29:51,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=58666.666666666664, ans=0.0 2023-11-18 04:30:36,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.18 vs. limit=12.0 2023-11-18 04:30:39,317 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8850, loss[loss=0.1449, simple_loss=0.1473, pruned_loss=0.05742, audio_tagging_loss=0.0138, over 14558.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.1474, pruned_loss=0.06009, audio_tagging_loss=0.01354, over 3059553.70 frames. ], batch size: 56, lr: 3.62e-02, grad_scale: 32.0 2023-11-18 04:30:45,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=59000.0, ans=0.2 2023-11-18 04:30:48,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=59000.0, ans=10.0 2023-11-18 04:30:50,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=59066.666666666664, ans=0.125 2023-11-18 04:30:51,661 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:30:54,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.65 vs. limit=10.0 2023-11-18 04:31:20,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=59200.0, ans=0.125 2023-11-18 04:31:23,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2023-11-18 04:31:29,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=59266.666666666664, ans=0.05 2023-11-18 04:31:35,260 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8900, loss[loss=0.169, simple_loss=0.1693, pruned_loss=0.06913, audio_tagging_loss=0.01523, over 15058.00 frames. 
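The [scaling.py:1022] Whitening records compare a per-activation "metric" against a limit; a penalty only becomes active when the metric exceeds the limit, so values like 14.88 vs. limit=15.0 indicate an activation sitting just inside its constraint. The sketch below assumes the metric measures how non-isotropic the channel covariance is (exactly 1.0 for perfectly white features); icefall's exact formula may differ in detail, so treat this as illustrative:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """Illustrative whitening metric; equals 1.0 iff channels are white.

    x: (..., num_channels). trace(C @ C) * c / trace(C)**2 is 1 exactly when
    all eigenvalues of the channel covariance C are equal, and grows as the
    spectrum concentrates in a few directions.
    """
    num_channels = x.shape[-1]
    assert num_channels % num_groups == 0
    x = x.reshape(-1, num_groups, num_channels // num_groups).transpose(0, 1)
    x = x - x.mean(dim=1, keepdim=True)                  # center per group
    cov = torch.matmul(x.transpose(1, 2), x) / x.shape[1]  # (groups, c, c)
    num = (cov * cov).sum(dim=(1, 2)) * cov.shape[-1]
    den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
    return (num / den.clamp(min=1e-20)).mean()

feats = torch.randn(1000, 256) @ torch.randn(256, 256)   # correlated channels
print(f"metric={whitening_metric(feats).item():.2f} vs. limit=7.5")
```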
], tot_loss[loss=0.1469, simple_loss=0.1478, pruned_loss=0.05982, audio_tagging_loss=0.01321, over 3058060.13 frames. ], batch size: 55, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:31:38,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=59333.333333333336, ans=0.2 2023-11-18 04:31:43,304 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.181e+01 1.040e+02 1.138e+02 1.318e+02 1.926e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 04:31:47,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=59400.0, ans=0.0 2023-11-18 04:32:04,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=59466.666666666664, ans=0.125 2023-11-18 04:32:27,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=59600.0, ans=0.1 2023-11-18 04:32:27,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=59600.0, ans=0.125 2023-11-18 04:32:30,879 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 8950, loss[loss=0.1619, simple_loss=0.1565, pruned_loss=0.06679, audio_tagging_loss=0.01684, over 15332.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.149, pruned_loss=0.06055, audio_tagging_loss=0.01302, over 3063148.56 frames. ], batch size: 56, lr: 3.61e-02, grad_scale: 32.0 2023-11-18 04:32:50,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=59733.333333333336, ans=0.07 2023-11-18 04:32:52,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=59800.0, ans=0.125 2023-11-18 04:33:16,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=59933.333333333336, ans=0.0 2023-11-18 04:33:16,514 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.61 vs. limit=22.5 2023-11-18 04:33:17,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=59933.333333333336, ans=10.0 2023-11-18 04:33:21,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.89 vs. limit=15.0 2023-11-18 04:33:27,436 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9000, loss[loss=0.1524, simple_loss=0.1438, pruned_loss=0.06542, audio_tagging_loss=0.01511, over 14917.00 frames. ], tot_loss[loss=0.1486, simple_loss=0.1495, pruned_loss=0.06083, audio_tagging_loss=0.01302, over 3054323.10 frames. ], batch size: 54, lr: 3.60e-02, grad_scale: 32.0 2023-11-18 04:33:27,438 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 04:34:01,196 INFO [train_asr.py:1147] (0/4) Epoch 1, validation: loss=0.0967, simple_loss=0.07481, pruned_loss=0.01931, audio_tagging_loss=0.03999, over 4681554.00 frames. 
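At batch 9000 the run pauses training, computes a validation loss over the full dev set (the same 4681554 frames each time), and reports peak GPU memory before resuming. A sketch of such a periodic validation hook, assuming a standard training loop; `compute_loss` returning a (loss, num_frames) pair is an assumption for illustration:

```python
import torch

def maybe_validate(model, valid_loader, compute_loss, batch_idx: int,
                   valid_interval: int, device: torch.device) -> None:
    """Every valid_interval batches, average the loss over the dev set."""
    if batch_idx == 0 or batch_idx % valid_interval != 0:
        return
    model.eval()
    tot_loss, tot_frames = 0.0, 0.0
    with torch.no_grad():
        for batch in valid_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames     # frame-weighted sum
            tot_frames += num_frames
    model.train()
    print(f"validation: loss={tot_loss / tot_frames:.4f}, "
          f"over {tot_frames:.2f} frames.")
    print(f"Maximum memory allocated so far is "
          f"{torch.cuda.max_memory_allocated(device) // 2**20}MB")
```

Frame-weighting the average (rather than averaging per-batch losses) keeps the reported number independent of how the dev set happens to be batched.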
2023-11-18 04:34:01,196 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 04:34:09,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 1.047e+02 1.193e+02 1.407e+02 2.407e+02, threshold=2.385e+02, percent-clipped=1.0 2023-11-18 04:34:25,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=60133.333333333336, ans=0.5 2023-11-18 04:34:27,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=15.0 2023-11-18 04:34:34,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=60200.0, ans=0.125 2023-11-18 04:34:34,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=60200.0, ans=0.1 2023-11-18 04:34:57,628 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9050, loss[loss=0.1045, simple_loss=0.1017, pruned_loss=0.03978, audio_tagging_loss=0.01393, over 15748.00 frames. ], tot_loss[loss=0.1482, simple_loss=0.1491, pruned_loss=0.06063, audio_tagging_loss=0.01302, over 3050978.89 frames. ], batch size: 59, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:35:02,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=60333.333333333336, ans=0.2 2023-11-18 04:35:07,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=60400.0, ans=0.0 2023-11-18 04:35:24,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=60466.666666666664, ans=0.125 2023-11-18 04:35:32,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=60533.333333333336, ans=0.015 2023-11-18 04:35:33,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=60533.333333333336, ans=0.1 2023-11-18 04:35:50,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=60600.0, ans=0.125 2023-11-18 04:35:52,805 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9100, loss[loss=0.1558, simple_loss=0.1518, pruned_loss=0.06684, audio_tagging_loss=0.01305, over 14464.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1481, pruned_loss=0.06003, audio_tagging_loss=0.0129, over 3043916.54 frames. 
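The per-batch loss in these records decomposes consistently as 0.5·simple_loss + pruned_loss + audio_tagging_loss: e.g. for batch 9100 above, 0.5·0.1518 + 0.06684 + 0.01305 ≈ 0.1558, the logged total. That is a pruned-transducer loss (simple plus pruned terms) combined with the audio-tagging distillation term. A minimal sketch of the combination, with the 0.5 and 1.0 scales inferred from the logged numbers:

```python
import torch

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> torch.Tensor:
    """Total loss as implied by the log records:
    loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss."""
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Reproduces the batch-9100 record: 0.5*0.1518 + 0.06684 + 0.01305 ≈ 0.1558
print(combine_losses(torch.tensor(0.1518),
                     torch.tensor(0.06684),
                     torch.tensor(0.01305)).item())
```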
], batch size: 56, lr: 3.59e-02, grad_scale: 32.0 2023-11-18 04:36:00,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=60666.666666666664, ans=0.125 2023-11-18 04:36:00,846 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.380e+01 1.098e+02 1.291e+02 1.456e+02 2.208e+02, threshold=2.583e+02, percent-clipped=0.0 2023-11-18 04:36:06,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=60733.333333333336, ans=0.125 2023-11-18 04:36:15,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=60800.0, ans=0.125 2023-11-18 04:36:27,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=60866.666666666664, ans=0.125 2023-11-18 04:36:34,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=60866.666666666664, ans=0.0 2023-11-18 04:36:35,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=60866.666666666664, ans=0.125 2023-11-18 04:36:47,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=60933.333333333336, ans=0.09899494936611666 2023-11-18 04:36:48,966 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9150, loss[loss=0.1425, simple_loss=0.1523, pruned_loss=0.05305, audio_tagging_loss=0.01327, over 14699.00 frames. ], tot_loss[loss=0.1472, simple_loss=0.1481, pruned_loss=0.06012, audio_tagging_loss=0.013, over 3047091.85 frames. ], batch size: 56, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:37:02,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.29 vs. limit=22.5 2023-11-18 04:37:31,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=61200.0, ans=0.125 2023-11-18 04:37:38,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=61266.666666666664, ans=0.0 2023-11-18 04:37:44,914 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9200, loss[loss=0.1758, simple_loss=0.1693, pruned_loss=0.07415, audio_tagging_loss=0.01699, over 15448.00 frames. ], tot_loss[loss=0.1481, simple_loss=0.1489, pruned_loss=0.06061, audio_tagging_loss=0.01304, over 3060625.52 frames. 
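The grad_scale field in these records (32.0 here, 64.0 later in this section) is the loss-scaling factor of fp16 mixed-precision training: it is halved whenever a scaled gradient overflows and doubled after a stretch of stable steps. icefall wraps this in its own scaler, but the standard PyTorch AMP pattern, shown as an illustrative sketch, behaves the same way:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0, growth_interval=2000)

def train_step(model, batch, optimizer, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()   # backprop with the scaled loss
    scaler.step(optimizer)          # unscales grads; skips step on inf/nan
    scaler.update()                 # grow or shrink the scale
    return loss.detach(), scaler.get_scale()   # the logged grad_scale
```

Power-of-two values like 32.0 and 64.0 in the log are the signature of this doubling/halving scheme.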
], batch size: 58, lr: 3.58e-02, grad_scale: 32.0 2023-11-18 04:37:52,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.523e+01 1.127e+02 1.318e+02 1.536e+02 2.303e+02, threshold=2.636e+02, percent-clipped=0.0 2023-11-18 04:37:53,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=61333.333333333336, ans=0.0 2023-11-18 04:38:06,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=61400.0, ans=0.125 2023-11-18 04:38:11,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=61466.666666666664, ans=0.025 2023-11-18 04:38:14,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=61466.666666666664, ans=0.2 2023-11-18 04:38:23,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=61533.333333333336, ans=0.1 2023-11-18 04:38:32,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=61600.0, ans=0.2 2023-11-18 04:38:34,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.82 vs. limit=15.0 2023-11-18 04:38:40,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.44 vs. limit=12.0 2023-11-18 04:38:41,848 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9250, loss[loss=0.127, simple_loss=0.1238, pruned_loss=0.05224, audio_tagging_loss=0.01284, over 14509.00 frames. ], tot_loss[loss=0.1473, simple_loss=0.148, pruned_loss=0.06014, audio_tagging_loss=0.01315, over 3066920.39 frames. ], batch size: 56, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:38:50,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=61666.666666666664, ans=0.125 2023-11-18 04:38:51,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=61666.666666666664, ans=0.95 2023-11-18 04:38:52,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.79 vs. limit=6.0 2023-11-18 04:38:58,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2023-11-18 04:38:59,017 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.83 vs. limit=22.5 2023-11-18 04:39:01,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=61733.333333333336, ans=0.125 2023-11-18 04:39:09,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. 
limit=15.0 2023-11-18 04:39:11,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=61800.0, ans=0.125 2023-11-18 04:39:14,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=61866.666666666664, ans=0.0 2023-11-18 04:39:22,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=61866.666666666664, ans=0.0 2023-11-18 04:39:25,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0 2023-11-18 04:39:32,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=61933.333333333336, ans=0.0 2023-11-18 04:39:34,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=61933.333333333336, ans=0.2 2023-11-18 04:39:37,695 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9300, loss[loss=0.1285, simple_loss=0.1194, pruned_loss=0.0518, audio_tagging_loss=0.01699, over 14226.00 frames. ], tot_loss[loss=0.1469, simple_loss=0.1475, pruned_loss=0.05995, audio_tagging_loss=0.01323, over 3061961.26 frames. ], batch size: 55, lr: 3.57e-02, grad_scale: 32.0 2023-11-18 04:39:38,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.67 vs. limit=10.0 2023-11-18 04:39:44,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=62000.0, ans=0.2 2023-11-18 04:39:45,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.224e+01 1.082e+02 1.160e+02 1.352e+02 1.912e+02, threshold=2.319e+02, percent-clipped=0.0 2023-11-18 04:39:48,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=62066.666666666664, ans=0.125 2023-11-18 04:39:52,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=62066.666666666664, ans=0.125 2023-11-18 04:39:55,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=62066.666666666664, ans=0.0 2023-11-18 04:40:09,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=62133.333333333336, ans=0.125 2023-11-18 04:40:14,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=62200.0, ans=0.125 2023-11-18 04:40:19,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.62 vs. limit=15.0 2023-11-18 04:40:32,978 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9350, loss[loss=0.1308, simple_loss=0.131, pruned_loss=0.05182, audio_tagging_loss=0.01346, over 16722.00 frames. ], tot_loss[loss=0.1463, simple_loss=0.1468, pruned_loss=0.05956, audio_tagging_loss=0.01339, over 3070119.42 frames. 
], batch size: 62, lr: 3.56e-02, grad_scale: 32.0 2023-11-18 04:40:42,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=62333.333333333336, ans=0.0 2023-11-18 04:40:56,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=62466.666666666664, ans=0.0 2023-11-18 04:41:02,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=62466.666666666664, ans=0.0 2023-11-18 04:41:26,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.16 vs. limit=5.0 2023-11-18 04:41:29,988 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9400, loss[loss=0.1889, simple_loss=0.1818, pruned_loss=0.0862, audio_tagging_loss=0.01178, over 16843.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1451, pruned_loss=0.05889, audio_tagging_loss=0.01358, over 3067750.96 frames. ], batch size: 60, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:41:37,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.990e+01 1.022e+02 1.168e+02 1.353e+02 2.252e+02, threshold=2.336e+02, percent-clipped=0.0 2023-11-18 04:41:46,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=62733.333333333336, ans=0.0 2023-11-18 04:42:11,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=62866.666666666664, ans=0.2 2023-11-18 04:42:22,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.77 vs. limit=15.0 2023-11-18 04:42:22,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=62933.333333333336, ans=0.025 2023-11-18 04:42:25,906 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9450, loss[loss=0.1528, simple_loss=0.1547, pruned_loss=0.06219, audio_tagging_loss=0.01324, over 16334.00 frames. ], tot_loss[loss=0.144, simple_loss=0.1437, pruned_loss=0.05838, audio_tagging_loss=0.01379, over 3060549.14 frames. ], batch size: 59, lr: 3.55e-02, grad_scale: 32.0 2023-11-18 04:42:25,921 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 04:42:28,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63000.0, ans=0.1 2023-11-18 04:42:35,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=63066.666666666664, ans=0.0 2023-11-18 04:42:56,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=63133.333333333336, ans=0.0 2023-11-18 04:43:09,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=63266.666666666664, ans=0.2 2023-11-18 04:43:21,287 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9500, loss[loss=0.1273, simple_loss=0.1383, pruned_loss=0.04298, audio_tagging_loss=0.0152, over 15106.00 frames. ], tot_loss[loss=0.1457, simple_loss=0.1456, pruned_loss=0.05916, audio_tagging_loss=0.01377, over 3065942.84 frames. ], batch size: 56, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:43:26,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=8.0 2023-11-18 04:43:29,296 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.749e+01 1.117e+02 1.291e+02 1.412e+02 2.358e+02, threshold=2.583e+02, percent-clipped=1.0 2023-11-18 04:43:30,991 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=26.96 vs. limit=22.5 2023-11-18 04:43:41,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=63400.0, ans=0.125 2023-11-18 04:43:43,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=63466.666666666664, ans=0.125 2023-11-18 04:43:54,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=63533.333333333336, ans=0.0 2023-11-18 04:43:56,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=63533.333333333336, ans=0.05 2023-11-18 04:44:00,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=63533.333333333336, ans=0.125 2023-11-18 04:44:01,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=63533.333333333336, ans=0.1 2023-11-18 04:44:09,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=63600.0, ans=0.125 2023-11-18 04:44:11,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=63600.0, ans=0.2 2023-11-18 04:44:17,720 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9550, loss[loss=0.1636, simple_loss=0.1698, pruned_loss=0.06599, audio_tagging_loss=0.01273, over 15487.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.1448, pruned_loss=0.05863, audio_tagging_loss=0.01384, over 3058391.02 frames. 
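The recurring WARNING lines drop one-second AudioSet clips whose placeholder transcript (24 BPE tokens) is longer than the 23 encoder frames that remain after subsampling, since a transducer cannot emit more tokens than it has frames. A sketch of that filter; the subsampling arithmetic below is an assumption chosen to match the logged 100 → 23 mapping:

```python
def num_frames_after_subsampling(num_frames: int) -> int:
    # Assumed conv-subsampling arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, tokens: list[str]) -> bool:
    """Drop cuts a transducer cannot align: more tokens than frames."""
    t = num_frames_after_subsampling(num_frames)
    if t < len(tokens):
        print(f"WARNING Exclude cut from training. "
              f"Number of frames (before subsampling): {num_frames}. "
              f"Number of frames (after subsampling): {t}. "
              f"Number of tokens: {len(tokens)}")
        return False
    return True

assert keep_cut(100, ["tok"] * 24) is False   # the case in the log
```

These cuts still contribute to the audio-tagging objective elsewhere; the filter only removes them from the transducer loss, which is why the warning is benign and repeats throughout training.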
], batch size: 57, lr: 3.54e-02, grad_scale: 32.0 2023-11-18 04:44:26,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=63666.666666666664, ans=0.125 2023-11-18 04:44:31,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=63733.333333333336, ans=0.0 2023-11-18 04:44:38,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=63733.333333333336, ans=0.0 2023-11-18 04:44:52,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=63866.666666666664, ans=0.1 2023-11-18 04:44:52,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.50 vs. limit=15.0 2023-11-18 04:44:55,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=63866.666666666664, ans=0.04949747468305833 2023-11-18 04:45:14,638 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9600, loss[loss=0.1515, simple_loss=0.154, pruned_loss=0.06121, audio_tagging_loss=0.01335, over 14732.00 frames. ], tot_loss[loss=0.1449, simple_loss=0.1447, pruned_loss=0.05863, audio_tagging_loss=0.0139, over 3062116.62 frames. ], batch size: 54, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:45:19,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=15.17 vs. limit=22.5 2023-11-18 04:45:22,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 1.076e+02 1.212e+02 1.383e+02 1.987e+02, threshold=2.424e+02, percent-clipped=0.0 2023-11-18 04:45:23,282 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:45:47,110 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.30 vs. limit=15.0 2023-11-18 04:45:49,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. limit=6.0 2023-11-18 04:45:53,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=64200.0, ans=0.1 2023-11-18 04:45:53,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=64200.0, ans=0.125 2023-11-18 04:46:09,776 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9650, loss[loss=0.1566, simple_loss=0.1665, pruned_loss=0.06258, audio_tagging_loss=0.0108, over 15235.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1456, pruned_loss=0.05851, audio_tagging_loss=0.01375, over 3059468.40 frames. 
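The [scaling.py:1118] WithLoss records report the accumulated value (`loss-sum`) of a small auxiliary penalty attached to named intermediate tensors, here attention weights. The sketch below is an illustrative stand-in for that pattern, not icefall's actual WithLoss: a pass-through module that computes a differentiable penalty on its input, stashes it for the optimizer, and keeps a float running sum for periodic logging.

```python
import torch
from torch import nn

class WithAuxLoss(nn.Module):
    """Pass-through wrapper that attaches an auxiliary penalty to a tensor."""

    def __init__(self, name: str, penalty_fn):
        super().__init__()
        self.name = name
        self.penalty_fn = penalty_fn            # tensor -> scalar tensor
        self.pending: list[torch.Tensor] = []   # penalties awaiting backward
        self.loss_sum = 0.0                     # what the log line reports

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        if self.training:
            penalty = self.penalty_fn(attn_weights)
            self.pending.append(penalty)
            self.loss_sum += float(penalty.detach())
        return attn_weights                     # value passes through unchanged

    def pop_aux_loss(self) -> torch.Tensor:
        # Caller adds this to the main loss before calling backward().
        total = (torch.stack(self.pending).sum()
                 if self.pending else torch.tensor(0.0))
        self.pending.clear()
        return total
```

A loss-sum of 0.000e+00, as in some records above, simply means that module's penalty was inactive over the logging window.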
], batch size: 57, lr: 3.53e-02, grad_scale: 32.0 2023-11-18 04:46:19,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=64333.333333333336, ans=0.2 2023-11-18 04:46:22,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=64400.0, ans=0.0 2023-11-18 04:46:22,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=64400.0, ans=0.0 2023-11-18 04:46:25,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64400.0, ans=0.1 2023-11-18 04:46:28,104 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 04:46:34,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=64466.666666666664, ans=0.04949747468305833 2023-11-18 04:46:37,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=64466.666666666664, ans=0.0 2023-11-18 04:46:45,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2023-11-18 04:47:06,183 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9700, loss[loss=0.1735, simple_loss=0.1722, pruned_loss=0.07179, audio_tagging_loss=0.01565, over 16186.00 frames. ], tot_loss[loss=0.1455, simple_loss=0.1463, pruned_loss=0.05887, audio_tagging_loss=0.01351, over 3058181.24 frames. ], batch size: 57, lr: 3.52e-02, grad_scale: 32.0 2023-11-18 04:47:12,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=64666.666666666664, ans=0.1 2023-11-18 04:47:13,618 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.844e+01 1.077e+02 1.252e+02 1.398e+02 2.198e+02, threshold=2.504e+02, percent-clipped=0.0 2023-11-18 04:47:16,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=64733.333333333336, ans=0.0 2023-11-18 04:48:01,918 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9750, loss[loss=0.1384, simple_loss=0.1437, pruned_loss=0.05751, audio_tagging_loss=0.00903, over 14284.00 frames. ], tot_loss[loss=0.1445, simple_loss=0.1455, pruned_loss=0.05839, audio_tagging_loss=0.01337, over 3053414.94 frames. ], batch size: 55, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:48:05,177 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.60 vs. limit=22.5 2023-11-18 04:48:24,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.71 vs. limit=15.0 2023-11-18 04:48:30,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=65133.333333333336, ans=0.0 2023-11-18 04:48:31,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=65133.333333333336, ans=0.125 2023-11-18 04:48:38,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.25 vs. 
limit=22.5 2023-11-18 04:48:39,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=65200.0, ans=0.125 2023-11-18 04:48:58,270 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9800, loss[loss=0.1788, simple_loss=0.1908, pruned_loss=0.07053, audio_tagging_loss=0.01288, over 14568.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1468, pruned_loss=0.05905, audio_tagging_loss=0.01322, over 3047022.17 frames. ], batch size: 53, lr: 3.51e-02, grad_scale: 32.0 2023-11-18 04:49:06,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.876e+01 1.068e+02 1.214e+02 1.427e+02 2.483e+02, threshold=2.428e+02, percent-clipped=0.0 2023-11-18 04:49:06,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=65333.333333333336, ans=0.125 2023-11-18 04:49:12,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=65400.0, ans=0.0 2023-11-18 04:49:41,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=65533.333333333336, ans=0.025 2023-11-18 04:49:48,831 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:49:53,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=65666.66666666667, ans=0.0 2023-11-18 04:49:54,156 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9850, loss[loss=0.1709, simple_loss=0.1786, pruned_loss=0.07096, audio_tagging_loss=0.01068, over 15379.00 frames. ], tot_loss[loss=0.1459, simple_loss=0.147, pruned_loss=0.05931, audio_tagging_loss=0.0131, over 3056838.83 frames. ], batch size: 55, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:03,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=65666.66666666667, ans=15.0 2023-11-18 04:50:07,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=65733.33333333333, ans=0.0 2023-11-18 04:50:11,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=65733.33333333333, ans=0.0 2023-11-18 04:50:17,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=65800.0, ans=0.2 2023-11-18 04:50:23,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=65800.0, ans=0.125 2023-11-18 04:50:27,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.39 vs. 
limit=15.0 2023-11-18 04:50:29,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=65866.66666666667, ans=0.125 2023-11-18 04:50:32,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=65866.66666666667, ans=0.125 2023-11-18 04:50:48,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=65933.33333333333, ans=0.125 2023-11-18 04:50:50,828 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9900, loss[loss=0.1786, simple_loss=0.1701, pruned_loss=0.0787, audio_tagging_loss=0.01485, over 14753.00 frames. ], tot_loss[loss=0.1461, simple_loss=0.1475, pruned_loss=0.05936, audio_tagging_loss=0.01303, over 3058431.33 frames. ], batch size: 54, lr: 3.50e-02, grad_scale: 32.0 2023-11-18 04:50:58,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.989e+01 1.084e+02 1.192e+02 1.374e+02 2.032e+02, threshold=2.383e+02, percent-clipped=0.0 2023-11-18 04:51:07,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=66066.66666666667, ans=10.0 2023-11-18 04:51:20,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-18 04:51:46,593 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 9950, loss[loss=0.09292, simple_loss=0.09068, pruned_loss=0.033, audio_tagging_loss=0.01458, over 14908.00 frames. ], tot_loss[loss=0.1465, simple_loss=0.148, pruned_loss=0.05952, audio_tagging_loss=0.01297, over 3056903.11 frames. ], batch size: 57, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:51:55,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=66333.33333333333, ans=0.2 2023-11-18 04:51:55,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=66333.33333333333, ans=0.2 2023-11-18 04:52:03,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=66400.0, ans=0.125 2023-11-18 04:52:15,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.87 vs. limit=22.5 2023-11-18 04:52:17,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.78 vs. limit=15.0 2023-11-18 04:52:18,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0 2023-11-18 04:52:43,109 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10000, loss[loss=0.1282, simple_loss=0.1295, pruned_loss=0.04763, audio_tagging_loss=0.01578, over 15775.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1458, pruned_loss=0.05853, audio_tagging_loss=0.01301, over 3046897.10 frames. 
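Each progress record prints two losses: `loss[...]` for the current batch and `tot_loss[...] over ~3.0e6 frames`, a frame-weighted running aggregate over recent batches that moves much more smoothly. A sketch of such an aggregate; the exponential-decay scheme here is an assumption, and icefall's MetricsTracker differs in detail:

```python
class RunningLoss:
    """Frame-weighted running aggregate behind `tot_loss[...] over N frames.`"""

    def __init__(self, decay: float = 0.995):
        self.decay = decay          # forgets old batches geometrically
        self.weighted_loss = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> None:
        self.weighted_loss = self.decay * self.weighted_loss + loss * num_frames
        self.frames = self.decay * self.frames + num_frames

    @property
    def value(self) -> float:
        return self.weighted_loss / max(self.frames, 1.0)

    def __str__(self) -> str:
        return f"tot_loss[loss={self.value:.4g}, over {self.frames:.2f} frames.]"
```

With roughly 15k frames per batch, a decay of this order keeps a few million frames in the window, matching the "over 3.0e6 frames" figures in the records.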
], batch size: 62, lr: 3.49e-02, grad_scale: 32.0 2023-11-18 04:52:50,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.963e+01 1.074e+02 1.249e+02 1.429e+02 2.064e+02, threshold=2.499e+02, percent-clipped=0.0 2023-11-18 04:52:54,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=66733.33333333333, ans=0.125 2023-11-18 04:53:05,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.55 vs. limit=15.0 2023-11-18 04:53:22,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=66866.66666666667, ans=0.2 2023-11-18 04:53:25,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-18 04:53:39,079 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10050, loss[loss=0.1301, simple_loss=0.1193, pruned_loss=0.05771, audio_tagging_loss=0.0127, over 14738.00 frames. ], tot_loss[loss=0.1451, simple_loss=0.1465, pruned_loss=0.05887, audio_tagging_loss=0.01292, over 3048888.22 frames. ], batch size: 56, lr: 3.48e-02, grad_scale: 32.0 2023-11-18 04:53:51,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten.whitening_limit, batch_count=67066.66666666667, ans=15.0 2023-11-18 04:53:53,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=67066.66666666667, ans=0.125 2023-11-18 04:53:57,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=67066.66666666667, ans=0.125 2023-11-18 04:54:01,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=67133.33333333333, ans=0.125 2023-11-18 04:54:20,859 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.36 vs. limit=15.0 2023-11-18 04:54:23,476 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=6.921e+00 2023-11-18 04:54:31,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=67266.66666666667, ans=0.09899494936611666 2023-11-18 04:54:32,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=67266.66666666667, ans=0.0 2023-11-18 04:54:34,781 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10100, loss[loss=0.1307, simple_loss=0.1388, pruned_loss=0.05054, audio_tagging_loss=0.01082, over 14985.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1464, pruned_loss=0.05867, audio_tagging_loss=0.01308, over 3041847.25 frames. ], batch size: 55, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:54:42,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.070e+02 1.242e+02 1.409e+02 2.518e+02, threshold=2.485e+02, percent-clipped=1.0 2023-11-18 04:54:56,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.06 vs. 
limit=10.0 2023-11-18 04:55:04,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=67466.66666666667, ans=0.125 2023-11-18 04:55:20,742 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:55:28,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=67600.0, ans=0.1 2023-11-18 04:55:29,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=67600.0, ans=0.125 2023-11-18 04:55:31,489 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10150, loss[loss=0.1047, simple_loss=0.09775, pruned_loss=0.04195, audio_tagging_loss=0.01392, over 15065.00 frames. ], tot_loss[loss=0.1454, simple_loss=0.1468, pruned_loss=0.05884, audio_tagging_loss=0.01318, over 3045685.92 frames. ], batch size: 61, lr: 3.47e-02, grad_scale: 32.0 2023-11-18 04:55:36,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.23 vs. limit=22.5 2023-11-18 04:55:59,260 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:56:17,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=67933.33333333333, ans=0.1 2023-11-18 04:56:21,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=67933.33333333333, ans=0.125 2023-11-18 04:56:27,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=68000.0, ans=0.5 2023-11-18 04:56:27,844 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10200, loss[loss=0.1395, simple_loss=0.1412, pruned_loss=0.05783, audio_tagging_loss=0.01111, over 14956.00 frames. ], tot_loss[loss=0.1458, simple_loss=0.1471, pruned_loss=0.05908, audio_tagging_loss=0.0132, over 3041388.55 frames. 
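The learning rate in these records decays slowly with the update count (3.70e-02 at batch 8200 down to 3.46e-02 at batch 10200). An Eden-style schedule with base_lr 0.045 and lr_batches 7500 reproduces the logged values to the printed precision, e.g. 0.045·((8200/7500)² + 1)^(-0.25) ≈ 3.70e-02; the epoch term is ≈ 1 this early in training. A sketch, with those constants taken as assumptions that fit the log:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style LR schedule; constants chosen to reproduce the logged lr."""
    return (base_lr
            * ((step / lr_batches) ** 2 + 1) ** -0.25
            * ((epoch / lr_epochs) ** 2 + 1) ** -0.25)

# Matches the records above (epoch term ~1 early in epoch 1):
print(f"{eden_lr(0.045, 8200, 0.1):.2e}")    # ≈ 3.70e-02
print(f"{eden_lr(0.045, 10200, 0.12):.2e}")  # ≈ 3.46e-02
```

The quartic-root form gives a near-flat warm period followed by a gentle power-law decay, which is why consecutive records differ only in the third digit.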
], batch size: 56, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:56:31,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten.whitening_limit, batch_count=68000.0, ans=15.0 2023-11-18 04:56:31,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=68000.0, ans=0.1 2023-11-18 04:56:35,839 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.974e+01 1.095e+02 1.241e+02 1.478e+02 2.822e+02, threshold=2.482e+02, percent-clipped=1.0 2023-11-18 04:56:45,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=68066.66666666667, ans=0.2 2023-11-18 04:56:49,786 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 04:57:06,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=68200.0, ans=0.0 2023-11-18 04:57:22,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=68333.33333333333, ans=0.125 2023-11-18 04:57:23,699 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10250, loss[loss=0.1501, simple_loss=0.1534, pruned_loss=0.05973, audio_tagging_loss=0.01363, over 15761.00 frames. ], tot_loss[loss=0.145, simple_loss=0.1464, pruned_loss=0.0585, audio_tagging_loss=0.01334, over 3049098.30 frames. ], batch size: 58, lr: 3.46e-02, grad_scale: 64.0 2023-11-18 04:57:28,434 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-18 04:57:39,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=68400.0, ans=0.0 2023-11-18 04:57:45,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=68466.66666666667, ans=0.125 2023-11-18 04:57:52,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_ff3.min_abs, batch_count=68466.66666666667, ans=0.2 2023-11-18 04:58:19,291 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10300, loss[loss=0.1637, simple_loss=0.1581, pruned_loss=0.07046, audio_tagging_loss=0.01413, over 14986.00 frames. ], tot_loss[loss=0.1448, simple_loss=0.1458, pruned_loss=0.05843, audio_tagging_loss=0.01346, over 3046389.11 frames. ], batch size: 56, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:58:22,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. 
limit=6.0 2023-11-18 04:58:27,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.697e+01 1.063e+02 1.210e+02 1.437e+02 2.016e+02, threshold=2.421e+02, percent-clipped=0.0 2023-11-18 04:58:30,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=68733.33333333333, ans=0.125 2023-11-18 04:58:33,310 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-11-18 04:58:36,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=68733.33333333333, ans=0.125 2023-11-18 04:58:38,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=68733.33333333333, ans=0.0 2023-11-18 04:58:45,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.88 vs. limit=15.0 2023-11-18 04:59:04,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=68933.33333333333, ans=0.125 2023-11-18 04:59:15,527 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10350, loss[loss=0.1651, simple_loss=0.1717, pruned_loss=0.06876, audio_tagging_loss=0.01055, over 14244.00 frames. ], tot_loss[loss=0.1455, simple_loss=0.1464, pruned_loss=0.05881, audio_tagging_loss=0.01344, over 3045799.27 frames. ], batch size: 54, lr: 3.45e-02, grad_scale: 64.0 2023-11-18 04:59:22,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=69000.0, ans=0.025 2023-11-18 04:59:35,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=69066.66666666667, ans=0.125 2023-11-18 04:59:40,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.53 vs. limit=22.5 2023-11-18 04:59:55,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=69200.0, ans=0.125 2023-11-18 05:00:10,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=69333.33333333333, ans=0.125 2023-11-18 05:00:10,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=69333.33333333333, ans=0.125 2023-11-18 05:00:11,310 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10400, loss[loss=0.1342, simple_loss=0.1317, pruned_loss=0.05211, audio_tagging_loss=0.01623, over 15805.00 frames. ], tot_loss[loss=0.1444, simple_loss=0.1452, pruned_loss=0.05816, audio_tagging_loss=0.01366, over 3038334.70 frames. ], batch size: 59, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:00:18,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.147e+01 1.054e+02 1.220e+02 1.352e+02 2.408e+02, threshold=2.441e+02, percent-clipped=0.0 2023-11-18 05:00:25,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.37 vs. 
limit=15.0 2023-11-18 05:00:37,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=69466.66666666667, ans=0.125 2023-11-18 05:01:04,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=69600.0, ans=0.0 2023-11-18 05:01:06,933 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10450, loss[loss=0.1749, simple_loss=0.1815, pruned_loss=0.07311, audio_tagging_loss=0.01108, over 15208.00 frames. ], tot_loss[loss=0.1432, simple_loss=0.1438, pruned_loss=0.05751, audio_tagging_loss=0.01375, over 3043414.46 frames. ], batch size: 56, lr: 3.44e-02, grad_scale: 64.0 2023-11-18 05:01:25,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=69733.33333333333, ans=0.125 2023-11-18 05:01:35,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=69800.0, ans=0.2 2023-11-18 05:01:42,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=69866.66666666667, ans=0.125 2023-11-18 05:02:03,066 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10500, loss[loss=0.1886, simple_loss=0.1944, pruned_loss=0.07905, audio_tagging_loss=0.01236, over 15374.00 frames. ], tot_loss[loss=0.142, simple_loss=0.1429, pruned_loss=0.05707, audio_tagging_loss=0.01347, over 3048018.25 frames. ], batch size: 56, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:02:04,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=70000.0, ans=0.2 2023-11-18 05:02:04,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=70000.0, ans=0.0 2023-11-18 05:02:10,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=70000.0, ans=0.125 2023-11-18 05:02:10,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.463e+01 1.096e+02 1.231e+02 1.432e+02 2.125e+02, threshold=2.461e+02, percent-clipped=0.0 2023-11-18 05:02:28,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=14.89 vs. limit=15.0 2023-11-18 05:02:30,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=70133.33333333333, ans=10.0 2023-11-18 05:02:39,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=70200.0, ans=0.125 2023-11-18 05:02:40,904 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:02:44,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.12 vs. limit=8.0 2023-11-18 05:02:55,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=70266.66666666667, ans=0.0 2023-11-18 05:02:58,482 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10550, loss[loss=0.1282, simple_loss=0.1347, pruned_loss=0.04794, audio_tagging_loss=0.01297, over 15369.00 frames. ], tot_loss[loss=0.1406, simple_loss=0.1417, pruned_loss=0.05642, audio_tagging_loss=0.01337, over 3046156.36 frames. 
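The per-batch loss components are consistent with a weighted sum loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0 settings of this run; the weights are inferred from the logged numbers, not quoted from the training code. A quick check against the batch 10550 entry just above:

# Numbers copied from the "Epoch 1, batch 10550" entry.
simple_loss, pruned_loss, audio_tagging_loss = 0.1347, 0.04794, 0.01297
loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
assert abs(loss - 0.1282) < 5e-4   # logged loss=0.1282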
], batch size: 58, lr: 3.43e-02, grad_scale: 64.0 2023-11-18 05:03:00,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=70333.33333333333, ans=0.1 2023-11-18 05:03:02,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.22 vs. limit=15.0 2023-11-18 05:03:09,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=70400.0, ans=0.0 2023-11-18 05:03:10,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0 2023-11-18 05:03:21,429 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.741e+00 2023-11-18 05:03:22,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=70466.66666666667, ans=0.125 2023-11-18 05:03:27,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70466.66666666667, ans=0.1 2023-11-18 05:03:45,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=70600.0, ans=0.125 2023-11-18 05:03:46,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=70600.0, ans=0.1 2023-11-18 05:03:53,329 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10600, loss[loss=0.1261, simple_loss=0.1322, pruned_loss=0.05087, audio_tagging_loss=0.009106, over 15934.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.1427, pruned_loss=0.05691, audio_tagging_loss=0.01311, over 3042831.84 frames. ], batch size: 58, lr: 3.42e-02, grad_scale: 64.0 2023-11-18 05:04:01,190 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.606e+01 1.084e+02 1.194e+02 1.358e+02 2.173e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:04:37,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=70933.33333333333, ans=0.2 2023-11-18 05:04:43,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=19.33 vs. limit=22.5 2023-11-18 05:04:49,524 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10650, loss[loss=0.1395, simple_loss=0.1417, pruned_loss=0.05733, audio_tagging_loss=0.01136, over 15137.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1431, pruned_loss=0.05718, audio_tagging_loss=0.01302, over 3039238.13 frames. 
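The ScheduledFloat entries report module hyper-parameters (dropout probabilities, balancer probs, skip rates, bypass scale bounds) that are functions of training progress rather than constants: each module resolves its current value, ans, from the global batch_count. A minimal sketch of a piecewise-linear schedule of this kind, in the spirit of scaling.py but not its implementation, with made-up breakpoints:

class PiecewiseLinearSchedule:
    """Value interpolated linearly between (batch_count, value) breakpoints,
    clamped to the end values outside the breakpoint range."""

    def __init__(self, *points):
        self.points = sorted(points)

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# Hypothetical dropout schedule annealing toward the 0.1 seen above.
dropout_p = PiecewiseLinearSchedule((0.0, 0.3), (20000.0, 0.1))
print(dropout_p(70466.67))   # -> 0.1, cf. the dropout_p entries with ans=0.1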
], batch size: 55, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:04:49,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=71000.0, ans=0.125 2023-11-18 05:04:57,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=71000.0, ans=0.125 2023-11-18 05:05:00,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=71066.66666666667, ans=0.125 2023-11-18 05:05:45,720 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10700, loss[loss=0.1987, simple_loss=0.2029, pruned_loss=0.0866, audio_tagging_loss=0.01068, over 15056.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1429, pruned_loss=0.05684, audio_tagging_loss=0.01293, over 3036800.93 frames. ], batch size: 53, lr: 3.41e-02, grad_scale: 64.0 2023-11-18 05:05:52,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.108e+01 1.048e+02 1.189e+02 1.344e+02 2.146e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 05:06:00,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=71400.0, ans=0.125 2023-11-18 05:06:01,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0 2023-11-18 05:06:20,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=71533.33333333333, ans=0.125 2023-11-18 05:06:21,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=71533.33333333333, ans=0.125 2023-11-18 05:06:32,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=71600.0, ans=0.0 2023-11-18 05:06:39,933 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10750, loss[loss=0.1101, simple_loss=0.0999, pruned_loss=0.04489, audio_tagging_loss=0.0153, over 14455.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1423, pruned_loss=0.05652, audio_tagging_loss=0.01302, over 3039870.34 frames. ], batch size: 59, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:06:49,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=71666.66666666667, ans=0.125 2023-11-18 05:07:03,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2023-11-18 05:07:04,243 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:07:17,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=71866.66666666667, ans=0.0 2023-11-18 05:07:35,443 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10800, loss[loss=0.188, simple_loss=0.186, pruned_loss=0.07758, audio_tagging_loss=0.0174, over 15846.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1438, pruned_loss=0.05689, audio_tagging_loss=0.01291, over 3044353.53 frames. 
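The optim.py lines summarize the recent distribution of gradient norms as five quantiles (apparently min, 25%, median, 75%, max) together with the active clipping threshold and the fraction of recently clipped batches; with Clipping_scale=2.0 the threshold tracks twice the running median (2 x 1.189e+02 = 2.378e+02 in the entry above). A schematic of that bookkeeping, assuming a simple windowed median; the real logic lives inside the optimizer, so treat this as a sketch:

import collections
import statistics
import torch

class GradNormTracker:
    """Track recent gradient norms; clip to clipping_scale x running median."""

    def __init__(self, window: int = 400, clipping_scale: float = 2.0):
        self.norms = collections.deque(maxlen=window)
        self.clipping_scale = clipping_scale

    def step(self, parameters) -> None:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.clipping_scale * statistics.median(self.norms)
        if norm > threshold:   # counted toward "percent-clipped" in the log
            torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)

    def quantiles(self):
        q = statistics.quantiles(self.norms, n=4)   # 25%, 50%, 75%
        return [min(self.norms), *q, max(self.norms)]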
], batch size: 59, lr: 3.40e-02, grad_scale: 64.0 2023-11-18 05:07:41,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=72000.0, ans=0.0 2023-11-18 05:07:43,299 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.252e+01 1.082e+02 1.179e+02 1.367e+02 2.142e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 05:07:51,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=72066.66666666667, ans=10.0 2023-11-18 05:08:04,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5 2023-11-18 05:08:30,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=72266.66666666667, ans=0.125 2023-11-18 05:08:32,019 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10850, loss[loss=0.191, simple_loss=0.1998, pruned_loss=0.08, audio_tagging_loss=0.01109, over 15896.00 frames. ], tot_loss[loss=0.142, simple_loss=0.1443, pruned_loss=0.05691, audio_tagging_loss=0.01299, over 3052203.54 frames. ], batch size: 57, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:08:33,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=72333.33333333333, ans=0.125 2023-11-18 05:08:53,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=72466.66666666667, ans=0.95 2023-11-18 05:09:22,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5 2023-11-18 05:09:23,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=72600.0, ans=0.125 2023-11-18 05:09:23,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=72600.0, ans=0.2 2023-11-18 05:09:24,813 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:09:26,869 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10900, loss[loss=0.1497, simple_loss=0.1508, pruned_loss=0.0614, audio_tagging_loss=0.01293, over 14730.00 frames. ], tot_loss[loss=0.1438, simple_loss=0.146, pruned_loss=0.05777, audio_tagging_loss=0.013, over 3055764.15 frames. 
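The Whitening lines compare a measured whiteness metric of a module's output covariance against a scheduled limit; a value near 1 means the channel covariance is close to isotropic, while large excursions (metric=23.08 vs. limit=22.5 above) are the cases the module corrects through the backward pass. One plausible definition of such a metric, the ratio of the mean squared covariance eigenvalue to the squared mean eigenvalue, is sketched below; this is an assumption about the exact formula, not a quote of scaling.py:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """x: (..., channels). Returns ~1.0 for perfectly white features and
    grows as the per-group channel covariance becomes anisotropic."""
    x = x.reshape(-1, x.shape[-1])
    metrics = []
    for g in x.chunk(num_groups, dim=-1):
        g = g - g.mean(dim=0, keepdim=True)
        cov = (g.T @ g) / g.shape[0]          # (C, C) channel covariance
        eigs = torch.linalg.eigvalsh(cov)
        metrics.append((eigs ** 2).mean() / eigs.mean() ** 2)
    return torch.stack(metrics).mean().item()

print(whitening_metric(torch.randn(1000, 256)))   # ~1.0 for near-white input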
], batch size: 56, lr: 3.39e-02, grad_scale: 64.0 2023-11-18 05:09:34,701 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.986e+01 1.097e+02 1.219e+02 1.380e+02 2.178e+02, threshold=2.437e+02, percent-clipped=0.0 2023-11-18 05:09:48,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=72800.0, ans=0.05 2023-11-18 05:09:51,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=72800.0, ans=0.0 2023-11-18 05:09:51,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=72800.0, ans=0.0 2023-11-18 05:10:11,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-11-18 05:10:17,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=72933.33333333333, ans=0.125 2023-11-18 05:10:18,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=12.64 vs. limit=15.0 2023-11-18 05:10:22,385 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 10950, loss[loss=0.1361, simple_loss=0.1366, pruned_loss=0.05437, audio_tagging_loss=0.01343, over 15426.00 frames. ], tot_loss[loss=0.1437, simple_loss=0.1458, pruned_loss=0.05769, audio_tagging_loss=0.01312, over 3052584.70 frames. ], batch size: 57, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:10:23,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=73000.0, ans=0.125 2023-11-18 05:10:36,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=73066.66666666667, ans=0.125 2023-11-18 05:11:18,767 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11000, loss[loss=0.1695, simple_loss=0.1637, pruned_loss=0.07395, audio_tagging_loss=0.01373, over 14925.00 frames. ], tot_loss[loss=0.1432, simple_loss=0.1451, pruned_loss=0.05743, audio_tagging_loss=0.01321, over 3051640.83 frames. ], batch size: 56, lr: 3.38e-02, grad_scale: 64.0 2023-11-18 05:11:19,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=32.15 vs. limit=15.0 2023-11-18 05:11:26,683 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.286e+01 1.063e+02 1.239e+02 1.487e+02 2.361e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:11:28,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=73333.33333333333, ans=0.0 2023-11-18 05:11:28,867 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 05:11:47,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=73466.66666666667, ans=0.1 2023-11-18 05:12:13,953 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11050, loss[loss=0.1529, simple_loss=0.1641, pruned_loss=0.06063, audio_tagging_loss=0.01022, over 15116.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1433, pruned_loss=0.05661, audio_tagging_loss=0.0134, over 3049240.06 frames. ], batch size: 56, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:12:16,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=73666.66666666667, ans=0.2 2023-11-18 05:12:24,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73733.33333333333, ans=0.1 2023-11-18 05:12:27,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=73733.33333333333, ans=0.125 2023-11-18 05:12:33,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=73733.33333333333, ans=0.1 2023-11-18 05:12:41,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.65 vs. limit=10.0 2023-11-18 05:12:54,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=73866.66666666667, ans=0.1 2023-11-18 05:12:54,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.52 vs. limit=15.0 2023-11-18 05:12:56,091 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.57 vs. limit=10.0 2023-11-18 05:13:09,742 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11100, loss[loss=0.1208, simple_loss=0.1222, pruned_loss=0.04594, audio_tagging_loss=0.0138, over 16070.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.143, pruned_loss=0.05664, audio_tagging_loss=0.01356, over 3054189.13 frames. ], batch size: 60, lr: 3.37e-02, grad_scale: 64.0 2023-11-18 05:13:17,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.064e+01 1.115e+02 1.316e+02 1.523e+02 2.373e+02, threshold=2.632e+02, percent-clipped=0.0 2023-11-18 05:13:30,134 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:13:50,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.04 vs. limit=15.0 2023-11-18 05:13:52,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=74200.0, ans=0.0 2023-11-18 05:13:55,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=74266.66666666667, ans=0.0 2023-11-18 05:14:06,316 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11150, loss[loss=0.1397, simple_loss=0.1354, pruned_loss=0.05667, audio_tagging_loss=0.0153, over 15301.00 frames. ], tot_loss[loss=0.1426, simple_loss=0.1442, pruned_loss=0.05689, audio_tagging_loss=0.01363, over 3056439.47 frames. 
], batch size: 58, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:14:10,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=74333.33333333333, ans=0.125 2023-11-18 05:14:16,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.27 vs. limit=22.5 2023-11-18 05:14:26,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=74400.0, ans=0.025 2023-11-18 05:14:30,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=74466.66666666667, ans=0.0 2023-11-18 05:14:32,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.20 vs. limit=15.0 2023-11-18 05:14:33,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=74466.66666666667, ans=0.0 2023-11-18 05:14:51,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=74600.0, ans=0.0 2023-11-18 05:14:53,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=3.87 vs. limit=5.0 2023-11-18 05:14:54,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=74600.0, ans=0.125 2023-11-18 05:14:56,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=74600.0, ans=0.2 2023-11-18 05:14:57,362 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:15:01,604 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11200, loss[loss=0.1416, simple_loss=0.1439, pruned_loss=0.05613, audio_tagging_loss=0.0135, over 16022.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1432, pruned_loss=0.05638, audio_tagging_loss=0.0137, over 3055816.10 frames. ], batch size: 60, lr: 3.36e-02, grad_scale: 64.0 2023-11-18 05:15:09,623 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.922e+01 1.084e+02 1.213e+02 1.367e+02 1.851e+02, threshold=2.426e+02, percent-clipped=0.0 2023-11-18 05:15:19,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=74733.33333333333, ans=0.1 2023-11-18 05:15:57,788 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11250, loss[loss=0.1655, simple_loss=0.1706, pruned_loss=0.07086, audio_tagging_loss=0.009326, over 16031.00 frames. ], tot_loss[loss=0.1416, simple_loss=0.1429, pruned_loss=0.05655, audio_tagging_loss=0.01357, over 3060665.05 frames. ], batch size: 57, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:15:58,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0 2023-11-18 05:16:25,621 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. 
limit=10.0 2023-11-18 05:16:27,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=75133.33333333333, ans=0.125 2023-11-18 05:16:53,043 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11300, loss[loss=0.1649, simple_loss=0.1811, pruned_loss=0.06561, audio_tagging_loss=0.008759, over 13952.00 frames. ], tot_loss[loss=0.1421, simple_loss=0.1438, pruned_loss=0.05689, audio_tagging_loss=0.01328, over 3051237.71 frames. ], batch size: 52, lr: 3.35e-02, grad_scale: 64.0 2023-11-18 05:17:00,938 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.188e+01 1.067e+02 1.239e+02 1.530e+02 2.211e+02, threshold=2.479e+02, percent-clipped=0.0 2023-11-18 05:17:08,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=20.90 vs. limit=22.5 2023-11-18 05:17:13,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=75400.0, ans=0.1 2023-11-18 05:17:15,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.64 vs. limit=15.0 2023-11-18 05:17:41,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=75600.0, ans=0.2 2023-11-18 05:17:48,716 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11350, loss[loss=0.1228, simple_loss=0.1211, pruned_loss=0.04862, audio_tagging_loss=0.01358, over 16024.00 frames. ], tot_loss[loss=0.1413, simple_loss=0.1433, pruned_loss=0.05653, audio_tagging_loss=0.01312, over 3044058.43 frames. ], batch size: 61, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:17:53,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=75666.66666666667, ans=0.0 2023-11-18 05:17:53,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.77 vs. limit=15.0 2023-11-18 05:18:02,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=75733.33333333333, ans=0.0 2023-11-18 05:18:21,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=75866.66666666667, ans=0.0 2023-11-18 05:18:44,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=76000.0, ans=0.1 2023-11-18 05:18:45,301 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11400, loss[loss=0.1071, simple_loss=0.1035, pruned_loss=0.04221, audio_tagging_loss=0.01313, over 14365.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.1425, pruned_loss=0.05592, audio_tagging_loss=0.01309, over 3041004.30 frames. 
], batch size: 57, lr: 3.34e-02, grad_scale: 64.0 2023-11-18 05:18:52,638 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.156e+01 1.039e+02 1.156e+02 1.287e+02 1.628e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 05:18:52,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=76000.0, ans=0.0 2023-11-18 05:19:10,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=76133.33333333333, ans=0.0 2023-11-18 05:19:12,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=5.05 vs. limit=5.0 2023-11-18 05:19:25,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=76200.0, ans=0.09899494936611666 2023-11-18 05:19:27,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=76200.0, ans=0.0 2023-11-18 05:19:40,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.54 vs. limit=15.0 2023-11-18 05:19:40,945 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11450, loss[loss=0.1195, simple_loss=0.1186, pruned_loss=0.04624, audio_tagging_loss=0.01397, over 14796.00 frames. ], tot_loss[loss=0.1403, simple_loss=0.1425, pruned_loss=0.05586, audio_tagging_loss=0.01316, over 3043402.52 frames. ], batch size: 57, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:19:41,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-11-18 05:19:47,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-18 05:19:53,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.34 vs. limit=15.0 2023-11-18 05:20:16,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=76533.33333333333, ans=0.1 2023-11-18 05:20:23,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=76533.33333333333, ans=0.125 2023-11-18 05:20:36,078 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11500, loss[loss=0.1643, simple_loss=0.1792, pruned_loss=0.06583, audio_tagging_loss=0.00889, over 15114.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1413, pruned_loss=0.05547, audio_tagging_loss=0.01326, over 3044770.73 frames. 
], batch size: 56, lr: 3.33e-02, grad_scale: 64.0 2023-11-18 05:20:43,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.208e+01 1.030e+02 1.194e+02 1.379e+02 2.068e+02, threshold=2.389e+02, percent-clipped=0.0 2023-11-18 05:20:48,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=76733.33333333333, ans=0.125 2023-11-18 05:20:54,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=76733.33333333333, ans=0.125 2023-11-18 05:21:07,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=76800.0, ans=0.125 2023-11-18 05:21:22,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-11-18 05:21:30,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=77000.0, ans=0.5 2023-11-18 05:21:31,743 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11550, loss[loss=0.1237, simple_loss=0.1312, pruned_loss=0.04555, audio_tagging_loss=0.01256, over 15419.00 frames. ], tot_loss[loss=0.1394, simple_loss=0.1411, pruned_loss=0.05558, audio_tagging_loss=0.01325, over 3051691.06 frames. ], batch size: 57, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:22:06,037 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:22:06,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=77200.0, ans=10.0 2023-11-18 05:22:27,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.50 vs. limit=15.0 2023-11-18 05:22:28,059 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11600, loss[loss=0.1748, simple_loss=0.1738, pruned_loss=0.07401, audio_tagging_loss=0.01392, over 15211.00 frames. ], tot_loss[loss=0.1399, simple_loss=0.1418, pruned_loss=0.05585, audio_tagging_loss=0.01319, over 3051572.57 frames. ], batch size: 56, lr: 3.32e-02, grad_scale: 64.0 2023-11-18 05:22:29,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.04 vs. 
limit=15.0 2023-11-18 05:22:35,973 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.030e+02 1.201e+02 1.372e+02 2.300e+02, threshold=2.402e+02, percent-clipped=0.0 2023-11-18 05:22:52,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=77466.66666666667, ans=0.5 2023-11-18 05:23:00,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=77533.33333333333, ans=0.125 2023-11-18 05:23:11,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=77600.0, ans=0.0 2023-11-18 05:23:13,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=77600.0, ans=0.125 2023-11-18 05:23:23,711 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11650, loss[loss=0.1531, simple_loss=0.1571, pruned_loss=0.06134, audio_tagging_loss=0.01324, over 16216.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1429, pruned_loss=0.05652, audio_tagging_loss=0.01317, over 3052755.83 frames. ], batch size: 59, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:23:39,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=77733.33333333333, ans=0.1 2023-11-18 05:23:52,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=77800.0, ans=0.2 2023-11-18 05:24:10,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=77933.33333333333, ans=0.1 2023-11-18 05:24:18,903 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11700, loss[loss=0.1061, simple_loss=0.1032, pruned_loss=0.04067, audio_tagging_loss=0.01379, over 14860.00 frames. ], tot_loss[loss=0.1407, simple_loss=0.1426, pruned_loss=0.05615, audio_tagging_loss=0.01325, over 3051088.74 frames. ], batch size: 58, lr: 3.31e-02, grad_scale: 64.0 2023-11-18 05:24:26,789 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.375e+01 1.130e+02 1.304e+02 1.460e+02 2.076e+02, threshold=2.607e+02, percent-clipped=0.0 2023-11-18 05:24:27,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.97 vs. limit=15.0 2023-11-18 05:24:43,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=78133.33333333333, ans=0.125 2023-11-18 05:24:58,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=78200.0, ans=0.125 2023-11-18 05:25:11,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 05:25:14,877 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11750, loss[loss=0.1317, simple_loss=0.1364, pruned_loss=0.05283, audio_tagging_loss=0.01062, over 15282.00 frames. ], tot_loss[loss=0.1417, simple_loss=0.1439, pruned_loss=0.05657, audio_tagging_loss=0.01313, over 3055330.33 frames. 
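The balancer entries parameterize per-channel activation constraints: min_positive/max_positive bound the fraction of positive values in a channel, min_abs/max_abs bound its mean absolute value, and prob (the ans=0.125 values above) is the probability with which the correction is applied on a given batch. A schematic of the constraint check only, assuming these semantics hold; the actual module enforces the bounds by adjusting gradients rather than reporting offenders:

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive=0.05, max_positive=0.95,
                        min_abs=0.2, max_abs=10.0) -> torch.Tensor:
    """Return indices of channels whose statistics drift out of bounds."""
    flat = x.reshape(-1, x.shape[-1])
    frac_pos = (flat > 0).float().mean(dim=0)
    mean_abs = flat.abs().mean(dim=0)
    bad = ((frac_pos < min_positive) | (frac_pos > max_positive)
           | (mean_abs < min_abs) | (mean_abs > max_abs))
    return bad.nonzero().flatten()

x = torch.randn(100, 16, 256)                 # (time, batch, channels)
print(balancer_violations(x))                 # empty for well-behaved input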
], batch size: 55, lr: 3.30e-02, grad_scale: 64.0 2023-11-18 05:25:31,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78400.0, ans=0.1 2023-11-18 05:25:37,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=78466.66666666667, ans=0.125 2023-11-18 05:25:40,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.55 vs. limit=22.5 2023-11-18 05:25:47,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=78533.33333333333, ans=0.0 2023-11-18 05:25:54,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2023-11-18 05:26:04,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=78600.0, ans=0.0 2023-11-18 05:26:11,068 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11800, loss[loss=0.1307, simple_loss=0.1284, pruned_loss=0.04541, audio_tagging_loss=0.0211, over 14570.00 frames. ], tot_loss[loss=0.1418, simple_loss=0.1436, pruned_loss=0.05674, audio_tagging_loss=0.01323, over 3047383.32 frames. ], batch size: 55, lr: 3.30e-02, grad_scale: 32.0 2023-11-18 05:26:19,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 1.101e+02 1.270e+02 1.502e+02 2.355e+02, threshold=2.541e+02, percent-clipped=0.0 2023-11-18 05:26:19,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=78666.66666666667, ans=0.1 2023-11-18 05:26:54,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=78866.66666666667, ans=0.125 2023-11-18 05:26:55,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 05:27:01,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=78933.33333333333, ans=0.125 2023-11-18 05:27:06,417 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11850, loss[loss=0.139, simple_loss=0.1399, pruned_loss=0.05546, audio_tagging_loss=0.01363, over 15535.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.1435, pruned_loss=0.05647, audio_tagging_loss=0.01331, over 3038870.09 frames. ], batch size: 58, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:27:15,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=79000.0, ans=0.125 2023-11-18 05:27:21,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.07 vs. limit=15.0 2023-11-18 05:27:25,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=79066.66666666667, ans=0.125 2023-11-18 05:27:30,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.65 vs. 
limit=15.0 2023-11-18 05:27:49,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=79200.0, ans=10.0 2023-11-18 05:28:02,172 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11900, loss[loss=0.1852, simple_loss=0.1836, pruned_loss=0.08016, audio_tagging_loss=0.01327, over 13762.00 frames. ], tot_loss[loss=0.1421, simple_loss=0.1441, pruned_loss=0.05671, audio_tagging_loss=0.01333, over 3045422.60 frames. ], batch size: 55, lr: 3.29e-02, grad_scale: 32.0 2023-11-18 05:28:11,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.709e+01 1.049e+02 1.249e+02 1.472e+02 4.248e+02, threshold=2.498e+02, percent-clipped=1.0 2023-11-18 05:28:12,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=79333.33333333333, ans=0.1 2023-11-18 05:28:15,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=79400.0, ans=0.0 2023-11-18 05:28:28,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.67 vs. limit=15.0 2023-11-18 05:28:48,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.60 vs. limit=22.5 2023-11-18 05:28:58,966 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 11950, loss[loss=0.1632, simple_loss=0.1628, pruned_loss=0.07158, audio_tagging_loss=0.01018, over 14717.00 frames. ], tot_loss[loss=0.1421, simple_loss=0.1441, pruned_loss=0.05659, audio_tagging_loss=0.01349, over 3043933.63 frames. ], batch size: 55, lr: 3.28e-02, grad_scale: 32.0 2023-11-18 05:29:03,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=79666.66666666667, ans=0.125 2023-11-18 05:29:05,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=79666.66666666667, ans=0.07 2023-11-18 05:29:31,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=79866.66666666667, ans=0.0 2023-11-18 05:29:42,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=15.0 2023-11-18 05:29:52,272 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-12000.pt 2023-11-18 05:29:54,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=80000.0, ans=0.09899494936611666 2023-11-18 05:29:55,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.86 vs. limit=15.0 2023-11-18 05:29:55,531 INFO [train_asr.py:1115] (0/4) Epoch 1, batch 12000, loss[loss=0.1309, simple_loss=0.1303, pruned_loss=0.04943, audio_tagging_loss=0.01625, over 15541.00 frames. ], tot_loss[loss=0.1414, simple_loss=0.1435, pruned_loss=0.05613, audio_tagging_loss=0.0135, over 3040390.29 frames. 
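The checkpoint.py line fires on the save_every_n=4000 cadence: the global batch index has reached 12000, so checkpoint-12000.pt lands in the experiment directory (with keep_last_k pruning older ones). A minimal sketch of batch-triggered saving; the function and argument names are illustrative:

import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: str, save_every_n: int = 4000) -> None:
    # Every save_every_n batches, write a rolling checkpoint such as
    # exp_dir/checkpoint-12000.pt, as in the log line above.
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )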
], batch size: 57, lr: 3.28e-02, grad_scale: 16.0 2023-11-18 05:29:55,534 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 05:30:31,626 INFO [train_asr.py:1147] (0/4) Epoch 1, validation: loss=0.09272, simple_loss=0.07249, pruned_loss=0.01766, audio_tagging_loss=0.03882, over 4681554.00 frames. 2023-11-18 05:30:31,627 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 05:30:42,376 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 1.066e+02 1.219e+02 1.451e+02 6.762e+02, threshold=2.438e+02, percent-clipped=1.0 2023-11-18 05:30:44,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=80066.66666666667, ans=0.125 2023-11-18 05:30:57,994 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-1.pt 2023-11-18 05:31:38,186 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 0, loss[loss=0.1267, simple_loss=0.1147, pruned_loss=0.03667, audio_tagging_loss=0.03268, over 15424.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1147, pruned_loss=0.03667, audio_tagging_loss=0.03268, over 15424.00 frames. ], batch size: 59, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:31:38,188 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 05:32:03,706 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([2.4176, 1.5246, 0.8473, 1.9339, 2.0672, 2.3416, 2.2549, 2.0650], device='cuda:0') 2023-11-18 05:32:10,424 INFO [train_asr.py:1147] (0/4) Epoch 2, validation: loss=0.09083, simple_loss=0.07252, pruned_loss=0.0178, audio_tagging_loss=0.03677, over 4681554.00 frames. 2023-11-18 05:32:10,424 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 05:32:12,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=80160.0, ans=0.2 2023-11-18 05:32:21,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=80226.66666666667, ans=0.125 2023-11-18 05:32:21,197 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:32:25,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=80226.66666666667, ans=0.0 2023-11-18 05:32:31,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=80293.33333333333, ans=0.015 2023-11-18 05:32:31,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=80293.33333333333, ans=0.0 2023-11-18 05:32:35,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=80293.33333333333, ans=0.07 2023-11-18 05:32:58,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=80426.66666666667, ans=0.0 2023-11-18 05:33:00,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=80426.66666666667, ans=0.0 2023-11-18 05:33:05,837 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 50, loss[loss=0.1393, simple_loss=0.1222, pruned_loss=0.04868, audio_tagging_loss=0.02958, over 14696.00 frames. 
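The zipformer.py:1873 lines emitted while computing validation loss are diagnostics: for sampled self-attention modules they print one entropy value per head (eight values for the 8-head layers, as in the tensor above), so heads that have collapsed to near-deterministic attention or stayed near-uniform stand out between checkpoints. A sketch of the quantity, assuming it is the Shannon entropy of each head's attention rows averaged over query positions:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """attn: (num_heads, tgt_len, src_len) with softmax-normalized rows.
    Returns one averaged entropy per head."""
    eps = 1e-20
    ent = -(attn * (attn + eps).log()).sum(dim=-1)   # (heads, tgt_len)
    return ent.mean(dim=-1)

attn = torch.softmax(torch.randn(8, 50, 50), dim=-1)
print(attn_weights_entropy(attn))   # 8 per-head entropies, cf. the log tensors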
], tot_loss[loss=0.1508, simple_loss=0.1401, pruned_loss=0.05462, audio_tagging_loss=0.02611, over 686290.05 frames. ], batch size: 56, lr: 3.21e-02, grad_scale: 32.0 2023-11-18 05:33:09,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=15.61 vs. limit=15.0 2023-11-18 05:33:25,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=80560.0, ans=0.125 2023-11-18 05:33:46,329 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.563e+01 1.150e+02 1.281e+02 1.485e+02 2.294e+02, threshold=2.563e+02, percent-clipped=0.0 2023-11-18 05:33:49,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=80760.0, ans=0.0 2023-11-18 05:33:57,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=80760.0, ans=0.125 2023-11-18 05:34:01,929 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 100, loss[loss=0.1582, simple_loss=0.1486, pruned_loss=0.06204, audio_tagging_loss=0.0218, over 14450.00 frames. ], tot_loss[loss=0.1494, simple_loss=0.1409, pruned_loss=0.05441, audio_tagging_loss=0.02452, over 1207804.67 frames. ], batch size: 55, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:34:05,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=80826.66666666667, ans=0.05 2023-11-18 05:34:20,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=80893.33333333333, ans=0.125 2023-11-18 05:34:30,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=80960.0, ans=0.125 2023-11-18 05:34:41,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. limit=15.0 2023-11-18 05:34:48,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=81093.33333333333, ans=0.0 2023-11-18 05:34:49,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.05 vs. limit=22.5 2023-11-18 05:34:57,986 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 150, loss[loss=0.1309, simple_loss=0.1411, pruned_loss=0.04198, audio_tagging_loss=0.01836, over 16234.00 frames. ], tot_loss[loss=0.1459, simple_loss=0.1401, pruned_loss=0.05392, audio_tagging_loss=0.02195, over 1614409.97 frames. ], batch size: 61, lr: 3.20e-02, grad_scale: 32.0 2023-11-18 05:35:16,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=81226.66666666667, ans=0.125 2023-11-18 05:35:23,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=81293.33333333333, ans=0.125 2023-11-18 05:35:38,724 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 1.103e+02 1.211e+02 1.385e+02 1.770e+02, threshold=2.422e+02, percent-clipped=0.0 2023-11-18 05:35:39,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.47 vs. 
limit=6.0 2023-11-18 05:35:40,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=81360.0, ans=0.125 2023-11-18 05:35:54,755 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 200, loss[loss=0.1736, simple_loss=0.164, pruned_loss=0.07761, audio_tagging_loss=0.01399, over 14308.00 frames. ], tot_loss[loss=0.147, simple_loss=0.1436, pruned_loss=0.05605, audio_tagging_loss=0.01914, over 1936595.68 frames. ], batch size: 53, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:36:16,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.04 vs. limit=15.0 2023-11-18 05:36:28,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=81693.33333333333, ans=0.1 2023-11-18 05:36:42,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=81760.0, ans=0.125 2023-11-18 05:36:51,495 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 250, loss[loss=0.1216, simple_loss=0.1295, pruned_loss=0.04202, audio_tagging_loss=0.01481, over 15229.00 frames. ], tot_loss[loss=0.1456, simple_loss=0.1442, pruned_loss=0.05629, audio_tagging_loss=0.01721, over 2181703.00 frames. ], batch size: 56, lr: 3.19e-02, grad_scale: 32.0 2023-11-18 05:37:02,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=81893.33333333333, ans=0.125 2023-11-18 05:37:03,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=81893.33333333333, ans=0.2 2023-11-18 05:37:06,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=81893.33333333333, ans=0.0 2023-11-18 05:37:09,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=81893.33333333333, ans=0.0 2023-11-18 05:37:10,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=81893.33333333333, ans=0.0 2023-11-18 05:37:31,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 1.100e+02 1.268e+02 1.445e+02 2.035e+02, threshold=2.536e+02, percent-clipped=0.0 2023-11-18 05:37:36,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=82093.33333333333, ans=0.0 2023-11-18 05:37:41,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=82093.33333333333, ans=0.0 2023-11-18 05:37:44,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=82093.33333333333, ans=15.0 2023-11-18 05:37:47,874 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 300, loss[loss=0.1091, simple_loss=0.1047, pruned_loss=0.03906, audio_tagging_loss=0.01766, over 14962.00 frames. ], tot_loss[loss=0.1453, simple_loss=0.1455, pruned_loss=0.05664, audio_tagging_loss=0.01595, over 2373107.28 frames. 
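The grad_scale field in the loss lines is the dynamic fp16 loss scale: it is halved whenever scaled gradients overflow (hence 64.0 -> 32.0 around batch 11800 and 32.0 -> 16.0 at batch 12000 above) and grows back after a stretch of overflow-free steps (it reads 32.0 again here in early epoch 2). A generic sketch of that mechanism with torch.cuda.amp, not the train_asr.py loop itself:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=2.0 ** 5)   # e.g. start at 32

def train_step(model, optimizer, batch, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(batch["inputs"]), batch["targets"])
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped, and the scale halved, on inf/nan grads
    scaler.update()          # otherwise the scale may grow back over time
    return loss.detach(), scaler.get_scale()   # the logged grad_scale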
], batch size: 57, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:37:56,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=82160.0, ans=0.07 2023-11-18 05:38:09,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=82293.33333333333, ans=0.0 2023-11-18 05:38:09,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.84 vs. limit=22.5 2023-11-18 05:38:23,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.90 vs. limit=22.5 2023-11-18 05:38:36,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=82426.66666666667, ans=0.0 2023-11-18 05:38:43,910 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 350, loss[loss=0.1149, simple_loss=0.1137, pruned_loss=0.04681, audio_tagging_loss=0.01128, over 14304.00 frames. ], tot_loss[loss=0.1441, simple_loss=0.1451, pruned_loss=0.05645, audio_tagging_loss=0.01508, over 2526693.30 frames. ], batch size: 56, lr: 3.18e-02, grad_scale: 32.0 2023-11-18 05:38:45,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=82493.33333333333, ans=0.1 2023-11-18 05:38:47,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=82493.33333333333, ans=0.125 2023-11-18 05:38:55,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=82560.0, ans=0.0 2023-11-18 05:39:01,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.67 vs. limit=15.0 2023-11-18 05:39:02,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:39:02,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=82560.0, ans=0.0 2023-11-18 05:39:17,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=82693.33333333333, ans=0.125 2023-11-18 05:39:24,852 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 1.093e+02 1.219e+02 1.382e+02 1.971e+02, threshold=2.439e+02, percent-clipped=0.0 2023-11-18 05:39:32,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=82760.0, ans=0.2 2023-11-18 05:39:40,366 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 400, loss[loss=0.1158, simple_loss=0.1034, pruned_loss=0.04641, audio_tagging_loss=0.01765, over 16149.00 frames. ], tot_loss[loss=0.1415, simple_loss=0.143, pruned_loss=0.05544, audio_tagging_loss=0.01455, over 2649724.68 frames. 
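In these entries loss[...] is the current batch while tot_loss[...] is a decayed running average: the accumulated statistics appear to be multiplied by (1 - 1/reset_interval) = 0.995 each batch before the new batch is added, which would explain both the steady "over ~3.04M frames" plateau through epoch 1 (about reset_interval x 15k frames per batch) and the rebuild from zero after the epoch boundary (686290.05 frames by batch 50, 1207804.67 by batch 100, 2649724.68 by batch 400 above). A sketch of that arithmetic under the reset_interval=200 assumption:

def decayed_frame_total(frames_per_batch=15000.0, reset_interval=200,
                        num_batches=51):
    """total <- total * (1 - 1/reset_interval) + frames_per_batch, per batch.
    Saturates near reset_interval * frames_per_batch."""
    total = 0.0
    for _ in range(num_batches):
        total = total * (1.0 - 1.0 / reset_interval) + frames_per_batch
    return total

print(decayed_frame_total(num_batches=51))     # ~6.8e5, cf. epoch 2, batch 50
print(decayed_frame_total(num_batches=5000))   # ~3.0e6, the epoch 1 plateau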
], batch size: 64, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:39:40,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=82826.66666666667, ans=0.125 2023-11-18 05:39:49,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=82826.66666666667, ans=0.2 2023-11-18 05:39:53,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=82893.33333333333, ans=0.125 2023-11-18 05:40:03,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=82960.0, ans=0.125 2023-11-18 05:40:20,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=83026.66666666667, ans=0.1 2023-11-18 05:40:29,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2023-11-18 05:40:32,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:40:34,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83093.33333333333, ans=0.1 2023-11-18 05:40:36,434 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 450, loss[loss=0.1284, simple_loss=0.1326, pruned_loss=0.05057, audio_tagging_loss=0.01148, over 15295.00 frames. ], tot_loss[loss=0.141, simple_loss=0.1429, pruned_loss=0.05544, audio_tagging_loss=0.01411, over 2733734.19 frames. ], batch size: 58, lr: 3.17e-02, grad_scale: 32.0 2023-11-18 05:40:43,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=83160.0, ans=0.125 2023-11-18 05:40:51,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=83226.66666666667, ans=0.125 2023-11-18 05:41:01,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=83293.33333333333, ans=0.125 2023-11-18 05:41:16,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.351e+01 1.050e+02 1.181e+02 1.351e+02 2.147e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 05:41:17,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=83360.0, ans=0.125 2023-11-18 05:41:23,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=83426.66666666667, ans=0.125 2023-11-18 05:41:32,208 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 500, loss[loss=0.1159, simple_loss=0.1205, pruned_loss=0.04228, audio_tagging_loss=0.01335, over 15300.00 frames. ], tot_loss[loss=0.1396, simple_loss=0.1417, pruned_loss=0.05499, audio_tagging_loss=0.01381, over 2804691.33 frames. 
], batch size: 60, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:41:32,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=83493.33333333333, ans=0.125 2023-11-18 05:41:33,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=83493.33333333333, ans=0.1 2023-11-18 05:41:36,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=83493.33333333333, ans=0.2 2023-11-18 05:41:59,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=83626.66666666667, ans=0.125 2023-11-18 05:42:04,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=83693.33333333333, ans=0.125 2023-11-18 05:42:06,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=83693.33333333333, ans=0.125 2023-11-18 05:42:10,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=83693.33333333333, ans=0.125 2023-11-18 05:42:13,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=83693.33333333333, ans=0.2 2023-11-18 05:42:21,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=83760.0, ans=0.125 2023-11-18 05:42:27,861 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 550, loss[loss=0.1615, simple_loss=0.1715, pruned_loss=0.0656, audio_tagging_loss=0.0101, over 14923.00 frames. ], tot_loss[loss=0.1412, simple_loss=0.1437, pruned_loss=0.05588, audio_tagging_loss=0.01352, over 2862859.57 frames. ], batch size: 56, lr: 3.16e-02, grad_scale: 32.0 2023-11-18 05:42:42,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=83893.33333333333, ans=0.5 2023-11-18 05:42:58,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=83960.0, ans=0.125 2023-11-18 05:42:59,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=83960.0, ans=0.125 2023-11-18 05:43:08,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.035e+01 1.150e+02 1.343e+02 1.676e+02 2.273e+02, threshold=2.687e+02, percent-clipped=0.0 2023-11-18 05:43:11,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=84026.66666666667, ans=0.2 2023-11-18 05:43:24,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=84160.0, ans=0.0 2023-11-18 05:43:25,064 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 600, loss[loss=0.1418, simple_loss=0.1323, pruned_loss=0.05968, audio_tagging_loss=0.01597, over 14776.00 frames. ], tot_loss[loss=0.1401, simple_loss=0.1424, pruned_loss=0.05549, audio_tagging_loss=0.01334, over 2898763.14 frames. 
], batch size: 55, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:43:25,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=84160.0, ans=0.035 2023-11-18 05:43:29,925 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0 2023-11-18 05:43:36,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=84226.66666666667, ans=0.05 2023-11-18 05:43:47,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=84293.33333333333, ans=0.125 2023-11-18 05:44:02,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=84360.0, ans=0.09899494936611666 2023-11-18 05:44:07,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.51 vs. limit=15.0 2023-11-18 05:44:21,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=84493.33333333333, ans=0.2 2023-11-18 05:44:21,964 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 650, loss[loss=0.1182, simple_loss=0.1203, pruned_loss=0.04294, audio_tagging_loss=0.01512, over 14535.00 frames. ], tot_loss[loss=0.1376, simple_loss=0.1402, pruned_loss=0.05415, audio_tagging_loss=0.01331, over 2935842.40 frames. ], batch size: 56, lr: 3.15e-02, grad_scale: 32.0 2023-11-18 05:44:22,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.84 vs. limit=12.0 2023-11-18 05:44:23,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.44 vs. limit=6.0 2023-11-18 05:45:01,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=84693.33333333333, ans=10.0 2023-11-18 05:45:02,892 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.174e+01 1.067e+02 1.188e+02 1.445e+02 2.872e+02, threshold=2.375e+02, percent-clipped=1.0 2023-11-18 05:45:06,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=84760.0, ans=0.125 2023-11-18 05:45:11,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=84760.0, ans=0.0 2023-11-18 05:45:17,840 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 700, loss[loss=0.1074, simple_loss=0.1027, pruned_loss=0.03864, audio_tagging_loss=0.01739, over 14713.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.1404, pruned_loss=0.05437, audio_tagging_loss=0.01337, over 2958457.03 frames. ], batch size: 58, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:45:24,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=84826.66666666667, ans=0.09899494936611666 2023-11-18 05:45:28,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=84893.33333333333, ans=0.0 2023-11-18 05:45:43,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.03 vs. 
limit=15.0 2023-11-18 05:45:46,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.50 vs. limit=22.5 2023-11-18 05:46:13,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.03 vs. limit=15.0 2023-11-18 05:46:15,220 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 750, loss[loss=0.1423, simple_loss=0.14, pruned_loss=0.05257, audio_tagging_loss=0.01975, over 15632.00 frames. ], tot_loss[loss=0.1383, simple_loss=0.1412, pruned_loss=0.05439, audio_tagging_loss=0.01336, over 2981545.29 frames. ], batch size: 58, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:46:37,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=85293.33333333333, ans=0.125 2023-11-18 05:46:43,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=85293.33333333333, ans=0.125 2023-11-18 05:46:56,129 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.449e+01 1.066e+02 1.181e+02 1.360e+02 2.052e+02, threshold=2.361e+02, percent-clipped=0.0 2023-11-18 05:46:56,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=85360.0, ans=0.1 2023-11-18 05:47:11,524 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 800, loss[loss=0.1373, simple_loss=0.1474, pruned_loss=0.05159, audio_tagging_loss=0.01204, over 15196.00 frames. ], tot_loss[loss=0.1381, simple_loss=0.1409, pruned_loss=0.05423, audio_tagging_loss=0.0134, over 2999583.68 frames. ], batch size: 56, lr: 3.14e-02, grad_scale: 32.0 2023-11-18 05:47:22,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=85560.0, ans=0.125 2023-11-18 05:47:54,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=85693.33333333333, ans=0.125 2023-11-18 05:47:58,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=85760.0, ans=0.125 2023-11-18 05:47:58,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.97 vs. limit=22.5 2023-11-18 05:48:07,703 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 850, loss[loss=0.1373, simple_loss=0.1299, pruned_loss=0.05769, audio_tagging_loss=0.01471, over 14499.00 frames. ], tot_loss[loss=0.1389, simple_loss=0.1416, pruned_loss=0.05462, audio_tagging_loss=0.01343, over 3009173.59 frames. ], batch size: 55, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:48:08,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.99 vs. 
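The scaling.py:1022 Whitening records compare a per-module whiteness metric against a scheduled limit; when the metric exceeds the limit, the Whiten module applies a corrective gradient that pushes the module's activations toward a channel covariance proportional to the identity. A sketch of one plausible such metric (an assumption, not the recipe's exact code), which equals 1.0 for perfectly white activations and grows as they decorrelate less:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Hypothetical whiteness measure: 1.0 when
    # the channel covariance is proportional to the identity, larger otherwise.
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    d = cov.shape[0]
    return (cov ** 2).sum() / (cov.diag().mean() ** 2 * d)

x = torch.randn(1000, 256)          # near-white activations
print(whitening_metric(x).item())   # close to 1.0, well under limits like 15.0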
limit=15.0 2023-11-18 05:48:08,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=85826.66666666667, ans=0.125 2023-11-18 05:48:11,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=85826.66666666667, ans=0.0 2023-11-18 05:48:17,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=85826.66666666667, ans=0.0 2023-11-18 05:48:31,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=85960.0, ans=0.0 2023-11-18 05:48:48,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.063e+01 1.087e+02 1.227e+02 1.407e+02 2.790e+02, threshold=2.454e+02, percent-clipped=1.0 2023-11-18 05:48:56,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=86093.33333333333, ans=0.0 2023-11-18 05:49:05,239 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 900, loss[loss=0.08894, simple_loss=0.08936, pruned_loss=0.02591, audio_tagging_loss=0.01835, over 14875.00 frames. ], tot_loss[loss=0.1398, simple_loss=0.1428, pruned_loss=0.05484, audio_tagging_loss=0.01358, over 3023555.56 frames. ], batch size: 59, lr: 3.13e-02, grad_scale: 32.0 2023-11-18 05:49:18,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.79 vs. limit=15.0 2023-11-18 05:49:27,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=86293.33333333333, ans=0.125 2023-11-18 05:49:54,505 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:49:57,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 05:50:01,367 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 950, loss[loss=0.11, simple_loss=0.1188, pruned_loss=0.03694, audio_tagging_loss=0.01365, over 14567.00 frames. ], tot_loss[loss=0.1387, simple_loss=0.1416, pruned_loss=0.05454, audio_tagging_loss=0.01338, over 3022248.41 frames. ], batch size: 55, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:04,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=86493.33333333333, ans=0.125 2023-11-18 05:50:05,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86493.33333333333, ans=0.125 2023-11-18 05:50:13,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=86560.0, ans=0.1 2023-11-18 05:50:18,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.46 vs. limit=15.0 2023-11-18 05:50:24,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=86626.66666666667, ans=0.125 2023-11-18 05:50:33,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.73 vs. 
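Each optim.py:476 record lists five order statistics (min / 25% / median / 75% / max) of recently observed gradient norms. In every record in this stretch of the log the reported threshold equals Clipping_scale times the median, and percent-clipped gives the share of recent steps whose norm exceeded it. Checking the record just above:

# min, q25, median, q75, max of recent grad norms, from the record above.
quartiles = [8.063e+01, 1.087e+02, 1.227e+02, 1.407e+02, 2.790e+02]
clipping_scale = 2.0
threshold = clipping_scale * quartiles[2]   # scale times the median
print(threshold)                            # 245.4 -> logged threshold=2.454e+02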
limit=15.0 2023-11-18 05:50:42,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.269e+01 1.077e+02 1.200e+02 1.388e+02 2.127e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 05:50:47,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=86760.0, ans=0.125 2023-11-18 05:50:48,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=86760.0, ans=15.0 2023-11-18 05:50:57,284 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1000, loss[loss=0.1712, simple_loss=0.1802, pruned_loss=0.07226, audio_tagging_loss=0.008877, over 15219.00 frames. ], tot_loss[loss=0.1378, simple_loss=0.1412, pruned_loss=0.05417, audio_tagging_loss=0.01305, over 3029835.55 frames. ], batch size: 56, lr: 3.12e-02, grad_scale: 32.0 2023-11-18 05:50:59,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=86826.66666666667, ans=0.0 2023-11-18 05:51:10,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=86893.33333333333, ans=0.1 2023-11-18 05:51:21,407 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:51:23,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=86960.0, ans=0.125 2023-11-18 05:51:26,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=86960.0, ans=0.125 2023-11-18 05:51:31,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=87026.66666666667, ans=0.125 2023-11-18 05:51:37,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=87026.66666666667, ans=0.125 2023-11-18 05:51:43,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:51:53,424 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1050, loss[loss=0.179, simple_loss=0.178, pruned_loss=0.08063, audio_tagging_loss=0.009404, over 16916.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1418, pruned_loss=0.05447, audio_tagging_loss=0.01289, over 3031910.96 frames. 
], batch size: 64, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:51:56,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=87160.0, ans=0.125 2023-11-18 05:52:01,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=87160.0, ans=0.0 2023-11-18 05:52:06,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=87226.66666666667, ans=10.0 2023-11-18 05:52:09,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=87226.66666666667, ans=0.0 2023-11-18 05:52:20,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=87293.33333333333, ans=0.0 2023-11-18 05:52:21,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0 2023-11-18 05:52:34,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.017e+01 1.050e+02 1.244e+02 1.396e+02 2.108e+02, threshold=2.488e+02, percent-clipped=0.0 2023-11-18 05:52:41,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=87426.66666666667, ans=0.125 2023-11-18 05:52:46,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=87426.66666666667, ans=0.125 2023-11-18 05:52:50,689 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1100, loss[loss=0.1546, simple_loss=0.1657, pruned_loss=0.06079, audio_tagging_loss=0.01097, over 15260.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1418, pruned_loss=0.05447, audio_tagging_loss=0.01283, over 3034588.00 frames. ], batch size: 55, lr: 3.11e-02, grad_scale: 32.0 2023-11-18 05:52:52,888 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:52:54,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=87493.33333333333, ans=0.125 2023-11-18 05:52:54,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.67 vs. limit=10.0 2023-11-18 05:53:22,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=87626.66666666667, ans=0.2 2023-11-18 05:53:29,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=87693.33333333333, ans=0.0 2023-11-18 05:53:45,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=87760.0, ans=0.0 2023-11-18 05:53:46,907 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1150, loss[loss=0.1342, simple_loss=0.1338, pruned_loss=0.05734, audio_tagging_loss=0.009918, over 14917.00 frames. 
], tot_loss[loss=0.1378, simple_loss=0.1414, pruned_loss=0.05438, audio_tagging_loss=0.01274, over 3039387.18 frames. ], batch size: 58, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:54:07,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.19 vs. limit=15.0 2023-11-18 05:54:14,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.20 vs. limit=15.0 2023-11-18 05:54:28,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.667e+01 1.025e+02 1.107e+02 1.275e+02 1.816e+02, threshold=2.214e+02, percent-clipped=0.0 2023-11-18 05:54:43,952 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1200, loss[loss=0.1572, simple_loss=0.1582, pruned_loss=0.06738, audio_tagging_loss=0.01066, over 14541.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1412, pruned_loss=0.05419, audio_tagging_loss=0.01274, over 3047109.30 frames. ], batch size: 54, lr: 3.10e-02, grad_scale: 32.0 2023-11-18 05:54:49,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=88160.0, ans=0.125 2023-11-18 05:54:51,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=88160.0, ans=0.125 2023-11-18 05:55:40,047 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1250, loss[loss=0.1364, simple_loss=0.1367, pruned_loss=0.05422, audio_tagging_loss=0.01388, over 15089.00 frames. ], tot_loss[loss=0.1372, simple_loss=0.1409, pruned_loss=0.05406, audio_tagging_loss=0.01267, over 3044231.21 frames. ], batch size: 56, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:55:40,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=88493.33333333333, ans=0.95 2023-11-18 05:55:46,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=88493.33333333333, ans=0.125 2023-11-18 05:55:47,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=88493.33333333333, ans=0.0 2023-11-18 05:55:48,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=88493.33333333333, ans=0.125 2023-11-18 05:56:20,666 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.599e+01 1.015e+02 1.167e+02 1.344e+02 2.286e+02, threshold=2.335e+02, percent-clipped=1.0 2023-11-18 05:56:25,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=88760.0, ans=0.125 2023-11-18 05:56:29,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=88760.0, ans=0.125 2023-11-18 05:56:36,637 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1300, loss[loss=0.1292, simple_loss=0.1336, pruned_loss=0.0515, audio_tagging_loss=0.0109, over 16846.00 frames. ], tot_loss[loss=0.137, simple_loss=0.1409, pruned_loss=0.05389, audio_tagging_loss=0.01267, over 3051118.13 frames. ], batch size: 63, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:56:40,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.14 vs. 
limit=10.0 2023-11-18 05:56:49,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=88893.33333333333, ans=0.07 2023-11-18 05:56:52,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.82 vs. limit=22.5 2023-11-18 05:57:13,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89026.66666666667, ans=0.125 2023-11-18 05:57:19,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.38 vs. limit=15.0 2023-11-18 05:57:24,364 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 05:57:33,130 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1350, loss[loss=0.1591, simple_loss=0.1528, pruned_loss=0.06862, audio_tagging_loss=0.01413, over 15447.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1408, pruned_loss=0.05377, audio_tagging_loss=0.01275, over 3049705.41 frames. ], batch size: 59, lr: 3.09e-02, grad_scale: 32.0 2023-11-18 05:57:43,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2023-11-18 05:57:44,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=89226.66666666667, ans=0.0 2023-11-18 05:57:55,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.01 vs. limit=10.0 2023-11-18 05:58:11,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=89360.0, ans=0.0 2023-11-18 05:58:14,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.570e+01 1.078e+02 1.206e+02 1.341e+02 1.953e+02, threshold=2.412e+02, percent-clipped=0.0 2023-11-18 05:58:14,316 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 05:58:29,864 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1400, loss[loss=0.1117, simple_loss=0.106, pruned_loss=0.0416, audio_tagging_loss=0.01706, over 14361.00 frames. ], tot_loss[loss=0.1368, simple_loss=0.1405, pruned_loss=0.05376, audio_tagging_loss=0.01281, over 3048770.12 frames. ], batch size: 57, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:58:46,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=89560.0, ans=10.0 2023-11-18 05:58:47,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. 
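These WARNINGs drop AudioSet placeholder cuts whose BPE transcript is longer than the subsampled feature sequence, since the transducer loss needs at least one encoder frame per token. The logged before/after frame counts match the usual icefall filter for the roughly 4x convolutional subsampling (treat the exact formula as an assumption for this recipe):

def frames_after_subsampling(t: int) -> int:
    # ~4x conv subsampling with edge effects: T' = ((T - 7) // 2 + 1) // 2
    return ((t - 7) // 2 + 1) // 2

t_sub = frames_after_subsampling(100)  # 23, as in the warning above
num_tokens = 24
print(t_sub, t_sub >= num_tokens)      # 23 False -> cut excluded from training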
limit=10.0 2023-11-18 05:58:56,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=89626.66666666667, ans=0.125 2023-11-18 05:58:59,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=89626.66666666667, ans=0.125 2023-11-18 05:59:18,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=89760.0, ans=0.1 2023-11-18 05:59:20,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. limit=8.0 2023-11-18 05:59:26,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=89826.66666666667, ans=0.07 2023-11-18 05:59:27,062 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1450, loss[loss=0.1198, simple_loss=0.1274, pruned_loss=0.0433, audio_tagging_loss=0.01284, over 14949.00 frames. ], tot_loss[loss=0.1375, simple_loss=0.1412, pruned_loss=0.05405, audio_tagging_loss=0.01286, over 3050046.97 frames. ], batch size: 57, lr: 3.08e-02, grad_scale: 32.0 2023-11-18 05:59:27,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=89826.66666666667, ans=0.05 2023-11-18 05:59:32,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=89826.66666666667, ans=0.2 2023-11-18 05:59:38,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=89893.33333333333, ans=0.5 2023-11-18 05:59:39,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=89893.33333333333, ans=0.125 2023-11-18 05:59:48,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=89960.0, ans=0.0 2023-11-18 05:59:54,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=89960.0, ans=0.125 2023-11-18 05:59:54,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=89960.0, ans=0.0 2023-11-18 05:59:56,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=89960.0, ans=0.125 2023-11-18 06:00:07,186 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.834e+01 1.064e+02 1.199e+02 1.327e+02 1.919e+02, threshold=2.398e+02, percent-clipped=0.0 2023-11-18 06:00:08,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=90026.66666666667, ans=0.05 2023-11-18 06:00:08,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=90026.66666666667, ans=0.125 2023-11-18 06:00:18,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=90093.33333333333, ans=0.125 2023-11-18 06:00:20,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90093.33333333333, ans=0.1 2023-11-18 06:00:21,803 INFO [scaling.py:213] (0/4) ScheduledFloat: 
name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=90160.0, ans=0.1 2023-11-18 06:00:23,251 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1500, loss[loss=0.1558, simple_loss=0.1575, pruned_loss=0.06055, audio_tagging_loss=0.01649, over 14987.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1418, pruned_loss=0.05435, audio_tagging_loss=0.01301, over 3050500.92 frames. ], batch size: 54, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:00:36,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=90226.66666666667, ans=0.2 2023-11-18 06:00:38,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=90226.66666666667, ans=0.125 2023-11-18 06:00:41,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-18 06:00:43,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=90226.66666666667, ans=0.025 2023-11-18 06:00:45,069 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.663e+00 2023-11-18 06:00:55,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.43 vs. limit=15.0 2023-11-18 06:00:57,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=90360.0, ans=0.125 2023-11-18 06:01:11,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=90426.66666666667, ans=0.0 2023-11-18 06:01:17,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=90426.66666666667, ans=0.125 2023-11-18 06:01:19,522 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1550, loss[loss=0.1129, simple_loss=0.111, pruned_loss=0.04056, audio_tagging_loss=0.0168, over 15200.00 frames. ], tot_loss[loss=0.1381, simple_loss=0.1414, pruned_loss=0.05412, audio_tagging_loss=0.01327, over 3047622.93 frames. ], batch size: 61, lr: 3.07e-02, grad_scale: 32.0 2023-11-18 06:01:23,443 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.20 vs. limit=12.0 2023-11-18 06:01:33,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=90560.0, ans=0.125 2023-11-18 06:02:00,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 1.048e+02 1.182e+02 1.332e+02 1.868e+02, threshold=2.363e+02, percent-clipped=0.0 2023-11-18 06:02:05,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=90760.0, ans=0.5 2023-11-18 06:02:14,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.65 vs. limit=15.0 2023-11-18 06:02:15,846 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1600, loss[loss=0.1645, simple_loss=0.175, pruned_loss=0.06636, audio_tagging_loss=0.01062, over 15677.00 frames. 
], tot_loss[loss=0.1394, simple_loss=0.1429, pruned_loss=0.05472, audio_tagging_loss=0.01322, over 3049182.50 frames. ], batch size: 58, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:02:20,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-18 06:02:28,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=90893.33333333333, ans=0.1 2023-11-18 06:03:00,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=91093.33333333333, ans=0.1 2023-11-18 06:03:11,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=91160.0, ans=0.125 2023-11-18 06:03:11,789 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.38 vs. limit=6.0 2023-11-18 06:03:12,236 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1650, loss[loss=0.1166, simple_loss=0.1089, pruned_loss=0.04548, audio_tagging_loss=0.01664, over 13978.00 frames. ], tot_loss[loss=0.1388, simple_loss=0.1422, pruned_loss=0.05436, audio_tagging_loss=0.01338, over 3050414.04 frames. ], batch size: 55, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:03:14,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=91160.0, ans=0.2 2023-11-18 06:03:15,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91160.0, ans=0.1 2023-11-18 06:03:16,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91160.0, ans=0.1 2023-11-18 06:03:16,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=91160.0, ans=0.0 2023-11-18 06:03:38,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=91293.33333333333, ans=0.0 2023-11-18 06:03:42,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=91293.33333333333, ans=0.1 2023-11-18 06:03:46,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=91360.0, ans=0.125 2023-11-18 06:03:53,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.238e+01 1.052e+02 1.201e+02 1.408e+02 1.916e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:04:09,193 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1700, loss[loss=0.1379, simple_loss=0.1413, pruned_loss=0.05277, audio_tagging_loss=0.01453, over 15414.00 frames. ], tot_loss[loss=0.1382, simple_loss=0.1414, pruned_loss=0.05406, audio_tagging_loss=0.01342, over 3041285.68 frames. 
], batch size: 58, lr: 3.06e-02, grad_scale: 32.0 2023-11-18 06:04:10,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=91493.33333333333, ans=0.125 2023-11-18 06:04:20,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=91560.0, ans=0.0 2023-11-18 06:04:24,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.14 vs. limit=12.0 2023-11-18 06:04:51,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=91693.33333333333, ans=0.125 2023-11-18 06:05:00,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=91760.0, ans=0.0 2023-11-18 06:05:02,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=91760.0, ans=0.0 2023-11-18 06:05:06,089 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1750, loss[loss=0.1661, simple_loss=0.1719, pruned_loss=0.07112, audio_tagging_loss=0.009059, over 15605.00 frames. ], tot_loss[loss=0.1383, simple_loss=0.1417, pruned_loss=0.05417, audio_tagging_loss=0.01323, over 3047326.20 frames. ], batch size: 57, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:05:10,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=91826.66666666667, ans=0.0 2023-11-18 06:05:16,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.87 vs. limit=22.5 2023-11-18 06:05:18,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=91893.33333333333, ans=0.125 2023-11-18 06:05:19,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=91893.33333333333, ans=0.125 2023-11-18 06:05:19,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=91893.33333333333, ans=0.2 2023-11-18 06:05:22,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.58 vs. 
limit=10.0 2023-11-18 06:05:26,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=91893.33333333333, ans=0.2 2023-11-18 06:05:37,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=91960.0, ans=0.125 2023-11-18 06:05:39,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=92026.66666666667, ans=0.125 2023-11-18 06:05:47,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.450e+01 1.111e+02 1.237e+02 1.383e+02 2.082e+02, threshold=2.473e+02, percent-clipped=0.0 2023-11-18 06:05:51,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92093.33333333333, ans=0.1 2023-11-18 06:05:52,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=92093.33333333333, ans=0.125 2023-11-18 06:05:54,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92093.33333333333, ans=0.1 2023-11-18 06:05:56,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=92093.33333333333, ans=0.125 2023-11-18 06:06:02,399 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1800, loss[loss=0.1496, simple_loss=0.1483, pruned_loss=0.06069, audio_tagging_loss=0.01474, over 15201.00 frames. ], tot_loss[loss=0.1379, simple_loss=0.1418, pruned_loss=0.05399, audio_tagging_loss=0.01301, over 3046906.47 frames. ], batch size: 59, lr: 3.05e-02, grad_scale: 32.0 2023-11-18 06:06:09,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=92160.0, ans=0.0 2023-11-18 06:06:17,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=92226.66666666667, ans=0.05 2023-11-18 06:06:29,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=92293.33333333333, ans=0.0 2023-11-18 06:06:30,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=92293.33333333333, ans=10.0 2023-11-18 06:06:32,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=92293.33333333333, ans=0.0 2023-11-18 06:06:45,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.03 vs. limit=15.0 2023-11-18 06:06:57,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=92426.66666666667, ans=0.125 2023-11-18 06:06:59,993 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1850, loss[loss=0.1074, simple_loss=0.1127, pruned_loss=0.03696, audio_tagging_loss=0.01413, over 13831.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1407, pruned_loss=0.05353, audio_tagging_loss=0.01295, over 3048382.13 frames. 
], batch size: 53, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:07:00,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=92493.33333333333, ans=0.125 2023-11-18 06:07:04,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=92493.33333333333, ans=0.1 2023-11-18 06:07:09,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.18 vs. limit=15.0 2023-11-18 06:07:09,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0 2023-11-18 06:07:13,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=92560.0, ans=0.1 2023-11-18 06:07:24,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=92626.66666666667, ans=0.125 2023-11-18 06:07:34,606 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.935e+00 2023-11-18 06:07:35,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=92693.33333333333, ans=0.0 2023-11-18 06:07:35,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0 2023-11-18 06:07:40,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.965e+01 1.045e+02 1.179e+02 1.336e+02 1.806e+02, threshold=2.358e+02, percent-clipped=0.0 2023-11-18 06:07:52,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=92760.0, ans=0.07 2023-11-18 06:07:53,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=92760.0, ans=0.125 2023-11-18 06:07:55,796 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1900, loss[loss=0.1606, simple_loss=0.1707, pruned_loss=0.06488, audio_tagging_loss=0.01036, over 15058.00 frames. ], tot_loss[loss=0.1378, simple_loss=0.142, pruned_loss=0.05398, audio_tagging_loss=0.0128, over 3053847.61 frames. ], batch size: 55, lr: 3.04e-02, grad_scale: 32.0 2023-11-18 06:08:05,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-11-18 06:08:51,655 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 1950, loss[loss=0.1298, simple_loss=0.1268, pruned_loss=0.05019, audio_tagging_loss=0.01619, over 15217.00 frames. ], tot_loss[loss=0.1366, simple_loss=0.1407, pruned_loss=0.05345, audio_tagging_loss=0.0128, over 3045301.07 frames. ], batch size: 57, lr: 3.03e-02, grad_scale: 32.0 2023-11-18 06:09:32,959 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.608e+01 1.040e+02 1.152e+02 1.328e+02 1.978e+02, threshold=2.303e+02, percent-clipped=0.0 2023-11-18 06:09:33,229 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=8.114e+00 2023-11-18 06:09:49,683 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2000, loss[loss=0.1136, simple_loss=0.121, pruned_loss=0.03856, audio_tagging_loss=0.01454, over 15064.00 frames. 
], tot_loss[loss=0.1358, simple_loss=0.1398, pruned_loss=0.05311, audio_tagging_loss=0.01276, over 3039998.44 frames. ], batch size: 55, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:09:49,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=93493.33333333333, ans=0.1 2023-11-18 06:09:54,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=93493.33333333333, ans=0.125 2023-11-18 06:10:01,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93560.0, ans=0.1 2023-11-18 06:10:02,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.97 vs. limit=15.0 2023-11-18 06:10:06,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=93560.0, ans=0.0 2023-11-18 06:10:06,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.75 vs. limit=6.0 2023-11-18 06:10:09,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=93560.0, ans=0.0 2023-11-18 06:10:30,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2023-11-18 06:10:45,929 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2050, loss[loss=0.1447, simple_loss=0.1403, pruned_loss=0.06094, audio_tagging_loss=0.01365, over 14242.00 frames. ], tot_loss[loss=0.137, simple_loss=0.1411, pruned_loss=0.05375, audio_tagging_loss=0.0127, over 3050267.37 frames. ], batch size: 56, lr: 3.03e-02, grad_scale: 64.0 2023-11-18 06:11:04,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=93893.33333333333, ans=0.1 2023-11-18 06:11:24,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=94026.66666666667, ans=0.125 2023-11-18 06:11:26,278 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.053e+02 1.201e+02 1.345e+02 1.920e+02, threshold=2.401e+02, percent-clipped=0.0 2023-11-18 06:11:36,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=94093.33333333333, ans=0.125 2023-11-18 06:11:41,162 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2100, loss[loss=0.1677, simple_loss=0.1668, pruned_loss=0.07352, audio_tagging_loss=0.01073, over 14686.00 frames. ], tot_loss[loss=0.1369, simple_loss=0.1413, pruned_loss=0.05364, audio_tagging_loss=0.01261, over 3048751.85 frames. ], batch size: 54, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:11:43,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=94160.0, ans=0.125 2023-11-18 06:11:48,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=94160.0, ans=0.125 2023-11-18 06:11:53,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.91 vs. 
limit=15.0 2023-11-18 06:11:54,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.87 vs. limit=15.0 2023-11-18 06:11:57,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=94226.66666666667, ans=0.0 2023-11-18 06:11:57,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.83 vs. limit=15.0 2023-11-18 06:12:01,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.45 vs. limit=15.0 2023-11-18 06:12:22,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=94360.0, ans=0.035 2023-11-18 06:12:36,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=94493.33333333333, ans=0.125 2023-11-18 06:12:37,007 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2150, loss[loss=0.1416, simple_loss=0.148, pruned_loss=0.05682, audio_tagging_loss=0.01081, over 16873.00 frames. ], tot_loss[loss=0.1365, simple_loss=0.1409, pruned_loss=0.05341, audio_tagging_loss=0.01258, over 3056199.92 frames. ], batch size: 63, lr: 3.02e-02, grad_scale: 64.0 2023-11-18 06:12:48,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=94560.0, ans=0.125 2023-11-18 06:12:50,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.95 vs. limit=15.0 2023-11-18 06:13:01,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-11-18 06:13:05,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=94626.66666666667, ans=0.1 2023-11-18 06:13:10,121 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 06:13:11,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=94693.33333333333, ans=0.0 2023-11-18 06:13:16,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=94693.33333333333, ans=0.125 2023-11-18 06:13:18,104 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.172e+01 1.043e+02 1.205e+02 1.372e+02 2.009e+02, threshold=2.410e+02, percent-clipped=0.0 2023-11-18 06:13:23,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=94760.0, ans=0.1 2023-11-18 06:13:25,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=94760.0, ans=0.0 2023-11-18 06:13:31,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=94760.0, ans=0.125 2023-11-18 06:13:32,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=94760.0, ans=0.0 2023-11-18 06:13:33,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=94826.66666666667, ans=0.125 2023-11-18 06:13:34,783 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2200, loss[loss=0.09679, simple_loss=0.1011, pruned_loss=0.03275, audio_tagging_loss=0.0135, over 15065.00 frames. ], tot_loss[loss=0.1352, simple_loss=0.1393, pruned_loss=0.05285, audio_tagging_loss=0.01271, over 3056816.16 frames. ], batch size: 56, lr: 3.01e-02, grad_scale: 64.0 2023-11-18 06:13:42,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=94826.66666666667, ans=0.2 2023-11-18 06:13:42,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=94826.66666666667, ans=10.0 2023-11-18 06:13:47,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=94893.33333333333, ans=0.125 2023-11-18 06:13:52,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.03 vs. limit=22.5 2023-11-18 06:13:53,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=94893.33333333333, ans=0.0 2023-11-18 06:13:58,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=94960.0, ans=0.125 2023-11-18 06:14:03,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=94960.0, ans=0.2 2023-11-18 06:14:30,425 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2250, loss[loss=0.1114, simple_loss=0.1137, pruned_loss=0.04128, audio_tagging_loss=0.01322, over 15426.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1393, pruned_loss=0.05259, audio_tagging_loss=0.0127, over 3054054.38 frames. 
], batch size: 57, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:14:45,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=95226.66666666667, ans=0.125 2023-11-18 06:14:59,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=95293.33333333333, ans=0.125 2023-11-18 06:15:00,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=95293.33333333333, ans=0.125 2023-11-18 06:15:01,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=95293.33333333333, ans=0.2 2023-11-18 06:15:07,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.969e+00 2023-11-18 06:15:12,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.736e+01 1.062e+02 1.230e+02 1.401e+02 2.481e+02, threshold=2.461e+02, percent-clipped=1.0 2023-11-18 06:15:18,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=95426.66666666667, ans=0.125 2023-11-18 06:15:26,914 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2300, loss[loss=0.1319, simple_loss=0.137, pruned_loss=0.04783, audio_tagging_loss=0.01557, over 15272.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1394, pruned_loss=0.05257, audio_tagging_loss=0.01276, over 3050240.97 frames. ], batch size: 58, lr: 3.01e-02, grad_scale: 32.0 2023-11-18 06:15:40,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=95560.0, ans=0.125 2023-11-18 06:15:53,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=95626.66666666667, ans=0.0 2023-11-18 06:16:03,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=95693.33333333333, ans=0.125 2023-11-18 06:16:05,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. limit=15.0 2023-11-18 06:16:06,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.04 vs. limit=15.0 2023-11-18 06:16:13,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=95760.0, ans=0.1 2023-11-18 06:16:15,590 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:16:24,204 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2350, loss[loss=0.17, simple_loss=0.1754, pruned_loss=0.07204, audio_tagging_loss=0.01029, over 15765.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1397, pruned_loss=0.05251, audio_tagging_loss=0.01287, over 3051433.77 frames. 
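Each scaling.py:213 record reports the current value ("ans=...") of a named hyper-parameter as a function of batch_count: dropout rates, skip rates, and balancer probabilities are all annealed on a per-batch schedule. A small re-implementation of such a piecewise-linear schedule; the breakpoints below are made up for illustration:

def scheduled_float(batch_count: float, points: list[tuple[float, float]]) -> float:
    # points is a sorted list of (batch_count, value) breakpoints; the value
    # is held constant outside the range and linearly interpolated inside it.
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# e.g. a skip rate annealed from 0.07 to 0.0 over the first 100k batches:
scheduled_float(94360.0, [(0.0, 0.07), (100000.0, 0.0)])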
], batch size: 59, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:16:26,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=95826.66666666667, ans=0.0 2023-11-18 06:16:26,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=95826.66666666667, ans=0.125 2023-11-18 06:16:27,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.10 vs. limit=15.0 2023-11-18 06:16:41,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=95893.33333333333, ans=0.125 2023-11-18 06:16:50,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=95960.0, ans=0.0 2023-11-18 06:17:03,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=96026.66666666667, ans=0.0 2023-11-18 06:17:06,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.033e+02 1.167e+02 1.342e+02 2.194e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 06:17:14,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=96093.33333333333, ans=0.1 2023-11-18 06:17:20,438 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2400, loss[loss=0.1153, simple_loss=0.1242, pruned_loss=0.0413, audio_tagging_loss=0.0119, over 14601.00 frames. ], tot_loss[loss=0.136, simple_loss=0.1407, pruned_loss=0.05275, audio_tagging_loss=0.01289, over 3049863.74 frames. ], batch size: 55, lr: 3.00e-02, grad_scale: 32.0 2023-11-18 06:17:27,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=96160.0, ans=0.0 2023-11-18 06:17:58,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=96360.0, ans=0.07 2023-11-18 06:17:58,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=96360.0, ans=0.125 2023-11-18 06:18:16,571 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2450, loss[loss=0.1413, simple_loss=0.1462, pruned_loss=0.05617, audio_tagging_loss=0.012, over 14779.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.1408, pruned_loss=0.05253, audio_tagging_loss=0.01291, over 3047398.33 frames. 
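The four bracketed terms in each loss record are not independent: across these lines, loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, e.g. 0.5 * 0.1407 + 0.05275 + 0.01289 = 0.1360 for batch 2400 above. A sketch of that combination; the 0.5 and 1.0 weights are inferred from the logged totals:

def combine_losses(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float,
                   simple_loss_scale: float = 0.5,
                   audio_tagging_loss_scale: float = 1.0) -> float:
    # Pruned-RNNT training pairs a cheap "simple" joiner loss (down-weighted)
    # with the pruned full loss; the audio-tagging KD loss is added on top.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

assert abs(combine_losses(0.1407, 0.05275, 0.01289) - 0.136) < 1e-4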
], batch size: 56, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:18:37,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=96560.0, ans=0.0 2023-11-18 06:18:46,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=96626.66666666667, ans=0.025 2023-11-18 06:18:46,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=96626.66666666667, ans=0.125 2023-11-18 06:18:58,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.568e+01 1.053e+02 1.171e+02 1.330e+02 1.894e+02, threshold=2.342e+02, percent-clipped=0.0 2023-11-18 06:18:59,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=96693.33333333333, ans=0.0 2023-11-18 06:19:02,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=96760.0, ans=0.95 2023-11-18 06:19:13,740 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2500, loss[loss=0.1249, simple_loss=0.1431, pruned_loss=0.04363, audio_tagging_loss=0.009764, over 15252.00 frames. ], tot_loss[loss=0.1364, simple_loss=0.1411, pruned_loss=0.05279, audio_tagging_loss=0.01302, over 3043282.56 frames. ], batch size: 57, lr: 2.99e-02, grad_scale: 32.0 2023-11-18 06:19:31,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=96893.33333333333, ans=0.1 2023-11-18 06:19:33,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=96893.33333333333, ans=0.0 2023-11-18 06:19:46,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=97026.66666666667, ans=0.1 2023-11-18 06:19:48,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97026.66666666667, ans=0.1 2023-11-18 06:19:50,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=97026.66666666667, ans=0.1 2023-11-18 06:19:52,944 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:20:10,031 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2550, loss[loss=0.1148, simple_loss=0.119, pruned_loss=0.0449, audio_tagging_loss=0.01041, over 15204.00 frames. ], tot_loss[loss=0.1352, simple_loss=0.1394, pruned_loss=0.05242, audio_tagging_loss=0.01305, over 3042297.90 frames. ], batch size: 57, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:20:13,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=97160.0, ans=0.125 2023-11-18 06:20:13,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. 
limit=6.0 2023-11-18 06:20:15,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=97160.0, ans=0.0 2023-11-18 06:20:30,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=97226.66666666667, ans=0.125 2023-11-18 06:20:34,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=97293.33333333333, ans=0.2 2023-11-18 06:20:47,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=97360.0, ans=0.125 2023-11-18 06:20:51,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.040e+01 1.036e+02 1.193e+02 1.343e+02 1.842e+02, threshold=2.386e+02, percent-clipped=0.0 2023-11-18 06:21:02,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=97426.66666666667, ans=0.09899494936611666 2023-11-18 06:21:06,248 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2600, loss[loss=0.1237, simple_loss=0.1224, pruned_loss=0.04812, audio_tagging_loss=0.01438, over 14477.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1388, pruned_loss=0.05218, audio_tagging_loss=0.01299, over 3040396.73 frames. ], batch size: 56, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:21:06,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=97493.33333333333, ans=0.09899494936611666 2023-11-18 06:21:08,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=97493.33333333333, ans=0.1 2023-11-18 06:21:10,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=97493.33333333333, ans=0.035 2023-11-18 06:21:20,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=97560.0, ans=10.0 2023-11-18 06:21:33,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=97626.66666666667, ans=0.2 2023-11-18 06:21:42,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.97 vs. limit=15.0 2023-11-18 06:21:42,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.74 vs. limit=15.0 2023-11-18 06:22:02,763 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2650, loss[loss=0.1218, simple_loss=0.1216, pruned_loss=0.04655, audio_tagging_loss=0.01442, over 15035.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1393, pruned_loss=0.05237, audio_tagging_loss=0.01289, over 3042375.43 frames. 
], batch size: 58, lr: 2.98e-02, grad_scale: 32.0 2023-11-18 06:22:12,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=97826.66666666667, ans=0.125 2023-11-18 06:22:32,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=97960.0, ans=0.125 2023-11-18 06:22:34,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=97960.0, ans=0.125 2023-11-18 06:22:44,862 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.930e+01 1.069e+02 1.231e+02 1.397e+02 2.138e+02, threshold=2.463e+02, percent-clipped=0.0 2023-11-18 06:22:45,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=98026.66666666667, ans=0.125 2023-11-18 06:22:55,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.06 vs. limit=22.5 2023-11-18 06:22:59,925 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2700, loss[loss=0.1271, simple_loss=0.1383, pruned_loss=0.04261, audio_tagging_loss=0.01536, over 15645.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1389, pruned_loss=0.05211, audio_tagging_loss=0.01283, over 3038629.00 frames. ], batch size: 60, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:23:10,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-18 06:23:35,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=98360.0, ans=0.0 2023-11-18 06:23:56,188 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2750, loss[loss=0.163, simple_loss=0.1787, pruned_loss=0.06481, audio_tagging_loss=0.008841, over 15974.00 frames. ], tot_loss[loss=0.1341, simple_loss=0.1385, pruned_loss=0.05212, audio_tagging_loss=0.01276, over 3042814.58 frames. 
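The recurring "Whitening: name=... metric=M vs. limit=L" lines come from a diagnostic that measures how far a module's activations are from having a white (isotropic) covariance; it reports only when the metric crosses its scheduled limit. One natural metric with the right behaviour, offered as an approximation of what scaling.py computes rather than its source, is the eigenvalue-spread ratio of the feature covariance: 1.0 for perfectly white features, growing as a few directions dominate.

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (..., num_channels) activations for one whitened group.
    x = x.reshape(-1, x.shape[-1])
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]              # channel covariance
    lam = torch.linalg.eigvalsh(cov)          # eigenvalues (symmetric matrix)
    # mean(lambda^2) / mean(lambda)^2 >= 1, with equality iff all eigenvalues
    # are equal, i.e. the features are already white.
    return (lam ** 2).mean() / lam.mean() ** 2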
], batch size: 60, lr: 2.97e-02, grad_scale: 32.0 2023-11-18 06:24:04,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=98493.33333333333, ans=0.125 2023-11-18 06:24:04,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=98493.33333333333, ans=0.0 2023-11-18 06:24:06,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:07,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=98560.0, ans=0.1 2023-11-18 06:24:10,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=98560.0, ans=0.125 2023-11-18 06:24:11,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=98560.0, ans=0.07 2023-11-18 06:24:18,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=98626.66666666667, ans=0.2 2023-11-18 06:24:20,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=98626.66666666667, ans=0.2 2023-11-18 06:24:36,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=98693.33333333333, ans=0.125 2023-11-18 06:24:37,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.45 vs. limit=12.0 2023-11-18 06:24:37,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 1.008e+02 1.194e+02 1.354e+02 1.877e+02, threshold=2.388e+02, percent-clipped=0.0 2023-11-18 06:24:42,372 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:24:47,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.23 vs. limit=22.5 2023-11-18 06:24:52,450 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2800, loss[loss=0.1106, simple_loss=0.1198, pruned_loss=0.04106, audio_tagging_loss=0.009669, over 14934.00 frames. ], tot_loss[loss=0.1327, simple_loss=0.137, pruned_loss=0.05148, audio_tagging_loss=0.01275, over 3038673.84 frames. ], batch size: 56, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:25:07,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=98893.33333333333, ans=0.0 2023-11-18 06:25:24,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.60 vs. 
limit=10.0 2023-11-18 06:25:27,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=99026.66666666667, ans=0.125 2023-11-18 06:25:48,865 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2850, loss[loss=0.1803, simple_loss=0.1849, pruned_loss=0.07734, audio_tagging_loss=0.01049, over 15970.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.1382, pruned_loss=0.05197, audio_tagging_loss=0.01271, over 3035635.91 frames. ], batch size: 60, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:23,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.85 vs. limit=22.5 2023-11-18 06:26:24,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=99360.0, ans=0.125 2023-11-18 06:26:30,676 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.061e+01 1.055e+02 1.276e+02 1.437e+02 2.072e+02, threshold=2.552e+02, percent-clipped=0.0 2023-11-18 06:26:31,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=99360.0, ans=0.0 2023-11-18 06:26:45,309 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2900, loss[loss=0.1239, simple_loss=0.1319, pruned_loss=0.04532, audio_tagging_loss=0.01261, over 15293.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.138, pruned_loss=0.05193, audio_tagging_loss=0.01275, over 3041585.80 frames. ], batch size: 59, lr: 2.96e-02, grad_scale: 32.0 2023-11-18 06:26:46,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.65 vs. limit=15.0 2023-11-18 06:27:07,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=99626.66666666667, ans=0.0 2023-11-18 06:27:09,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=99626.66666666667, ans=0.125 2023-11-18 06:27:10,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.65 vs. limit=15.0 2023-11-18 06:27:13,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.00 vs. limit=15.0 2023-11-18 06:27:20,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.28 vs. limit=22.5 2023-11-18 06:27:28,613 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:27:36,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=99760.0, ans=0.1 2023-11-18 06:27:41,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.31 vs. limit=6.0 2023-11-18 06:27:42,251 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 2950, loss[loss=0.1756, simple_loss=0.1914, pruned_loss=0.06669, audio_tagging_loss=0.01326, over 15270.00 frames. ], tot_loss[loss=0.1348, simple_loss=0.1394, pruned_loss=0.05231, audio_tagging_loss=0.01281, over 3044633.80 frames. 
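The scaling.py:1118 "WithLoss" records track an auxiliary penalty attached to a module's attention weights; loss-sum=0.000e+00 means the penalty is currently inactive there. A plausible reconstruction of the mechanism, an identity op that injects an extra loss term into the backward pass; this is assumed, not the scaling.py source:

import torch

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, aux_loss: torch.Tensor) -> torch.Tensor:
        # Forward is the identity on x; aux_loss is carried along so its sum
        # can be logged, and in backward it receives gradient 1, i.e. it
        # behaves as if added to the total training loss.
        ctx.aux_shape = aux_loss.shape
        return x

    @staticmethod
    def backward(ctx, grad_out: torch.Tensor):
        return grad_out, torch.ones(ctx.aux_shape,
                                    device=grad_out.device,
                                    dtype=grad_out.dtype)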
], batch size: 57, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:27:54,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=99893.33333333333, ans=0.125 2023-11-18 06:27:54,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=99893.33333333333, ans=0.0 2023-11-18 06:28:23,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.71 vs. limit=12.0 2023-11-18 06:28:25,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.080e+01 1.058e+02 1.250e+02 1.448e+02 1.793e+02, threshold=2.500e+02, percent-clipped=0.0 2023-11-18 06:28:32,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.50 vs. limit=10.0 2023-11-18 06:28:36,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=100093.33333333333, ans=0.015 2023-11-18 06:28:38,686 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3000, loss[loss=0.1825, simple_loss=0.1862, pruned_loss=0.07764, audio_tagging_loss=0.01177, over 15213.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1391, pruned_loss=0.05223, audio_tagging_loss=0.01293, over 3050996.70 frames. ], batch size: 55, lr: 2.95e-02, grad_scale: 16.0 2023-11-18 06:28:38,688 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 06:28:52,224 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.3.self_attn_weights, attn_weights_entropy = tensor([1.3175, 1.3737, 0.4846, 1.5201, 1.2484, 1.7440, 1.5564, 1.5993], device='cuda:0') 2023-11-18 06:29:07,378 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.9855, 5.8680, 5.9583, 5.6727], device='cuda:0') 2023-11-18 06:29:12,301 INFO [train_asr.py:1147] (0/4) Epoch 2, validation: loss=0.0901, simple_loss=0.07118, pruned_loss=0.01674, audio_tagging_loss=0.03777, over 4681554.00 frames. 2023-11-18 06:29:12,301 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 06:29:18,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=100160.0, ans=0.125 2023-11-18 06:29:39,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=100293.33333333333, ans=0.2 2023-11-18 06:29:45,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=100360.0, ans=0.125 2023-11-18 06:29:45,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=100360.0, ans=0.125 2023-11-18 06:30:08,708 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3050, loss[loss=0.1118, simple_loss=0.1151, pruned_loss=0.03941, audio_tagging_loss=0.01481, over 14447.00 frames. ], tot_loss[loss=0.1349, simple_loss=0.1392, pruned_loss=0.05241, audio_tagging_loss=0.01292, over 3047302.54 frames. ], batch size: 55, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:30:24,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.33 vs. 
limit=15.0 2023-11-18 06:30:38,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=18.61 vs. limit=15.0 2023-11-18 06:30:39,919 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:30:51,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.055e+02 1.164e+02 1.306e+02 1.882e+02, threshold=2.329e+02, percent-clipped=0.0 2023-11-18 06:30:51,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=100693.33333333333, ans=0.125 2023-11-18 06:31:03,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.90 vs. limit=10.0 2023-11-18 06:31:04,644 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3100, loss[loss=0.1162, simple_loss=0.1214, pruned_loss=0.04005, audio_tagging_loss=0.01544, over 15818.00 frames. ], tot_loss[loss=0.1352, simple_loss=0.1394, pruned_loss=0.0525, audio_tagging_loss=0.01306, over 3050353.13 frames. ], batch size: 59, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:31:16,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=100893.33333333333, ans=0.125 2023-11-18 06:31:34,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=100960.0, ans=0.125 2023-11-18 06:31:41,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=101026.66666666667, ans=0.0 2023-11-18 06:31:44,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-18 06:31:49,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101093.33333333333, ans=0.1 2023-11-18 06:31:51,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101093.33333333333, ans=0.1 2023-11-18 06:31:53,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=101093.33333333333, ans=0.1 2023-11-18 06:31:59,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.30 vs. limit=15.0 2023-11-18 06:32:00,116 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3150, loss[loss=0.1352, simple_loss=0.1316, pruned_loss=0.04531, audio_tagging_loss=0.02407, over 15846.00 frames. ], tot_loss[loss=0.1363, simple_loss=0.1409, pruned_loss=0.0528, audio_tagging_loss=0.01301, over 3050412.38 frames. 
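At batch 3000 above, the trainer pauses to compute a validation loss over the dev set and logs the peak GPU memory. An assumed shape of that loop: the model is taken to return per-batch loss sums plus a frame count, and the actual train_asr.py details may differ.

import torch

def compute_validation_loss(model, valid_dl) -> dict:
    model.eval()
    totals: dict = {}
    frames = 0
    with torch.no_grad():
        for batch in valid_dl:
            info = model(batch)               # assumed: dict of summed losses
            frames += info.pop("frames")
            for name, value in info.items():
                totals[name] = totals.get(name, 0.0) + float(value)
    model.train()
    # Per-frame averages, matching "validation: loss=..., over N frames".
    return {name: value / frames for name, value in totals.items()}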
], batch size: 61, lr: 2.94e-02, grad_scale: 16.0 2023-11-18 06:32:15,165 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.79 vs. limit=12.0 2023-11-18 06:32:17,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. limit=10.0 2023-11-18 06:32:37,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=101360.0, ans=0.07 2023-11-18 06:32:41,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=101360.0, ans=0.0 2023-11-18 06:32:43,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.183e+01 1.035e+02 1.177e+02 1.341e+02 1.863e+02, threshold=2.355e+02, percent-clipped=0.0 2023-11-18 06:32:55,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=101426.66666666667, ans=0.0 2023-11-18 06:32:58,090 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3200, loss[loss=0.1597, simple_loss=0.1703, pruned_loss=0.0615, audio_tagging_loss=0.0131, over 14487.00 frames. ], tot_loss[loss=0.1359, simple_loss=0.1405, pruned_loss=0.05246, audio_tagging_loss=0.0132, over 3048346.26 frames. ], batch size: 53, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:33:00,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=101493.33333333333, ans=0.1 2023-11-18 06:33:10,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=101560.0, ans=0.0 2023-11-18 06:33:16,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=101560.0, ans=0.125 2023-11-18 06:33:27,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=101626.66666666667, ans=0.0 2023-11-18 06:33:28,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=101626.66666666667, ans=0.0 2023-11-18 06:33:41,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=101693.33333333333, ans=0.0 2023-11-18 06:33:44,988 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.71 vs. limit=15.0 2023-11-18 06:33:47,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=101760.0, ans=0.07 2023-11-18 06:33:50,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=101760.0, ans=0.125 2023-11-18 06:33:54,459 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3250, loss[loss=0.09428, simple_loss=0.08683, pruned_loss=0.03056, audio_tagging_loss=0.0203, over 14578.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1393, pruned_loss=0.05169, audio_tagging_loss=0.01328, over 3049989.87 frames. 
], batch size: 57, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:33:54,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=101826.66666666667, ans=0.035 2023-11-18 06:33:59,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=101826.66666666667, ans=0.125 2023-11-18 06:34:03,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.94 vs. limit=15.0 2023-11-18 06:34:07,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=101893.33333333333, ans=0.0 2023-11-18 06:34:14,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=101893.33333333333, ans=0.0 2023-11-18 06:34:33,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=102026.66666666667, ans=0.0 2023-11-18 06:34:37,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.501e+01 1.067e+02 1.209e+02 1.454e+02 2.188e+02, threshold=2.419e+02, percent-clipped=0.0 2023-11-18 06:34:37,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=102026.66666666667, ans=0.125 2023-11-18 06:34:40,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=102093.33333333333, ans=10.0 2023-11-18 06:34:42,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-18 06:34:50,086 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3300, loss[loss=0.1253, simple_loss=0.1309, pruned_loss=0.05013, audio_tagging_loss=0.009661, over 14109.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1382, pruned_loss=0.05159, audio_tagging_loss=0.01331, over 3046253.82 frames. ], batch size: 53, lr: 2.93e-02, grad_scale: 32.0 2023-11-18 06:34:57,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=102160.0, ans=0.0 2023-11-18 06:35:15,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=102293.33333333333, ans=0.0 2023-11-18 06:35:24,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=102360.0, ans=0.0 2023-11-18 06:35:36,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=102426.66666666667, ans=0.2 2023-11-18 06:35:39,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=102426.66666666667, ans=0.2 2023-11-18 06:35:46,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=102493.33333333333, ans=0.125 2023-11-18 06:35:46,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=102493.33333333333, ans=0.1 2023-11-18 06:35:46,868 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3350, loss[loss=0.1638, simple_loss=0.1715, pruned_loss=0.06791, audio_tagging_loss=0.01019, over 15332.00 frames. 
], tot_loss[loss=0.1342, simple_loss=0.1389, pruned_loss=0.05166, audio_tagging_loss=0.01311, over 3047143.72 frames. ], batch size: 57, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:36:03,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-18 06:36:10,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=102626.66666666667, ans=10.0 2023-11-18 06:36:30,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.335e+01 1.052e+02 1.183e+02 1.313e+02 1.850e+02, threshold=2.366e+02, percent-clipped=0.0 2023-11-18 06:36:38,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=102760.0, ans=0.1 2023-11-18 06:36:39,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.24 vs. limit=15.0 2023-11-18 06:36:44,257 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3400, loss[loss=0.1125, simple_loss=0.1196, pruned_loss=0.03876, audio_tagging_loss=0.01396, over 16021.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1378, pruned_loss=0.05129, audio_tagging_loss=0.01312, over 3048051.51 frames. ], batch size: 60, lr: 2.92e-02, grad_scale: 32.0 2023-11-18 06:36:57,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.65 vs. limit=6.0 2023-11-18 06:37:11,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=102960.0, ans=0.1 2023-11-18 06:37:20,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.64 vs. limit=15.0 2023-11-18 06:37:32,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=103093.33333333333, ans=0.04949747468305833 2023-11-18 06:37:34,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-18 06:37:39,620 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3450, loss[loss=0.1453, simple_loss=0.1506, pruned_loss=0.05689, audio_tagging_loss=0.01309, over 14264.00 frames. ], tot_loss[loss=0.1338, simple_loss=0.1384, pruned_loss=0.05159, audio_tagging_loss=0.01304, over 3046444.19 frames. 
], batch size: 56, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:37:39,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=103160.0, ans=0.125 2023-11-18 06:37:52,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=103226.66666666667, ans=0.125 2023-11-18 06:37:58,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=103226.66666666667, ans=0.2 2023-11-18 06:38:03,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=103293.33333333333, ans=0.125 2023-11-18 06:38:08,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=103293.33333333333, ans=0.0 2023-11-18 06:38:09,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.79 vs. limit=10.0 2023-11-18 06:38:21,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.158e+01 1.088e+02 1.277e+02 1.401e+02 2.193e+02, threshold=2.554e+02, percent-clipped=0.0 2023-11-18 06:38:28,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=103426.66666666667, ans=0.0 2023-11-18 06:38:33,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=103426.66666666667, ans=0.0 2023-11-18 06:38:35,891 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3500, loss[loss=0.143, simple_loss=0.1548, pruned_loss=0.05561, audio_tagging_loss=0.009973, over 15528.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.139, pruned_loss=0.05188, audio_tagging_loss=0.0128, over 3044395.83 frames. ], batch size: 56, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:38:51,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=103560.0, ans=0.2 2023-11-18 06:38:58,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=103626.66666666667, ans=0.1 2023-11-18 06:39:03,762 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 06:39:05,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=103626.66666666667, ans=0.1 2023-11-18 06:39:13,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=103693.33333333333, ans=0.025 2023-11-18 06:39:22,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=103760.0, ans=0.125 2023-11-18 06:39:32,482 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3550, loss[loss=0.1384, simple_loss=0.1462, pruned_loss=0.0528, audio_tagging_loss=0.01254, over 14288.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1386, pruned_loss=0.0516, audio_tagging_loss=0.01281, over 3039080.88 frames. ], batch size: 54, lr: 2.91e-02, grad_scale: 32.0 2023-11-18 06:39:52,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.42 vs. limit=12.0 2023-11-18 06:39:57,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.93 vs. limit=15.0 2023-11-18 06:39:58,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=103960.0, ans=0.125 2023-11-18 06:40:00,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.36 vs. limit=22.5 2023-11-18 06:40:08,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=104026.66666666667, ans=0.05 2023-11-18 06:40:12,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=104026.66666666667, ans=0.125 2023-11-18 06:40:15,311 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.977e+01 9.988e+01 1.160e+02 1.284e+02 2.391e+02, threshold=2.320e+02, percent-clipped=0.0 2023-11-18 06:40:28,310 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3600, loss[loss=0.1246, simple_loss=0.1303, pruned_loss=0.04755, audio_tagging_loss=0.01191, over 15147.00 frames. ], tot_loss[loss=0.1334, simple_loss=0.1384, pruned_loss=0.05153, audio_tagging_loss=0.01272, over 3038385.28 frames. ], batch size: 59, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:40:49,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=104293.33333333333, ans=0.0 2023-11-18 06:40:59,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=104293.33333333333, ans=0.2 2023-11-18 06:41:06,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104360.0, ans=0.125 2023-11-18 06:41:24,472 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3650, loss[loss=0.1339, simple_loss=0.1367, pruned_loss=0.05379, audio_tagging_loss=0.0117, over 14347.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1382, pruned_loss=0.0514, audio_tagging_loss=0.01273, over 3042182.47 frames. 
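The logged batch sizes drift between roughly 53 and 66 utterances because batches are sized by total audio duration, not by utterance count. A sketch of duration-capped batching in the style of lhotse's SimpleCutSampler; the 1000-second cap is an assumption:

def duration_batches(cuts, max_duration: float = 1000.0):
    # Accumulate cuts until adding one more would exceed the duration budget,
    # so each batch has a near-constant acoustic cost but a variable size.
    batch, total = [], 0.0
    for cut in cuts:
        if batch and total + cut.duration > max_duration:
            yield batch
            batch, total = [], 0.0
        batch.append(cut)
        total += cut.duration
    if batch:
        yield batch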
], batch size: 55, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:41:47,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=104626.66666666667, ans=0.2 2023-11-18 06:41:51,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.11 vs. limit=6.0 2023-11-18 06:42:03,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=104693.33333333333, ans=0.125 2023-11-18 06:42:04,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=104693.33333333333, ans=0.0 2023-11-18 06:42:07,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.564e+01 1.051e+02 1.152e+02 1.363e+02 2.191e+02, threshold=2.304e+02, percent-clipped=0.0 2023-11-18 06:42:14,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=104760.0, ans=0.2 2023-11-18 06:42:17,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=104760.0, ans=0.0 2023-11-18 06:42:20,867 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3700, loss[loss=0.1439, simple_loss=0.1522, pruned_loss=0.05555, audio_tagging_loss=0.01228, over 14862.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1378, pruned_loss=0.05151, audio_tagging_loss=0.01285, over 3042400.99 frames. ], batch size: 54, lr: 2.90e-02, grad_scale: 32.0 2023-11-18 06:42:25,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=104826.66666666667, ans=0.0 2023-11-18 06:42:25,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=104826.66666666667, ans=0.125 2023-11-18 06:42:33,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.07 vs. limit=15.0 2023-11-18 06:42:35,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=104893.33333333333, ans=0.125 2023-11-18 06:42:35,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=104893.33333333333, ans=0.0 2023-11-18 06:42:44,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=104960.0, ans=0.2 2023-11-18 06:42:47,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=104960.0, ans=0.0 2023-11-18 06:43:03,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.53 vs. limit=15.0 2023-11-18 06:43:17,351 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3750, loss[loss=0.1033, simple_loss=0.1111, pruned_loss=0.03758, audio_tagging_loss=0.01022, over 15117.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1397, pruned_loss=0.05234, audio_tagging_loss=0.01278, over 3042652.64 frames. ], batch size: 57, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:43:39,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. 
limit=15.0 2023-11-18 06:43:40,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.23 vs. limit=6.0 2023-11-18 06:43:41,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=105293.33333333333, ans=0.125 2023-11-18 06:43:49,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=105293.33333333333, ans=0.0 2023-11-18 06:43:56,415 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:44:00,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.685e+01 1.091e+02 1.248e+02 1.454e+02 2.022e+02, threshold=2.495e+02, percent-clipped=0.0 2023-11-18 06:44:14,165 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3800, loss[loss=0.1329, simple_loss=0.1419, pruned_loss=0.04998, audio_tagging_loss=0.01203, over 16559.00 frames. ], tot_loss[loss=0.1344, simple_loss=0.1393, pruned_loss=0.05189, audio_tagging_loss=0.01283, over 3043602.80 frames. ], batch size: 63, lr: 2.89e-02, grad_scale: 32.0 2023-11-18 06:44:19,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=105493.33333333333, ans=0.0 2023-11-18 06:44:31,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=105560.0, ans=0.125 2023-11-18 06:44:56,342 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:45:04,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=105760.0, ans=0.125 2023-11-18 06:45:10,955 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3850, loss[loss=0.1086, simple_loss=0.1058, pruned_loss=0.03989, audio_tagging_loss=0.01578, over 17055.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1392, pruned_loss=0.05169, audio_tagging_loss=0.01294, over 3044061.73 frames. ], batch size: 66, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:45:13,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=105826.66666666667, ans=0.125 2023-11-18 06:45:21,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.48 vs. limit=15.0 2023-11-18 06:45:27,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.05 vs. limit=12.0 2023-11-18 06:45:34,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.25 vs. 
limit=15.0 2023-11-18 06:45:35,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=105960.0, ans=0.125 2023-11-18 06:45:53,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 1.026e+02 1.153e+02 1.299e+02 2.070e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 06:45:59,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.71 vs. limit=15.0 2023-11-18 06:46:05,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=106160.0, ans=0.125 2023-11-18 06:46:06,610 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3900, loss[loss=0.1293, simple_loss=0.1214, pruned_loss=0.05267, audio_tagging_loss=0.01591, over 16853.00 frames. ], tot_loss[loss=0.1342, simple_loss=0.1388, pruned_loss=0.05182, audio_tagging_loss=0.01299, over 3043558.56 frames. ], batch size: 62, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:46:11,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=106160.0, ans=0.2 2023-11-18 06:46:13,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-18 06:46:14,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=106160.0, ans=0.0 2023-11-18 06:46:32,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=106293.33333333333, ans=0.125 2023-11-18 06:46:50,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=106426.66666666667, ans=0.125 2023-11-18 06:47:03,423 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 3950, loss[loss=0.1394, simple_loss=0.1483, pruned_loss=0.05203, audio_tagging_loss=0.01323, over 15948.00 frames. ], tot_loss[loss=0.1358, simple_loss=0.1405, pruned_loss=0.05251, audio_tagging_loss=0.01303, over 3040187.28 frames. ], batch size: 59, lr: 2.88e-02, grad_scale: 32.0 2023-11-18 06:47:06,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.46 vs. limit=22.5 2023-11-18 06:47:06,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=21.66 vs. limit=22.5 2023-11-18 06:47:23,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-11-18 06:47:28,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.07 vs. 
limit=15.0 2023-11-18 06:47:31,737 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-16000.pt 2023-11-18 06:47:48,083 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:47:48,874 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 1.025e+02 1.127e+02 1.249e+02 1.832e+02, threshold=2.254e+02, percent-clipped=0.0 2023-11-18 06:48:02,287 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4000, loss[loss=0.09989, simple_loss=0.09575, pruned_loss=0.03589, audio_tagging_loss=0.01612, over 13998.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.1403, pruned_loss=0.05228, audio_tagging_loss=0.0131, over 3043428.99 frames. ], batch size: 55, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:06,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2023-11-18 06:48:32,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=106960.0, ans=0.1 2023-11-18 06:48:33,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=106960.0, ans=0.125 2023-11-18 06:48:52,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.91 vs. limit=6.0 2023-11-18 06:48:58,675 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4050, loss[loss=0.1274, simple_loss=0.1296, pruned_loss=0.0468, audio_tagging_loss=0.01583, over 15480.00 frames. ], tot_loss[loss=0.1347, simple_loss=0.1394, pruned_loss=0.05177, audio_tagging_loss=0.01316, over 3042767.86 frames. ], batch size: 61, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:48:58,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=107160.0, ans=0.2 2023-11-18 06:48:59,821 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:49:01,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107160.0, ans=0.125 2023-11-18 06:49:01,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.53 vs. limit=15.0 2023-11-18 06:49:09,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=107226.66666666667, ans=0.0 2023-11-18 06:49:10,440 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.98 vs. 
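The checkpoint.py:75 line above saves checkpoint-16000.pt into the experiment directory, keyed by the global training-batch index. A sketch of that periodic save; the 4000-batch interval and the saved fields are assumptions:

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int = 4000) -> None:
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        exp_dir / f"checkpoint-{batch_idx_train}.pt",
    )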
limit=15.0 2023-11-18 06:49:12,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=107226.66666666667, ans=0.05 2023-11-18 06:49:16,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=107226.66666666667, ans=0.0 2023-11-18 06:49:31,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=107293.33333333333, ans=0.0 2023-11-18 06:49:37,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-11-18 06:49:39,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=107360.0, ans=0.125 2023-11-18 06:49:41,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 1.077e+02 1.199e+02 1.331e+02 2.496e+02, threshold=2.397e+02, percent-clipped=1.0 2023-11-18 06:49:55,693 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4100, loss[loss=0.1628, simple_loss=0.1735, pruned_loss=0.06631, audio_tagging_loss=0.009777, over 15354.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1404, pruned_loss=0.05211, audio_tagging_loss=0.01303, over 3039994.84 frames. ], batch size: 56, lr: 2.87e-02, grad_scale: 32.0 2023-11-18 06:49:59,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=107493.33333333333, ans=0.2 2023-11-18 06:50:00,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=107493.33333333333, ans=0.0 2023-11-18 06:50:03,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.59 vs. limit=15.0 2023-11-18 06:50:22,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=12.0 2023-11-18 06:50:45,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=107760.0, ans=0.1 2023-11-18 06:50:51,906 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4150, loss[loss=0.149, simple_loss=0.1542, pruned_loss=0.05946, audio_tagging_loss=0.01247, over 15191.00 frames. ], tot_loss[loss=0.1356, simple_loss=0.1409, pruned_loss=0.05233, audio_tagging_loss=0.01285, over 3045316.21 frames. ], batch size: 56, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:50:57,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=107826.66666666667, ans=0.125 2023-11-18 06:50:59,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=107826.66666666667, ans=0.125 2023-11-18 06:51:00,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=107826.66666666667, ans=0.125 2023-11-18 06:51:31,974 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:51:32,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=108026.66666666667, ans=0.125 2023-11-18 06:51:33,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=108026.66666666667, ans=0.0 2023-11-18 06:51:34,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=108026.66666666667, ans=0.125 2023-11-18 06:51:35,153 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.314e+01 1.032e+02 1.149e+02 1.336e+02 2.371e+02, threshold=2.297e+02, percent-clipped=0.0 2023-11-18 06:51:36,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=108093.33333333333, ans=0.125 2023-11-18 06:51:48,695 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4200, loss[loss=0.1538, simple_loss=0.1625, pruned_loss=0.06188, audio_tagging_loss=0.01068, over 15896.00 frames. ], tot_loss[loss=0.1355, simple_loss=0.1411, pruned_loss=0.05227, audio_tagging_loss=0.01265, over 3044916.84 frames. ], batch size: 56, lr: 2.86e-02, grad_scale: 32.0 2023-11-18 06:51:58,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.22 vs. limit=15.0 2023-11-18 06:52:04,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=108226.66666666667, ans=0.0 2023-11-18 06:52:08,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.27 vs. limit=15.0 2023-11-18 06:52:15,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=108293.33333333333, ans=0.2 2023-11-18 06:52:26,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=108360.0, ans=0.05 2023-11-18 06:52:38,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=108426.66666666667, ans=0.05 2023-11-18 06:52:44,350 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4250, loss[loss=0.118, simple_loss=0.1218, pruned_loss=0.04289, audio_tagging_loss=0.01419, over 14830.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1398, pruned_loss=0.05147, audio_tagging_loss=0.01262, over 3046528.94 frames. ], batch size: 55, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:52:46,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. 
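[annotation] The WARNING entries above and below drop AudioSet cuts whose dummy transcript (24 tokens) is longer than the encoder output (23 frames after 4x subsampling): a transducer loss needs at least as many encoder frames as output tokens, so such cuts would yield an infinite loss. A sketch of the filter; the subsampling formula is one choice consistent with the 100 -> 23 frame counts printed in the log:

    def frames_after_subsampling(num_frames: int) -> int:
        # Convolutional frontend with overall 4x subsampling;
        # yields 23 output frames for 100 input frames, as in the log.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Exclude cuts where the token sequence cannot fit into the
        # subsampled frame sequence (T < U breaks the transducer lattice).
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # matches the excluded cuts in the log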
limit=15.0 2023-11-18 06:52:52,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=108493.33333333333, ans=0.0 2023-11-18 06:53:02,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=108560.0, ans=0.1 2023-11-18 06:53:06,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=108626.66666666667, ans=0.125 2023-11-18 06:53:26,873 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 9.019e+01 1.076e+02 1.189e+02 1.301e+02 1.957e+02, threshold=2.378e+02, percent-clipped=0.0 2023-11-18 06:53:41,497 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4300, loss[loss=0.1041, simple_loss=0.1076, pruned_loss=0.03731, audio_tagging_loss=0.01297, over 15567.00 frames. ], tot_loss[loss=0.1345, simple_loss=0.1405, pruned_loss=0.05169, audio_tagging_loss=0.01254, over 3052059.65 frames. ], batch size: 59, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:53:47,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.21 vs. limit=22.5 2023-11-18 06:54:07,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=108960.0, ans=0.0 2023-11-18 06:54:07,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=108960.0, ans=0.125 2023-11-18 06:54:08,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=108960.0, ans=0.0 2023-11-18 06:54:11,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=108960.0, ans=0.125 2023-11-18 06:54:14,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.38 vs. limit=12.0 2023-11-18 06:54:16,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=109026.66666666667, ans=0.1 2023-11-18 06:54:37,456 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4350, loss[loss=0.166, simple_loss=0.1937, pruned_loss=0.05921, audio_tagging_loss=0.009949, over 16498.00 frames. ], tot_loss[loss=0.1354, simple_loss=0.1413, pruned_loss=0.05204, audio_tagging_loss=0.01265, over 3048300.96 frames. ], batch size: 57, lr: 2.85e-02, grad_scale: 32.0 2023-11-18 06:54:47,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=109226.66666666667, ans=0.125 2023-11-18 06:54:48,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=109226.66666666667, ans=0.125 2023-11-18 06:54:55,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=109226.66666666667, ans=0.1 2023-11-18 06:54:55,833 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. 
limit=6.0 2023-11-18 06:55:20,485 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.140e+01 1.013e+02 1.155e+02 1.315e+02 2.105e+02, threshold=2.309e+02, percent-clipped=0.0 2023-11-18 06:55:33,448 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4400, loss[loss=0.1156, simple_loss=0.129, pruned_loss=0.03992, audio_tagging_loss=0.01112, over 15283.00 frames. ], tot_loss[loss=0.1353, simple_loss=0.1413, pruned_loss=0.05204, audio_tagging_loss=0.0126, over 3044902.92 frames. ], batch size: 58, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:55:44,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=109560.0, ans=0.0 2023-11-18 06:55:47,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=109560.0, ans=0.125 2023-11-18 06:55:52,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=109560.0, ans=0.1 2023-11-18 06:55:58,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=109626.66666666667, ans=0.125 2023-11-18 06:56:17,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=109760.0, ans=0.2 2023-11-18 06:56:24,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.47 vs. limit=15.0 2023-11-18 06:56:29,201 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4450, loss[loss=0.1395, simple_loss=0.1492, pruned_loss=0.05354, audio_tagging_loss=0.01141, over 15959.00 frames. ], tot_loss[loss=0.135, simple_loss=0.1409, pruned_loss=0.05205, audio_tagging_loss=0.01252, over 3042665.83 frames. ], batch size: 59, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:56:35,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=109826.66666666667, ans=10.0 2023-11-18 06:56:41,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=109893.33333333333, ans=0.0 2023-11-18 06:56:47,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=109893.33333333333, ans=0.0 2023-11-18 06:56:51,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=109960.0, ans=0.125 2023-11-18 06:57:03,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=110026.66666666667, ans=0.125 2023-11-18 06:57:06,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=110026.66666666667, ans=0.035 2023-11-18 06:57:11,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 1.110e+02 1.220e+02 1.457e+02 2.260e+02, threshold=2.440e+02, percent-clipped=0.0 2023-11-18 06:57:26,476 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4500, loss[loss=0.1057, simple_loss=0.1116, pruned_loss=0.03934, audio_tagging_loss=0.01058, over 14521.00 frames. ], tot_loss[loss=0.1346, simple_loss=0.1402, pruned_loss=0.05197, audio_tagging_loss=0.01255, over 3046338.79 frames. 
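[annotation] The lr field in the batch entries decays smoothly (2.87e-02 at batch 4000, 2.84e-02 here), consistent with the Eden schedule used in Zipformer recipes, which damps the learning rate as a -0.25 power of both the batch count and the epoch count. A hedged sketch, assuming base_lr=0.045, lr_batches=7500, and lr_epochs=3.5 from the run configuration printed at startup:

    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Learning rate decays with both the global batch count and the epoch.
        step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * step_factor * epoch_factor

    # Around global batch 16000 in epoch 2 this gives roughly the
    # 2.8e-02 values seen in the entries above.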
], batch size: 56, lr: 2.84e-02, grad_scale: 32.0 2023-11-18 06:57:40,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=110226.66666666667, ans=0.1 2023-11-18 06:57:54,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=110293.33333333333, ans=0.2 2023-11-18 06:58:06,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=110360.0, ans=0.0 2023-11-18 06:58:15,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=110426.66666666667, ans=0.0 2023-11-18 06:58:22,425 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4550, loss[loss=0.1339, simple_loss=0.1469, pruned_loss=0.05202, audio_tagging_loss=0.008435, over 15576.00 frames. ], tot_loss[loss=0.1337, simple_loss=0.1391, pruned_loss=0.0516, audio_tagging_loss=0.01258, over 3047574.39 frames. ], batch size: 58, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:58:25,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=110493.33333333333, ans=0.2 2023-11-18 06:58:35,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=110560.0, ans=0.0 2023-11-18 06:58:41,959 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 06:59:03,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=110693.33333333333, ans=0.125 2023-11-18 06:59:05,467 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.958e+01 1.009e+02 1.148e+02 1.280e+02 1.877e+02, threshold=2.295e+02, percent-clipped=0.0 2023-11-18 06:59:05,508 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 06:59:15,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=110760.0, ans=0.125 2023-11-18 06:59:16,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0 2023-11-18 06:59:18,758 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4600, loss[loss=0.1165, simple_loss=0.1057, pruned_loss=0.0444, audio_tagging_loss=0.01921, over 15906.00 frames. ], tot_loss[loss=0.1333, simple_loss=0.1387, pruned_loss=0.05132, audio_tagging_loss=0.01263, over 3045865.29 frames. 
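[annotation] The dense scaling.py:213 entries trace module hyper-parameters (dropout probabilities, skip rates, balancer probabilities) that are not constants but functions of batch_count: each is annealed from an initial value toward a final one as training progresses. A minimal piecewise-linear stand-in for such a schedule; the class name and example knot values here are illustrative, not the literal ScheduledFloat implementation:

    from bisect import bisect_right

    class PiecewiseLinear:
        """Value interpolated linearly between (batch_count, value) knots."""
        def __init__(self, *knots):
            self.knots = sorted(knots)  # e.g. (0.0, 0.3), (20000.0, 0.1)

        def __call__(self, batch_count: float) -> float:
            xs = [x for x, _ in self.knots]
            i = bisect_right(xs, batch_count)
            if i == 0:
                return self.knots[0][1]
            if i == len(self.knots):
                return self.knots[-1][1]
            (x0, y0), (x1, y1) = self.knots[i - 1], self.knots[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    dropout_p = PiecewiseLinear((0.0, 0.3), (20000.0, 0.1))
    print(dropout_p(107226.66))  # -> 0.1, like the ans=0.1 dropout entries above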
], batch size: 62, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 06:59:32,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=110893.33333333333, ans=0.0 2023-11-18 06:59:41,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=110960.0, ans=0.09899494936611666 2023-11-18 06:59:49,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.90 vs. limit=22.5 2023-11-18 07:00:15,447 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4650, loss[loss=0.1228, simple_loss=0.1344, pruned_loss=0.04287, audio_tagging_loss=0.01268, over 16535.00 frames. ], tot_loss[loss=0.134, simple_loss=0.1393, pruned_loss=0.05172, audio_tagging_loss=0.01267, over 3045522.81 frames. ], batch size: 61, lr: 2.83e-02, grad_scale: 32.0 2023-11-18 07:00:27,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=111226.66666666667, ans=0.025 2023-11-18 07:00:32,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=111226.66666666667, ans=0.125 2023-11-18 07:00:38,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=111293.33333333333, ans=0.0 2023-11-18 07:00:43,078 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-18 07:00:50,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=111360.0, ans=0.0 2023-11-18 07:00:57,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=111360.0, ans=0.0 2023-11-18 07:00:58,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.738e+01 1.062e+02 1.161e+02 1.332e+02 2.161e+02, threshold=2.322e+02, percent-clipped=0.0 2023-11-18 07:00:58,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=111360.0, ans=0.0 2023-11-18 07:01:08,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2023-11-18 07:01:09,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=111426.66666666667, ans=0.0 2023-11-18 07:01:10,907 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4700, loss[loss=0.1567, simple_loss=0.1605, pruned_loss=0.0609, audio_tagging_loss=0.01554, over 16140.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1378, pruned_loss=0.05087, audio_tagging_loss=0.01287, over 3047634.39 frames. ], batch size: 59, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:01:14,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. 
limit=15.0 2023-11-18 07:01:16,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=111493.33333333333, ans=0.0 2023-11-18 07:01:24,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=111560.0, ans=0.05 2023-11-18 07:01:30,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=111560.0, ans=0.125 2023-11-18 07:01:43,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=111626.66666666667, ans=0.125 2023-11-18 07:01:50,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=111693.33333333333, ans=0.0 2023-11-18 07:01:59,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=111760.0, ans=0.035 2023-11-18 07:02:00,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=111760.0, ans=0.125 2023-11-18 07:02:06,894 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4750, loss[loss=0.09863, simple_loss=0.1002, pruned_loss=0.03207, audio_tagging_loss=0.01649, over 14914.00 frames. ], tot_loss[loss=0.1326, simple_loss=0.1378, pruned_loss=0.05072, audio_tagging_loss=0.01297, over 3041861.93 frames. ], batch size: 56, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:02:31,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=111960.0, ans=0.0 2023-11-18 07:02:36,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.26 vs. limit=15.0 2023-11-18 07:02:49,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.988e+01 1.063e+02 1.146e+02 1.305e+02 1.876e+02, threshold=2.292e+02, percent-clipped=0.0 2023-11-18 07:03:03,776 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4800, loss[loss=0.1222, simple_loss=0.1232, pruned_loss=0.04436, audio_tagging_loss=0.01626, over 15158.00 frames. ], tot_loss[loss=0.1332, simple_loss=0.1386, pruned_loss=0.05084, audio_tagging_loss=0.01304, over 3049016.96 frames. ], batch size: 58, lr: 2.82e-02, grad_scale: 32.0 2023-11-18 07:03:29,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=112293.33333333333, ans=0.125 2023-11-18 07:03:49,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=112426.66666666667, ans=0.125 2023-11-18 07:03:52,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=112426.66666666667, ans=0.125 2023-11-18 07:03:59,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. limit=22.5 2023-11-18 07:03:59,955 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4850, loss[loss=0.1045, simple_loss=0.1085, pruned_loss=0.03785, audio_tagging_loss=0.01238, over 16036.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.1379, pruned_loss=0.05041, audio_tagging_loss=0.01316, over 3050020.92 frames. 
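[annotation] The scaling.py:1022 "Whitening" entries compare a per-module anisotropy metric against a limit (the limits themselves, 6.0 / 12.0 / 15.0 / 22.5, vary per module and are also scheduled); when the metric exceeds the limit, the whitening machinery pushes the channel covariance back toward a multiple of the identity. One common proxy for such a metric, offered as a hedged reconstruction rather than the literal icefall formula, is the mean squared eigenvalue of the channel covariance divided by the squared mean eigenvalue, which is 1.0 for perfectly white features and grows with anisotropy:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels) activations for one module.
        metrics = []
        for g in x.chunk(num_groups, dim=-1):
            g = g - g.mean(dim=0, keepdim=True)
            cov = (g.t() @ g) / g.shape[0]      # channel covariance
            eigs = torch.linalg.eigvalsh(cov)   # real eigenvalues, ascending
            metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
        return sum(metrics) / len(metrics)      # 1.0 == perfectly white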
], batch size: 63, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:04:02,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=112493.33333333333, ans=0.125 2023-11-18 07:04:05,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=112493.33333333333, ans=0.2 2023-11-18 07:04:14,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=112560.0, ans=0.125 2023-11-18 07:04:16,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=112560.0, ans=0.125 2023-11-18 07:04:18,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112560.0, ans=0.1 2023-11-18 07:04:29,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=112626.66666666667, ans=0.2 2023-11-18 07:04:38,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=112693.33333333333, ans=0.0 2023-11-18 07:04:39,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=112693.33333333333, ans=0.125 2023-11-18 07:04:42,629 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.822e+01 1.047e+02 1.164e+02 1.344e+02 1.766e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:04:56,049 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4900, loss[loss=0.08933, simple_loss=0.08078, pruned_loss=0.03389, audio_tagging_loss=0.01504, over 14619.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.138, pruned_loss=0.05051, audio_tagging_loss=0.01296, over 3049839.69 frames. ], batch size: 58, lr: 2.81e-02, grad_scale: 32.0 2023-11-18 07:05:05,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=112826.66666666667, ans=0.2 2023-11-18 07:05:06,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=112893.33333333333, ans=0.0 2023-11-18 07:05:12,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs. limit=15.0 2023-11-18 07:05:23,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=112960.0, ans=0.125 2023-11-18 07:05:25,377 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:05:25,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=112960.0, ans=0.1 2023-11-18 07:05:51,914 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 4950, loss[loss=0.1754, simple_loss=0.1983, pruned_loss=0.06575, audio_tagging_loss=0.01044, over 15621.00 frames. ], tot_loss[loss=0.1322, simple_loss=0.1385, pruned_loss=0.05034, audio_tagging_loss=0.01268, over 3044820.25 frames. 
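[annotation] The per-batch loss fields are not independent: in every entry the total satisfies loss ~= 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the simple (linear-joiner) transducer loss is down-weighted by half while the pruned RNN-T loss and the audio-tagging distillation loss enter at full weight. A quick check against the batch 4900 entry above:

    simple_loss, pruned_loss, audio_tagging_loss = 0.138, 0.05051, 0.01296
    total = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(total, 4))  # 0.1325, matching tot_loss[loss=0.1325, ...] above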
], batch size: 54, lr: 2.81e-02, grad_scale: 64.0 2023-11-18 07:05:55,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=113160.0, ans=0.0 2023-11-18 07:05:56,941 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:06:13,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=113293.33333333333, ans=0.125 2023-11-18 07:06:26,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=113360.0, ans=0.0 2023-11-18 07:06:33,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=113360.0, ans=0.125 2023-11-18 07:06:34,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.114e+01 1.049e+02 1.181e+02 1.339e+02 2.582e+02, threshold=2.362e+02, percent-clipped=1.0 2023-11-18 07:06:48,204 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5000, loss[loss=0.1142, simple_loss=0.1177, pruned_loss=0.04228, audio_tagging_loss=0.01304, over 14009.00 frames. ], tot_loss[loss=0.1313, simple_loss=0.1373, pruned_loss=0.05014, audio_tagging_loss=0.0125, over 3047364.86 frames. ], batch size: 55, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:06:51,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2023-11-18 07:06:55,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=113493.33333333333, ans=0.125 2023-11-18 07:07:16,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=113626.66666666667, ans=0.125 2023-11-18 07:07:35,341 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.69 vs. limit=12.0 2023-11-18 07:07:44,827 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5050, loss[loss=0.1726, simple_loss=0.1761, pruned_loss=0.07259, audio_tagging_loss=0.01195, over 15275.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1358, pruned_loss=0.0495, audio_tagging_loss=0.01242, over 3052698.12 frames. ], batch size: 58, lr: 2.80e-02, grad_scale: 64.0 2023-11-18 07:08:14,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.20 vs. limit=6.0 2023-11-18 07:08:22,463 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.64 vs. limit=15.0 2023-11-18 07:08:27,547 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.042e+01 1.016e+02 1.164e+02 1.342e+02 1.810e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 07:08:38,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=114093.33333333333, ans=0.125 2023-11-18 07:08:41,044 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5100, loss[loss=0.1612, simple_loss=0.1632, pruned_loss=0.06738, audio_tagging_loss=0.01227, over 15597.00 frames. ], tot_loss[loss=0.1298, simple_loss=0.1355, pruned_loss=0.04943, audio_tagging_loss=0.01258, over 3045191.91 frames. 
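[annotation] The grad_scale field doubles from 32.0 to 64.0 at batch 4950 and drops to 16.0 just below (batch 5150) before recovering: the signature of a dynamic fp16 loss scaler that grows the scale after a run of overflow-free steps and halves it whenever an overflow is detected. A minimal sketch using PyTorch's stock scaler (the growth_interval value is an assumption; the real loop also folds in the gradient clipping annotated earlier):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

    def train_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():      # fp16 forward pass
            loss = criterion(model(batch["inputs"]), batch["targets"])
        scaler.scale(loss).backward()        # scaled backward pass
        scaler.step(optimizer)               # skips the update on overflow
        scaler.update()                      # grows or halves grad_scale
        return loss.detach()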
], batch size: 57, lr: 2.79e-02, grad_scale: 64.0 2023-11-18 07:08:42,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=114160.0, ans=0.125 2023-11-18 07:09:02,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=114293.33333333333, ans=0.0 2023-11-18 07:09:33,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=114426.66666666667, ans=0.0 2023-11-18 07:09:34,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=114426.66666666667, ans=0.125 2023-11-18 07:09:36,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.78 vs. limit=12.0 2023-11-18 07:09:37,376 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5150, loss[loss=0.1246, simple_loss=0.1254, pruned_loss=0.04291, audio_tagging_loss=0.01903, over 15426.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1356, pruned_loss=0.04953, audio_tagging_loss=0.01268, over 3044000.71 frames. ], batch size: 58, lr: 2.79e-02, grad_scale: 16.0 2023-11-18 07:09:45,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=114493.33333333333, ans=0.125 2023-11-18 07:09:47,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=114560.0, ans=0.125 2023-11-18 07:09:51,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=114560.0, ans=0.05 2023-11-18 07:10:22,537 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.209e+01 1.018e+02 1.157e+02 1.320e+02 3.492e+02, threshold=2.315e+02, percent-clipped=2.0 2023-11-18 07:10:33,997 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5200, loss[loss=0.127, simple_loss=0.1355, pruned_loss=0.04694, audio_tagging_loss=0.01234, over 14993.00 frames. ], tot_loss[loss=0.1303, simple_loss=0.136, pruned_loss=0.04959, audio_tagging_loss=0.01272, over 3044803.38 frames. ], batch size: 56, lr: 2.79e-02, grad_scale: 32.0 2023-11-18 07:10:35,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=114826.66666666667, ans=0.125 2023-11-18 07:10:37,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.48 vs. limit=22.5 2023-11-18 07:10:38,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=114826.66666666667, ans=0.2 2023-11-18 07:10:39,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-18 07:10:49,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.19 vs. 
limit=10.0 2023-11-18 07:10:56,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=114960.0, ans=0.0 2023-11-18 07:10:58,798 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:11:00,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=114960.0, ans=0.0 2023-11-18 07:11:07,439 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:11:13,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2023-11-18 07:11:18,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=115093.33333333333, ans=0.1 2023-11-18 07:11:29,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=115160.0, ans=0.04949747468305833 2023-11-18 07:11:30,237 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5250, loss[loss=0.1354, simple_loss=0.1497, pruned_loss=0.04824, audio_tagging_loss=0.01225, over 15778.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.1354, pruned_loss=0.04929, audio_tagging_loss=0.01268, over 3038318.80 frames. ], batch size: 56, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:11:36,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=115160.0, ans=0.0 2023-11-18 07:11:40,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=7.46 vs. limit=15.0 2023-11-18 07:11:41,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=115226.66666666667, ans=0.1 2023-11-18 07:11:44,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=115226.66666666667, ans=0.125 2023-11-18 07:11:56,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=115293.33333333333, ans=0.125 2023-11-18 07:12:08,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.84 vs. limit=22.5 2023-11-18 07:12:12,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=115360.0, ans=10.0 2023-11-18 07:12:15,368 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.068e+01 1.019e+02 1.120e+02 1.285e+02 1.660e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 07:12:24,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=115426.66666666667, ans=0.0 2023-11-18 07:12:26,633 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5300, loss[loss=0.0915, simple_loss=0.09562, pruned_loss=0.02862, audio_tagging_loss=0.01507, over 15499.00 frames. ], tot_loss[loss=0.1313, simple_loss=0.137, pruned_loss=0.04998, audio_tagging_loss=0.01285, over 3041557.65 frames. 
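[annotation] The scaling.py:1118 "WithLoss" entries report an auxiliary penalty attached to the self-attention weights of individual layers; a loss-sum of 0.000e+00 simply means the penalty is currently inactive for that module. A hedged sketch of the general pattern (a module that passes activations through unchanged while stashing an extra loss term for the trainer to read off); the class name and the particular penalty are illustrative:

    import torch
    import torch.nn as nn

    class WithAuxLoss(nn.Module):
        """Identity in the forward pass; records an auxiliary loss as a side effect."""
        def __init__(self, name: str, weight: float = 0.0):
            super().__init__()
            self.name, self.weight = name, weight
            self.loss_sum = torch.tensor(0.0)

        def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
            # Hypothetical penalty: discourage overly peaked attention rows.
            penalty = attn_weights.pow(2).sum(dim=-1).mean()
            self.loss_sum = self.weight * penalty  # stays 0.0 while weight == 0.0
            return attn_weights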
], batch size: 60, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:12:33,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_ff3.min_abs, batch_count=115493.33333333333, ans=0.2 2023-11-18 07:12:33,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-18 07:12:46,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=115560.0, ans=0.0 2023-11-18 07:12:58,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=115626.66666666667, ans=0.05 2023-11-18 07:13:04,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=115693.33333333333, ans=0.125 2023-11-18 07:13:07,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=115693.33333333333, ans=0.1 2023-11-18 07:13:22,351 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5350, loss[loss=0.1586, simple_loss=0.1728, pruned_loss=0.06539, audio_tagging_loss=0.006862, over 15606.00 frames. ], tot_loss[loss=0.1307, simple_loss=0.1363, pruned_loss=0.0498, audio_tagging_loss=0.01278, over 3036422.73 frames. ], batch size: 56, lr: 2.78e-02, grad_scale: 32.0 2023-11-18 07:13:45,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=115960.0, ans=0.0 2023-11-18 07:13:47,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=115960.0, ans=0.125 2023-11-18 07:14:04,009 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:14:07,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=116093.33333333333, ans=0.125 2023-11-18 07:14:08,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.439e+01 1.052e+02 1.204e+02 1.359e+02 2.060e+02, threshold=2.407e+02, percent-clipped=0.0 2023-11-18 07:14:15,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=116093.33333333333, ans=0.1 2023-11-18 07:14:20,306 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5400, loss[loss=0.1051, simple_loss=0.1068, pruned_loss=0.03633, audio_tagging_loss=0.01531, over 14986.00 frames. ], tot_loss[loss=0.131, simple_loss=0.1364, pruned_loss=0.04992, audio_tagging_loss=0.01282, over 3043722.30 frames. ], batch size: 57, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:14:31,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=18.26 vs. limit=15.0 2023-11-18 07:14:55,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=116360.0, ans=0.0 2023-11-18 07:15:16,454 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5450, loss[loss=0.1422, simple_loss=0.139, pruned_loss=0.05727, audio_tagging_loss=0.01542, over 16121.00 frames. ], tot_loss[loss=0.1308, simple_loss=0.1364, pruned_loss=0.04981, audio_tagging_loss=0.01285, over 3048126.67 frames. 
], batch size: 62, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:15:20,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2023-11-18 07:15:40,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=116626.66666666667, ans=0.125 2023-11-18 07:15:42,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=116626.66666666667, ans=0.125 2023-11-18 07:15:52,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=116693.33333333333, ans=0.125 2023-11-18 07:16:01,589 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.036e+01 1.012e+02 1.167e+02 1.341e+02 1.969e+02, threshold=2.335e+02, percent-clipped=0.0 2023-11-18 07:16:03,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=116760.0, ans=0.2 2023-11-18 07:16:12,379 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5500, loss[loss=0.1251, simple_loss=0.1312, pruned_loss=0.0473, audio_tagging_loss=0.01222, over 15407.00 frames. ], tot_loss[loss=0.1311, simple_loss=0.1369, pruned_loss=0.04995, audio_tagging_loss=0.01272, over 3055487.27 frames. ], batch size: 58, lr: 2.77e-02, grad_scale: 32.0 2023-11-18 07:16:19,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=116826.66666666667, ans=0.0 2023-11-18 07:16:23,739 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:16:50,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=117026.66666666667, ans=0.2 2023-11-18 07:17:04,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117093.33333333333, ans=0.1 2023-11-18 07:17:08,006 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5550, loss[loss=0.1307, simple_loss=0.1433, pruned_loss=0.04773, audio_tagging_loss=0.01138, over 14390.00 frames. ], tot_loss[loss=0.1304, simple_loss=0.1358, pruned_loss=0.04959, audio_tagging_loss=0.01293, over 3050376.35 frames. ], batch size: 56, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:17:12,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2023-11-18 07:17:16,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.24 vs. limit=10.0 2023-11-18 07:17:20,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=117226.66666666667, ans=0.125 2023-11-18 07:17:26,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=117226.66666666667, ans=0.2 2023-11-18 07:17:30,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=117293.33333333333, ans=0.0 2023-11-18 07:17:34,912 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.59 vs. 
limit=22.5 2023-11-18 07:17:43,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=117360.0, ans=0.09899494936611666 2023-11-18 07:17:48,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=117360.0, ans=0.125 2023-11-18 07:17:48,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=117360.0, ans=0.125 2023-11-18 07:17:53,709 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.091e+01 1.048e+02 1.161e+02 1.291e+02 1.886e+02, threshold=2.323e+02, percent-clipped=0.0 2023-11-18 07:18:05,485 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5600, loss[loss=0.1435, simple_loss=0.1553, pruned_loss=0.05374, audio_tagging_loss=0.01212, over 14347.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1375, pruned_loss=0.04985, audio_tagging_loss=0.01305, over 3054753.89 frames. ], batch size: 58, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:18:07,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=117493.33333333333, ans=0.125 2023-11-18 07:18:18,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=117560.0, ans=0.125 2023-11-18 07:18:29,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.83 vs. limit=15.0 2023-11-18 07:18:30,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=117626.66666666667, ans=0.1 2023-11-18 07:18:30,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=117626.66666666667, ans=0.0 2023-11-18 07:18:36,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=117626.66666666667, ans=0.0 2023-11-18 07:18:42,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=117693.33333333333, ans=0.035 2023-11-18 07:18:45,164 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:18:57,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=117760.0, ans=0.1 2023-11-18 07:19:01,203 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5650, loss[loss=0.1747, simple_loss=0.1792, pruned_loss=0.07201, audio_tagging_loss=0.01308, over 15751.00 frames. ], tot_loss[loss=0.1324, simple_loss=0.1383, pruned_loss=0.05021, audio_tagging_loss=0.01305, over 3052898.66 frames. 
], batch size: 58, lr: 2.76e-02, grad_scale: 32.0 2023-11-18 07:19:01,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=117826.66666666667, ans=0.125 2023-11-18 07:19:09,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=117826.66666666667, ans=0.125 2023-11-18 07:19:10,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=117826.66666666667, ans=0.1 2023-11-18 07:19:11,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=117893.33333333333, ans=0.0 2023-11-18 07:19:28,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=117960.0, ans=0.0 2023-11-18 07:19:28,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=117960.0, ans=0.125 2023-11-18 07:19:39,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=118026.66666666667, ans=0.125 2023-11-18 07:19:42,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.37 vs. limit=15.0 2023-11-18 07:19:44,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118026.66666666667, ans=0.1 2023-11-18 07:19:46,193 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.436e+01 1.030e+02 1.132e+02 1.306e+02 2.340e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 07:19:50,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=118093.33333333333, ans=0.125 2023-11-18 07:19:53,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=118093.33333333333, ans=0.0 2023-11-18 07:19:57,373 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5700, loss[loss=0.09499, simple_loss=0.08382, pruned_loss=0.03608, audio_tagging_loss=0.017, over 14907.00 frames. ], tot_loss[loss=0.1322, simple_loss=0.1378, pruned_loss=0.05039, audio_tagging_loss=0.01292, over 3055316.22 frames. ], batch size: 59, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:20:01,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. 
limit=15.0 2023-11-18 07:20:21,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=118293.33333333333, ans=0.125 2023-11-18 07:20:28,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=118293.33333333333, ans=0.0 2023-11-18 07:20:28,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=118293.33333333333, ans=0.2 2023-11-18 07:20:36,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=118360.0, ans=0.0 2023-11-18 07:20:46,672 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:20:47,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=118426.66666666667, ans=0.0 2023-11-18 07:20:48,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=118426.66666666667, ans=0.1 2023-11-18 07:20:53,853 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5750, loss[loss=0.1006, simple_loss=0.09841, pruned_loss=0.04071, audio_tagging_loss=0.01072, over 16360.00 frames. ], tot_loss[loss=0.1316, simple_loss=0.1372, pruned_loss=0.05027, audio_tagging_loss=0.01269, over 3055077.60 frames. ], batch size: 64, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:21:00,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=15.0 2023-11-18 07:21:03,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=118560.0, ans=0.2 2023-11-18 07:21:14,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=118626.66666666667, ans=0.2 2023-11-18 07:21:38,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 1.021e+02 1.146e+02 1.318e+02 2.072e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:21:49,063 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5800, loss[loss=0.09915, simple_loss=0.1135, pruned_loss=0.03057, audio_tagging_loss=0.01184, over 15044.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.1353, pruned_loss=0.04921, audio_tagging_loss=0.01258, over 3053168.67 frames. ], batch size: 55, lr: 2.75e-02, grad_scale: 32.0 2023-11-18 07:21:52,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=118826.66666666667, ans=0.125 2023-11-18 07:21:59,371 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:22:00,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=118893.33333333333, ans=0.04949747468305833 2023-11-18 07:22:08,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=118893.33333333333, ans=0.125 2023-11-18 07:22:12,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.66 vs. 
limit=6.0 2023-11-18 07:22:15,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=118960.0, ans=0.2 2023-11-18 07:22:32,923 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:22:44,824 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5850, loss[loss=0.08612, simple_loss=0.084, pruned_loss=0.02971, audio_tagging_loss=0.01441, over 14608.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1342, pruned_loss=0.04876, audio_tagging_loss=0.01268, over 3051619.21 frames. ], batch size: 56, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:23:04,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=119226.66666666667, ans=0.0 2023-11-18 07:23:29,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.248e+01 1.029e+02 1.172e+02 1.323e+02 1.755e+02, threshold=2.344e+02, percent-clipped=0.0 2023-11-18 07:23:40,988 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5900, loss[loss=0.1444, simple_loss=0.1496, pruned_loss=0.0553, audio_tagging_loss=0.01426, over 17152.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1331, pruned_loss=0.04809, audio_tagging_loss=0.0128, over 3046642.76 frames. ], batch size: 64, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:23:42,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=119493.33333333333, ans=0.125 2023-11-18 07:24:22,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=119693.33333333333, ans=0.0 2023-11-18 07:24:36,658 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 5950, loss[loss=0.1179, simple_loss=0.1208, pruned_loss=0.04683, audio_tagging_loss=0.0107, over 16028.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1337, pruned_loss=0.04829, audio_tagging_loss=0.01275, over 3051947.53 frames. ], batch size: 60, lr: 2.74e-02, grad_scale: 32.0 2023-11-18 07:24:41,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=119826.66666666667, ans=0.0 2023-11-18 07:24:42,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=119826.66666666667, ans=0.125 2023-11-18 07:24:48,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=119893.33333333333, ans=0.125 2023-11-18 07:24:59,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=119960.0, ans=0.1 2023-11-18 07:25:09,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=120026.66666666667, ans=0.125 2023-11-18 07:25:21,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.274e+01 1.016e+02 1.169e+02 1.321e+02 1.949e+02, threshold=2.338e+02, percent-clipped=0.0 2023-11-18 07:25:32,023 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6000, loss[loss=0.1179, simple_loss=0.1241, pruned_loss=0.04357, audio_tagging_loss=0.01232, over 14958.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1335, pruned_loss=0.04812, audio_tagging_loss=0.0127, over 3045616.10 frames. 
], batch size: 58, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:25:32,025 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 07:26:00,556 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.1815, 3.6200, 1.7093, 4.0341], device='cuda:0') 2023-11-18 07:26:04,342 INFO [train_asr.py:1147] (0/4) Epoch 2, validation: loss=0.08772, simple_loss=0.06916, pruned_loss=0.01519, audio_tagging_loss=0.03794, over 4681554.00 frames. 2023-11-18 07:26:04,342 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 07:26:21,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=120226.66666666667, ans=0.0 2023-11-18 07:26:23,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.98 vs. limit=15.0 2023-11-18 07:26:34,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=120293.33333333333, ans=0.2 2023-11-18 07:26:36,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.58 vs. limit=12.0 2023-11-18 07:26:39,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=120360.0, ans=0.125 2023-11-18 07:26:44,870 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:26:53,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=120426.66666666667, ans=0.1 2023-11-18 07:26:57,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=120426.66666666667, ans=0.125 2023-11-18 07:27:00,584 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6050, loss[loss=0.1423, simple_loss=0.1627, pruned_loss=0.05056, audio_tagging_loss=0.01033, over 16456.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1343, pruned_loss=0.04862, audio_tagging_loss=0.01261, over 3050433.13 frames. 
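[annotation] At batch 6000 the loop pauses for validation (loss=0.08772, frame-weighted over 4,681,554 frames), dumps the entropy of selected attention-weight tensors, and reports peak GPU memory. A compact sketch of such a validation pass under those assumptions; attention entropy is a standard diagnostic for attention collapse (near-zero entropy means each query attends to a single key):

    import torch

    @torch.no_grad()
    def validate(model, dev_loader, compute_loss):
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        for batch in dev_loader:
            loss, num_frames = compute_loss(model, batch)
            tot_loss += loss.item() * num_frames
            tot_frames += num_frames
        model.train()
        return tot_loss / tot_frames  # frame-weighted average, as in the log

    def attn_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
        # attn_weights: (..., num_keys), each row summing to 1.
        p = attn_weights.clamp(min=1e-20)
        return -(p * p.log()).sum(dim=-1).mean()

    # torch.cuda.max_memory_allocated() // 2**20 gives the
    # "Maximum memory allocated" MB figure printed after validation.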
], batch size: 59, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:27:04,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=120493.33333333333, ans=0.0 2023-11-18 07:27:23,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=120626.66666666667, ans=0.1 2023-11-18 07:27:46,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.607e+01 1.101e+02 1.234e+02 1.349e+02 2.388e+02, threshold=2.468e+02, percent-clipped=1.0 2023-11-18 07:27:46,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=120760.0, ans=0.0 2023-11-18 07:27:48,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=120760.0, ans=0.2 2023-11-18 07:27:57,583 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6100, loss[loss=0.1022, simple_loss=0.1022, pruned_loss=0.03439, audio_tagging_loss=0.0167, over 15513.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.134, pruned_loss=0.04833, audio_tagging_loss=0.01274, over 3050577.30 frames. ], batch size: 60, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:28:12,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=120893.33333333333, ans=0.125 2023-11-18 07:28:30,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=121026.66666666667, ans=0.0 2023-11-18 07:28:54,893 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6150, loss[loss=0.1662, simple_loss=0.1722, pruned_loss=0.06798, audio_tagging_loss=0.01218, over 16272.00 frames. ], tot_loss[loss=0.1287, simple_loss=0.1346, pruned_loss=0.04866, audio_tagging_loss=0.01269, over 3049386.19 frames. ], batch size: 60, lr: 2.73e-02, grad_scale: 32.0 2023-11-18 07:29:02,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121160.0, ans=0.125 2023-11-18 07:29:26,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=121293.33333333333, ans=0.0 2023-11-18 07:29:27,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=121360.0, ans=0.125 2023-11-18 07:29:32,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.80 vs. limit=22.5 2023-11-18 07:29:37,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=12.0 2023-11-18 07:29:40,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.796e+01 1.065e+02 1.218e+02 1.371e+02 2.442e+02, threshold=2.436e+02, percent-clipped=0.0 2023-11-18 07:29:46,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=121426.66666666667, ans=0.125 2023-11-18 07:29:52,074 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6200, loss[loss=0.1316, simple_loss=0.1382, pruned_loss=0.05072, audio_tagging_loss=0.01173, over 15358.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1345, pruned_loss=0.04889, audio_tagging_loss=0.01294, over 3044784.58 frames. 
], batch size: 57, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:29:56,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.86 vs. limit=22.5 2023-11-18 07:30:12,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=121560.0, ans=0.0 2023-11-18 07:30:15,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=121626.66666666667, ans=0.125 2023-11-18 07:30:44,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=121760.0, ans=0.125 2023-11-18 07:30:48,817 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6250, loss[loss=0.0755, simple_loss=0.07045, pruned_loss=0.02125, audio_tagging_loss=0.01902, over 15626.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1332, pruned_loss=0.04833, audio_tagging_loss=0.01314, over 3046361.06 frames. ], batch size: 59, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:30:54,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=121826.66666666667, ans=0.125 2023-11-18 07:31:22,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=122026.66666666667, ans=0.125 2023-11-18 07:31:24,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=122026.66666666667, ans=0.2 2023-11-18 07:31:33,887 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 1.000e+02 1.109e+02 1.237e+02 1.670e+02, threshold=2.218e+02, percent-clipped=0.0 2023-11-18 07:31:34,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=122093.33333333333, ans=0.1 2023-11-18 07:31:42,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=122093.33333333333, ans=0.0 2023-11-18 07:31:45,282 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6300, loss[loss=0.1584, simple_loss=0.1671, pruned_loss=0.06204, audio_tagging_loss=0.01275, over 15640.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1336, pruned_loss=0.04844, audio_tagging_loss=0.01305, over 3042948.83 frames. ], batch size: 56, lr: 2.72e-02, grad_scale: 32.0 2023-11-18 07:31:50,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=122160.0, ans=0.0 2023-11-18 07:31:59,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.28 vs. limit=15.0 2023-11-18 07:32:03,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=122226.66666666667, ans=0.125 2023-11-18 07:32:05,470 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.75 vs. 
limit=15.0 2023-11-18 07:32:16,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=122293.33333333333, ans=0.125 2023-11-18 07:32:25,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=122360.0, ans=0.0 2023-11-18 07:32:31,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=122426.66666666667, ans=0.125 2023-11-18 07:32:42,050 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6350, loss[loss=0.1311, simple_loss=0.1406, pruned_loss=0.04913, audio_tagging_loss=0.01167, over 14959.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1344, pruned_loss=0.04883, audio_tagging_loss=0.01311, over 3051788.83 frames. ], batch size: 54, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:32:45,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=122493.33333333333, ans=0.125 2023-11-18 07:32:45,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=122493.33333333333, ans=0.125 2023-11-18 07:32:52,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=122560.0, ans=0.1 2023-11-18 07:32:57,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=122560.0, ans=0.125 2023-11-18 07:32:59,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=122560.0, ans=0.04949747468305833 2023-11-18 07:33:27,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=15.0 2023-11-18 07:33:27,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.961e+01 1.030e+02 1.146e+02 1.327e+02 2.114e+02, threshold=2.291e+02, percent-clipped=0.0 2023-11-18 07:33:37,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=122760.0, ans=0.1 2023-11-18 07:33:37,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=122760.0, ans=0.05 2023-11-18 07:33:38,768 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:33:39,612 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6400, loss[loss=0.1191, simple_loss=0.1203, pruned_loss=0.04433, audio_tagging_loss=0.01456, over 15221.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1346, pruned_loss=0.0488, audio_tagging_loss=0.01308, over 3051808.56 frames. ], batch size: 58, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:33:59,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=122893.33333333333, ans=0.125 2023-11-18 07:33:59,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.04 vs. 
limit=22.5 2023-11-18 07:34:33,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=123093.33333333333, ans=0.125 2023-11-18 07:34:35,489 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6450, loss[loss=0.1281, simple_loss=0.1325, pruned_loss=0.04828, audio_tagging_loss=0.01355, over 16056.00 frames. ], tot_loss[loss=0.1289, simple_loss=0.1341, pruned_loss=0.04865, audio_tagging_loss=0.01319, over 3048092.62 frames. ], batch size: 59, lr: 2.71e-02, grad_scale: 32.0 2023-11-18 07:34:40,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.21 vs. limit=15.0 2023-11-18 07:35:11,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=123360.0, ans=0.0 2023-11-18 07:35:14,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=123360.0, ans=0.125 2023-11-18 07:35:18,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=123360.0, ans=0.0 2023-11-18 07:35:20,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.451e+01 1.021e+02 1.177e+02 1.311e+02 2.345e+02, threshold=2.354e+02, percent-clipped=1.0 2023-11-18 07:35:25,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=123426.66666666667, ans=0.0 2023-11-18 07:35:31,884 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6500, loss[loss=0.1043, simple_loss=0.1063, pruned_loss=0.03812, audio_tagging_loss=0.01309, over 14258.00 frames. ], tot_loss[loss=0.129, simple_loss=0.1343, pruned_loss=0.04883, audio_tagging_loss=0.01304, over 3044431.80 frames. ], batch size: 55, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:35:35,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.66 vs. limit=15.0 2023-11-18 07:35:53,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=123626.66666666667, ans=0.125 2023-11-18 07:36:19,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=123760.0, ans=0.125 2023-11-18 07:36:22,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=123760.0, ans=15.0 2023-11-18 07:36:28,334 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6550, loss[loss=0.1165, simple_loss=0.1168, pruned_loss=0.04401, audio_tagging_loss=0.01409, over 14626.00 frames. ], tot_loss[loss=0.1292, simple_loss=0.1349, pruned_loss=0.04898, audio_tagging_loss=0.0128, over 3040688.40 frames. 
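The optim.py:476 entries report five order statistics (min, 25%, median, 75%, max) of recent gradient norms, and in every such entry in this section the reported threshold equals Clipping_scale times the median; percent-clipped is then the share of recent batches whose norm exceeded that threshold. A sketch of that relationship using the quartile entry above; it is inferred from the logged numbers, not lifted from optim.py:

    # threshold = clipping_scale * median(recent grad norms); for the entry
    # above: 2.0 * 1.177e+02 = 2.354e+02.
    import statistics

    clipping_scale = 2.0
    recent_grad_norms = [74.51, 102.1, 117.7, 131.1, 234.5]  # min/Q1/median/Q3/max
    threshold = clipping_scale * statistics.median(recent_grad_norms)
    assert abs(threshold - 235.4) < 1e-9  # matches threshold=2.354e+02

    def clip_factor(grad_norm: float) -> float:
        # Gradients with norm above the threshold are scaled back down to it.
        return min(1.0, threshold / grad_norm)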
], batch size: 58, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:36:42,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=123893.33333333333, ans=0.125 2023-11-18 07:37:13,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.989e+01 1.139e+02 1.347e+02 1.768e+02, threshold=2.277e+02, percent-clipped=0.0 2023-11-18 07:37:18,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=124093.33333333333, ans=0.125 2023-11-18 07:37:24,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=124160.0, ans=0.125 2023-11-18 07:37:25,647 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6600, loss[loss=0.1071, simple_loss=0.1113, pruned_loss=0.03949, audio_tagging_loss=0.01196, over 14521.00 frames. ], tot_loss[loss=0.1289, simple_loss=0.135, pruned_loss=0.04877, audio_tagging_loss=0.01267, over 3039163.88 frames. ], batch size: 57, lr: 2.70e-02, grad_scale: 32.0 2023-11-18 07:37:27,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=124160.0, ans=0.0 2023-11-18 07:37:31,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=124160.0, ans=0.125 2023-11-18 07:37:41,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=124226.66666666667, ans=0.1 2023-11-18 07:37:46,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=124293.33333333333, ans=0.125 2023-11-18 07:38:00,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.16 vs. limit=22.5 2023-11-18 07:38:15,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=124426.66666666667, ans=0.0 2023-11-18 07:38:22,528 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6650, loss[loss=0.06877, simple_loss=0.06846, pruned_loss=0.02174, audio_tagging_loss=0.01279, over 14847.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1334, pruned_loss=0.04805, audio_tagging_loss=0.01267, over 3045115.70 frames. ], batch size: 57, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:39:06,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=124760.0, ans=0.125 2023-11-18 07:39:07,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.903e+01 1.041e+02 1.128e+02 1.286e+02 1.870e+02, threshold=2.255e+02, percent-clipped=0.0 2023-11-18 07:39:08,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.58 vs. limit=15.0 2023-11-18 07:39:14,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=124760.0, ans=0.2 2023-11-18 07:39:18,523 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6700, loss[loss=0.1126, simple_loss=0.1137, pruned_loss=0.04108, audio_tagging_loss=0.01466, over 16124.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1333, pruned_loss=0.04802, audio_tagging_loss=0.01269, over 3042988.64 frames. 
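The scaling.py:213 entries report ScheduledFloat hyperparameters (skip rates, dropout probabilities, balancer limits) whose values are piecewise-linear functions of a batch count. The fractional batch_count values are consistent with batch_idx_train being rescaled by max_duration * world_size / ref_duration = 1000 * 4 / 600, as recent icefall recipes do; both the interpolation and that rescaling are assumptions here, and the schedule points below are illustrative only:

    # Sketch of a ScheduledFloat: a value interpolated against an adjusted
    # batch count.  The schedule points are made up for illustration.
    import numpy as np

    def scheduled_float(batch_count: float, schedule: list[tuple[float, float]]) -> float:
        xs, ys = zip(*schedule)
        return float(np.interp(batch_count, xs, ys))

    # With max_duration=1000, world_size=4, ref_duration=600 the scale factor
    # is 20/3, which explains batch_count values like 124226.66666666667:
    adjusted = 18634 * 1000 * 4 / 600
    assert round(adjusted, 2) == 124226.67

    # e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k counts:
    dropout_p = scheduled_float(adjusted, [(0.0, 0.3), (20000.0, 0.1)])
    assert dropout_p == 0.1  # past the last schedule point, the value is held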
], batch size: 60, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:39:18,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=124826.66666666667, ans=0.0 2023-11-18 07:39:33,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.38 vs. limit=15.0 2023-11-18 07:39:36,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=124893.33333333333, ans=0.2 2023-11-18 07:39:40,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=8.34 vs. limit=15.0 2023-11-18 07:39:44,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=124960.0, ans=0.0 2023-11-18 07:39:50,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.66 vs. limit=22.5 2023-11-18 07:39:55,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=125026.66666666667, ans=0.0 2023-11-18 07:40:13,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=125093.33333333333, ans=0.1 2023-11-18 07:40:16,329 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6750, loss[loss=0.1766, simple_loss=0.1928, pruned_loss=0.06876, audio_tagging_loss=0.01148, over 15742.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1333, pruned_loss=0.04806, audio_tagging_loss=0.01275, over 3033516.63 frames. ], batch size: 58, lr: 2.69e-02, grad_scale: 32.0 2023-11-18 07:40:16,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-18 07:40:30,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=125226.66666666667, ans=0.0 2023-11-18 07:40:33,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=125226.66666666667, ans=0.0 2023-11-18 07:40:49,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.44 vs. limit=15.0 2023-11-18 07:40:53,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=125360.0, ans=0.0 2023-11-18 07:41:01,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.301e+01 1.017e+02 1.137e+02 1.334e+02 2.157e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 07:41:07,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=125426.66666666667, ans=0.2 2023-11-18 07:41:13,084 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6800, loss[loss=0.09269, simple_loss=0.09552, pruned_loss=0.0349, audio_tagging_loss=0.01003, over 15859.00 frames. ], tot_loss[loss=0.1282, simple_loss=0.1342, pruned_loss=0.04853, audio_tagging_loss=0.01253, over 3035670.41 frames. 
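The scaling.py:1022 "Whitening" entries fire when a module's feature covariance drifts too far from isotropic. To my reading of the zipformer scaling code (so treat the exact formula as an assumption), the logged metric is the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue: 1.0 for perfectly white features, growing as a few directions dominate. When the metric exceeds the module's limit (e.g. the 24.66 vs. limit=22.5 entry above), a corrective gradient is applied and the event is logged:

    # Sketch of a whitening metric: 1.0 for an isotropic covariance, larger
    # when the covariance is ill-conditioned.  Single-group version for clarity.
    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(1000, 256)        # near-white features
    assert whitening_metric(x) < 2.0  # close to 1.0, well under typical limits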
], batch size: 61, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:41:29,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=125560.0, ans=0.0 2023-11-18 07:41:32,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=125560.0, ans=0.125 2023-11-18 07:41:33,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=125560.0, ans=15.0 2023-11-18 07:41:52,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.01 vs. limit=15.0 2023-11-18 07:42:09,007 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6850, loss[loss=0.1408, simple_loss=0.1537, pruned_loss=0.05359, audio_tagging_loss=0.01033, over 14743.00 frames. ], tot_loss[loss=0.1286, simple_loss=0.1349, pruned_loss=0.04879, audio_tagging_loss=0.01237, over 3032943.34 frames. ], batch size: 52, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:42:11,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=125826.66666666667, ans=0.0 2023-11-18 07:42:22,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=125893.33333333333, ans=0.2 2023-11-18 07:42:28,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=125893.33333333333, ans=0.125 2023-11-18 07:42:43,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=126026.66666666667, ans=0.125 2023-11-18 07:42:53,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=126093.33333333333, ans=0.0 2023-11-18 07:42:54,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.768e+01 9.921e+01 1.152e+02 1.334e+02 2.003e+02, threshold=2.305e+02, percent-clipped=0.0 2023-11-18 07:43:02,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=126093.33333333333, ans=0.125 2023-11-18 07:43:05,706 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6900, loss[loss=0.127, simple_loss=0.1352, pruned_loss=0.04481, audio_tagging_loss=0.01456, over 15240.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1333, pruned_loss=0.04808, audio_tagging_loss=0.01243, over 3040561.01 frames. 
], batch size: 57, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:43:11,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=126160.0, ans=0.2 2023-11-18 07:43:18,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=126226.66666666667, ans=0.125 2023-11-18 07:43:20,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=126226.66666666667, ans=0.0 2023-11-18 07:43:23,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=126226.66666666667, ans=0.125 2023-11-18 07:43:48,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=126360.0, ans=0.1 2023-11-18 07:43:48,935 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 07:44:03,090 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 6950, loss[loss=0.1544, simple_loss=0.1602, pruned_loss=0.06411, audio_tagging_loss=0.01024, over 14051.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1342, pruned_loss=0.04827, audio_tagging_loss=0.01256, over 3037354.71 frames. ], batch size: 53, lr: 2.68e-02, grad_scale: 32.0 2023-11-18 07:44:07,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=126493.33333333333, ans=0.0 2023-11-18 07:44:09,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.87 vs. 
limit=22.5 2023-11-18 07:44:19,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=126560.0, ans=0.0 2023-11-18 07:44:19,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=126560.0, ans=0.1 2023-11-18 07:44:25,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=126626.66666666667, ans=0.125 2023-11-18 07:44:26,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=126626.66666666667, ans=0.0 2023-11-18 07:44:27,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=126626.66666666667, ans=0.125 2023-11-18 07:44:27,762 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:44:39,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=126693.33333333333, ans=0.2 2023-11-18 07:44:48,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.915e+01 1.002e+02 1.157e+02 1.287e+02 1.874e+02, threshold=2.315e+02, percent-clipped=0.0 2023-11-18 07:44:53,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.44 vs. limit=15.0 2023-11-18 07:44:56,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=126760.0, ans=0.0 2023-11-18 07:44:59,694 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7000, loss[loss=0.09512, simple_loss=0.09283, pruned_loss=0.03687, audio_tagging_loss=0.01184, over 14779.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1345, pruned_loss=0.04852, audio_tagging_loss=0.01259, over 3047117.17 frames. ], batch size: 57, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:45:01,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.54 vs. limit=15.0 2023-11-18 07:45:17,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=126893.33333333333, ans=0.0 2023-11-18 07:45:33,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=127026.66666666667, ans=0.0 2023-11-18 07:45:34,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=127026.66666666667, ans=0.125 2023-11-18 07:45:41,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.60 vs. limit=15.0 2023-11-18 07:45:46,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=127093.33333333333, ans=0.125 2023-11-18 07:45:56,131 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7050, loss[loss=0.09913, simple_loss=0.09273, pruned_loss=0.03658, audio_tagging_loss=0.01619, over 13968.00 frames. ], tot_loss[loss=0.128, simple_loss=0.134, pruned_loss=0.04832, audio_tagging_loss=0.01271, over 3052156.24 frames. 
], batch size: 57, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:46:16,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=127226.66666666667, ans=0.07 2023-11-18 07:46:19,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=127293.33333333333, ans=0.125 2023-11-18 07:46:27,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.48 vs. limit=10.0 2023-11-18 07:46:40,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=127426.66666666667, ans=0.125 2023-11-18 07:46:41,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 1.025e+02 1.162e+02 1.246e+02 1.816e+02, threshold=2.324e+02, percent-clipped=0.0 2023-11-18 07:46:53,114 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7100, loss[loss=0.1023, simple_loss=0.1077, pruned_loss=0.03399, audio_tagging_loss=0.0145, over 14233.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1332, pruned_loss=0.04777, audio_tagging_loss=0.01287, over 3054777.37 frames. ], batch size: 53, lr: 2.67e-02, grad_scale: 32.0 2023-11-18 07:46:57,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=127493.33333333333, ans=0.0 2023-11-18 07:46:58,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=127493.33333333333, ans=0.125 2023-11-18 07:47:03,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=127560.0, ans=0.0 2023-11-18 07:47:19,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. limit=15.0 2023-11-18 07:47:24,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=127626.66666666667, ans=0.0 2023-11-18 07:47:32,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=127693.33333333333, ans=0.0 2023-11-18 07:47:38,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.00 vs. limit=10.0 2023-11-18 07:47:39,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=127760.0, ans=0.2 2023-11-18 07:47:42,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=127760.0, ans=0.0 2023-11-18 07:47:49,831 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7150, loss[loss=0.1841, simple_loss=0.1959, pruned_loss=0.0755, audio_tagging_loss=0.0106, over 13942.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1341, pruned_loss=0.04793, audio_tagging_loss=0.01289, over 3059769.55 frames. ], batch size: 52, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:47:53,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=127826.66666666667, ans=0.125 2023-11-18 07:48:06,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.17 vs. 
limit=15.0 2023-11-18 07:48:21,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-11-18 07:48:27,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-11-18 07:48:32,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2023-11-18 07:48:35,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.931e+01 1.022e+02 1.134e+02 1.283e+02 2.595e+02, threshold=2.267e+02, percent-clipped=2.0 2023-11-18 07:48:41,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=128093.33333333333, ans=0.025 2023-11-18 07:48:46,828 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7200, loss[loss=0.1027, simple_loss=0.1054, pruned_loss=0.0348, audio_tagging_loss=0.01523, over 15976.00 frames. ], tot_loss[loss=0.1261, simple_loss=0.1322, pruned_loss=0.0471, audio_tagging_loss=0.01292, over 3060847.96 frames. ], batch size: 61, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:48:48,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=128160.0, ans=0.1 2023-11-18 07:49:05,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=128226.66666666667, ans=0.1 2023-11-18 07:49:05,349 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:49:06,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=128226.66666666667, ans=0.09899494936611666 2023-11-18 07:49:07,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.83 vs. limit=15.0 2023-11-18 07:49:34,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.71 vs. limit=6.0 2023-11-18 07:49:36,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=128426.66666666667, ans=0.125 2023-11-18 07:49:44,008 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7250, loss[loss=0.1351, simple_loss=0.1368, pruned_loss=0.04915, audio_tagging_loss=0.01754, over 14233.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1326, pruned_loss=0.0471, audio_tagging_loss=0.01292, over 3052437.94 frames. 
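The grad_scale field in the loss lines tracks automatic mixed-precision loss scaling: it doubled from 32.0 to 64.0 around batch 7150 above, and schemes in the style of torch.cuda.amp.GradScaler grow the scale after a long run of finite-gradient steps and halve it when an overflow is detected. A toy sketch of that update rule, not icefall's exact code:

    # Loss-scale bookkeeping in the style of torch.cuda.amp.GradScaler.
    def update_grad_scale(scale: float, found_inf: bool, good_steps: int,
                          growth_interval: int = 2000) -> tuple[float, int]:
        if found_inf:
            return scale * 0.5, 0  # back off and restart the streak
        good_steps += 1
        if good_steps >= growth_interval:
            return scale * 2.0, 0  # e.g. the 32.0 -> 64.0 step seen above
        return scale, good_steps

    assert update_grad_scale(32.0, False, 1999) == (64.0, 0)
    assert update_grad_scale(64.0, True, 500) == (32.0, 0)

The later return from 64.0 back to 32.0 (around batch 7900 below) is consistent with exactly one such overflow backoff.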
], batch size: 53, lr: 2.66e-02, grad_scale: 64.0 2023-11-18 07:49:44,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=128493.33333333333, ans=0.125 2023-11-18 07:49:50,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=128493.33333333333, ans=0.125 2023-11-18 07:49:58,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=128560.0, ans=0.125 2023-11-18 07:50:07,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=128626.66666666667, ans=0.1 2023-11-18 07:50:10,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=128626.66666666667, ans=0.125 2023-11-18 07:50:29,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.726e+01 1.028e+02 1.141e+02 1.257e+02 1.791e+02, threshold=2.282e+02, percent-clipped=0.0 2023-11-18 07:50:34,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-11-18 07:50:40,926 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7300, loss[loss=0.1361, simple_loss=0.137, pruned_loss=0.05333, audio_tagging_loss=0.0143, over 13855.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1337, pruned_loss=0.04729, audio_tagging_loss=0.0127, over 3048828.81 frames. ], batch size: 53, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:50:43,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=128826.66666666667, ans=0.0 2023-11-18 07:50:44,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=128826.66666666667, ans=0.125 2023-11-18 07:51:06,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-11-18 07:51:21,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=129026.66666666667, ans=0.125 2023-11-18 07:51:37,931 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7350, loss[loss=0.1421, simple_loss=0.1421, pruned_loss=0.05477, audio_tagging_loss=0.01627, over 15446.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1334, pruned_loss=0.0472, audio_tagging_loss=0.01255, over 3051750.33 frames. 
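The slow drift of the lr field through this stretch (2.73e-02 down to 2.65e-02 here, and lower below) is consistent with the Eden-style schedule these zipformer recipes use, given this run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5. A sketch of the standard formula; the exact step/epoch bookkeeping (warm-up, epoch offsets) is not visible in the log, so the numbers below are illustrative rather than an exact reproduction of the logged values:

    # Eden-style learning-rate schedule: polynomial decay in both the batch
    # index and the epoch.  Constants taken from this run's config.
    def eden_lr(base_lr: float, step: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        batch_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.5
        return base_lr * batch_factor * epoch_factor

    # Monotonically decreasing within an epoch, hence the slow drift above:
    assert eden_lr(0.045, 19000, 1.5) > eden_lr(0.045, 20000, 1.6)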
], batch size: 58, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:51:46,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=129160.0, ans=0.125 2023-11-18 07:51:53,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=129226.66666666667, ans=0.1 2023-11-18 07:52:23,826 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.362e+01 1.003e+02 1.122e+02 1.264e+02 2.098e+02, threshold=2.243e+02, percent-clipped=0.0 2023-11-18 07:52:25,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=129426.66666666667, ans=0.035 2023-11-18 07:52:34,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=129493.33333333333, ans=0.2 2023-11-18 07:52:35,918 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7400, loss[loss=0.1263, simple_loss=0.1416, pruned_loss=0.0427, audio_tagging_loss=0.01281, over 14485.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1333, pruned_loss=0.04721, audio_tagging_loss=0.01249, over 3048535.88 frames. ], batch size: 55, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:52:36,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=129493.33333333333, ans=0.125 2023-11-18 07:53:08,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=129693.33333333333, ans=0.0 2023-11-18 07:53:32,160 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7450, loss[loss=0.09587, simple_loss=0.0978, pruned_loss=0.038, audio_tagging_loss=0.00897, over 14497.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1323, pruned_loss=0.04659, audio_tagging_loss=0.01244, over 3046947.55 frames. ], batch size: 54, lr: 2.65e-02, grad_scale: 64.0 2023-11-18 07:53:47,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=129893.33333333333, ans=0.0 2023-11-18 07:53:50,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=129893.33333333333, ans=0.125 2023-11-18 07:54:00,916 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 07:54:17,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 1.036e+02 1.151e+02 1.366e+02 1.976e+02, threshold=2.301e+02, percent-clipped=0.0 2023-11-18 07:54:29,347 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7500, loss[loss=0.129, simple_loss=0.1301, pruned_loss=0.05022, audio_tagging_loss=0.01378, over 16016.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1317, pruned_loss=0.04672, audio_tagging_loss=0.01252, over 3043035.05 frames. ], batch size: 60, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:54:38,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.18 vs. 
limit=10.0 2023-11-18 07:54:47,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=130226.66666666667, ans=0.0 2023-11-18 07:55:23,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=130426.66666666667, ans=0.2 2023-11-18 07:55:25,998 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7550, loss[loss=0.1411, simple_loss=0.1509, pruned_loss=0.05101, audio_tagging_loss=0.01467, over 15942.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1336, pruned_loss=0.04755, audio_tagging_loss=0.01247, over 3048014.46 frames. ], batch size: 59, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:55:37,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=130560.0, ans=0.0 2023-11-18 07:55:42,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=130560.0, ans=0.0 2023-11-18 07:55:51,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=130626.66666666667, ans=0.07 2023-11-18 07:55:55,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=130626.66666666667, ans=0.07 2023-11-18 07:56:00,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.94 vs. limit=22.5 2023-11-18 07:56:02,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=130693.33333333333, ans=0.0 2023-11-18 07:56:12,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.352e+01 1.032e+02 1.116e+02 1.247e+02 1.797e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 07:56:22,916 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7600, loss[loss=0.1084, simple_loss=0.1153, pruned_loss=0.03981, audio_tagging_loss=0.011, over 15887.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1332, pruned_loss=0.04734, audio_tagging_loss=0.01244, over 3051184.92 frames. ], batch size: 61, lr: 2.64e-02, grad_scale: 64.0 2023-11-18 07:56:33,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=130893.33333333333, ans=0.125 2023-11-18 07:56:35,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=130893.33333333333, ans=0.0 2023-11-18 07:56:38,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=130893.33333333333, ans=0.2 2023-11-18 07:56:40,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=130893.33333333333, ans=0.1 2023-11-18 07:56:48,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=15.0 2023-11-18 07:57:10,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.76 vs. 
limit=22.5 2023-11-18 07:57:18,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=131160.0, ans=0.125 2023-11-18 07:57:19,672 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7650, loss[loss=0.1379, simple_loss=0.1506, pruned_loss=0.05145, audio_tagging_loss=0.01113, over 15345.00 frames. ], tot_loss[loss=0.1262, simple_loss=0.1329, pruned_loss=0.04719, audio_tagging_loss=0.01253, over 3057486.35 frames. ], batch size: 56, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:57:31,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=131226.66666666666, ans=0.125 2023-11-18 07:57:48,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=131293.33333333334, ans=0.125 2023-11-18 07:58:05,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.764e+01 1.041e+02 1.157e+02 1.349e+02 1.751e+02, threshold=2.314e+02, percent-clipped=0.0 2023-11-18 07:58:05,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=131426.66666666666, ans=0.125 2023-11-18 07:58:16,547 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7700, loss[loss=0.1422, simple_loss=0.1562, pruned_loss=0.05417, audio_tagging_loss=0.009954, over 15612.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1351, pruned_loss=0.04785, audio_tagging_loss=0.01249, over 3052403.77 frames. ], batch size: 57, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:58:22,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=131493.33333333334, ans=0.125 2023-11-18 07:58:25,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.43 vs. limit=15.0 2023-11-18 07:58:30,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=131560.0, ans=0.125 2023-11-18 07:58:41,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=131626.66666666666, ans=0.0 2023-11-18 07:58:47,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=131626.66666666666, ans=0.02 2023-11-18 07:58:52,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.19 vs. limit=22.5 2023-11-18 07:59:01,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.13 vs. limit=10.0 2023-11-18 07:59:13,110 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7750, loss[loss=0.151, simple_loss=0.1572, pruned_loss=0.05922, audio_tagging_loss=0.01312, over 15161.00 frames. ], tot_loss[loss=0.1286, simple_loss=0.1357, pruned_loss=0.04821, audio_tagging_loss=0.01253, over 3048407.75 frames. ], batch size: 55, lr: 2.63e-02, grad_scale: 64.0 2023-11-18 07:59:25,376 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0 2023-11-18 07:59:30,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=131893.33333333334, ans=0.125 2023-11-18 07:59:33,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=131893.33333333334, ans=0.0 2023-11-18 07:59:33,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=131893.33333333334, ans=0.125 2023-11-18 07:59:58,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 1.007e+02 1.110e+02 1.214e+02 2.200e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 07:59:59,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=132093.33333333334, ans=0.5 2023-11-18 08:00:04,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=132093.33333333334, ans=0.125 2023-11-18 08:00:09,681 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7800, loss[loss=0.1216, simple_loss=0.1239, pruned_loss=0.04343, audio_tagging_loss=0.01619, over 14508.00 frames. ], tot_loss[loss=0.1294, simple_loss=0.137, pruned_loss=0.04837, audio_tagging_loss=0.01252, over 3050433.28 frames. ], batch size: 54, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:00:16,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=132160.0, ans=0.0 2023-11-18 08:00:26,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.42 vs. limit=22.5 2023-11-18 08:00:31,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2023-11-18 08:00:33,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=132293.33333333334, ans=0.125 2023-11-18 08:00:54,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=132426.66666666666, ans=0.125 2023-11-18 08:01:01,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.72 vs. limit=15.0 2023-11-18 08:01:07,214 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7850, loss[loss=0.1353, simple_loss=0.1424, pruned_loss=0.05282, audio_tagging_loss=0.0113, over 15020.00 frames. ], tot_loss[loss=0.1288, simple_loss=0.1359, pruned_loss=0.04818, audio_tagging_loss=0.01268, over 3043566.21 frames. 
], batch size: 57, lr: 2.62e-02, grad_scale: 64.0 2023-11-18 08:01:12,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=132493.33333333334, ans=0.1 2023-11-18 08:01:23,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=132560.0, ans=0.125 2023-11-18 08:01:24,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=132560.0, ans=0.1 2023-11-18 08:01:27,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=132560.0, ans=0.125 2023-11-18 08:01:35,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=22.5 2023-11-18 08:01:53,777 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.431e+01 1.081e+02 1.200e+02 1.326e+02 3.280e+02, threshold=2.400e+02, percent-clipped=1.0 2023-11-18 08:02:03,870 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7900, loss[loss=0.1612, simple_loss=0.1738, pruned_loss=0.06247, audio_tagging_loss=0.01179, over 15453.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.1352, pruned_loss=0.04809, audio_tagging_loss=0.0128, over 3043208.06 frames. ], batch size: 56, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:02:05,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=132826.66666666666, ans=0.0 2023-11-18 08:02:09,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=132826.66666666666, ans=0.125 2023-11-18 08:02:27,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=132960.0, ans=0.125 2023-11-18 08:02:30,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=132960.0, ans=0.0 2023-11-18 08:02:59,853 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 7950, loss[loss=0.1661, simple_loss=0.183, pruned_loss=0.06449, audio_tagging_loss=0.01016, over 15652.00 frames. ], tot_loss[loss=0.1286, simple_loss=0.1352, pruned_loss=0.04806, audio_tagging_loss=0.01295, over 3040333.03 frames. ], batch size: 57, lr: 2.62e-02, grad_scale: 32.0 2023-11-18 08:03:04,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.53 vs. limit=22.5 2023-11-18 08:03:06,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=133160.0, ans=0.0 2023-11-18 08:03:12,339 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 08:03:14,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=133226.66666666666, ans=0.0 2023-11-18 08:03:20,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=133226.66666666666, ans=0.125 2023-11-18 08:03:21,674 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:03:22,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.68 vs. limit=15.0 2023-11-18 08:03:27,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-18 08:03:29,220 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-20000.pt 2023-11-18 08:03:33,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=133293.33333333334, ans=0.1 2023-11-18 08:03:40,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=133360.0, ans=0.0 2023-11-18 08:03:48,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.322e+01 1.027e+02 1.116e+02 1.320e+02 1.890e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 08:03:58,886 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8000, loss[loss=0.1639, simple_loss=0.1775, pruned_loss=0.06267, audio_tagging_loss=0.01251, over 15675.00 frames. ], tot_loss[loss=0.1284, simple_loss=0.1345, pruned_loss=0.04804, audio_tagging_loss=0.01308, over 3037705.21 frames. ], batch size: 57, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:03:59,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=133493.33333333334, ans=0.09899494936611666 2023-11-18 08:04:14,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=133560.0, ans=0.1 2023-11-18 08:04:28,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=133626.66666666666, ans=0.125 2023-11-18 08:04:33,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-11-18 08:04:37,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.40 vs. limit=15.0 2023-11-18 08:04:37,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=133693.33333333334, ans=0.125 2023-11-18 08:04:50,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=133760.0, ans=0.125 2023-11-18 08:04:56,044 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8050, loss[loss=0.1271, simple_loss=0.1255, pruned_loss=0.04647, audio_tagging_loss=0.0179, over 15054.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1337, pruned_loss=0.04776, audio_tagging_loss=0.01307, over 3049815.50 frames. 
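The checkpoint.py:75 entry above lands exactly on this run's save_every_n=4000 cadence: batch 20000 is the fifth multiple of 4000 since training began. A minimal sketch of that trigger with a hypothetical helper name; the experiment path comes straight from the log line:

    # Periodic batch-level checkpointing on the save_every_n cadence.
    from pathlib import Path

    def maybe_checkpoint_path(exp_dir: Path, batch_idx_train: int,
                              save_every_n: int = 4000) -> Path | None:
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return None
        return exp_dir / f"checkpoint-{batch_idx_train}.pt"

    exp = Path("multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0")
    assert maybe_checkpoint_path(exp, 20000) == exp / "checkpoint-20000.pt"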
], batch size: 56, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:06,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0 2023-11-18 08:05:22,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=133960.0, ans=0.5 2023-11-18 08:05:32,266 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.74 vs. limit=6.0 2023-11-18 08:05:41,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134093.33333333334, ans=0.125 2023-11-18 08:05:42,354 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.010e+01 1.125e+02 1.300e+02 1.606e+02 2.209e+02, threshold=2.601e+02, percent-clipped=0.0 2023-11-18 08:05:52,001 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8100, loss[loss=0.1132, simple_loss=0.1274, pruned_loss=0.03982, audio_tagging_loss=0.009711, over 14736.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1336, pruned_loss=0.04783, audio_tagging_loss=0.01285, over 3050747.21 frames. ], batch size: 56, lr: 2.61e-02, grad_scale: 32.0 2023-11-18 08:05:57,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=134160.0, ans=0.1 2023-11-18 08:05:58,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=134160.0, ans=0.125 2023-11-18 08:05:59,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.96 vs. limit=15.0 2023-11-18 08:06:39,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=134426.66666666666, ans=0.125 2023-11-18 08:06:39,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=134426.66666666666, ans=0.2 2023-11-18 08:06:48,381 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8150, loss[loss=0.1611, simple_loss=0.1576, pruned_loss=0.06631, audio_tagging_loss=0.01596, over 15603.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1339, pruned_loss=0.04819, audio_tagging_loss=0.01259, over 3045674.19 frames. ], batch size: 57, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:06:48,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=134493.33333333334, ans=0.125 2023-11-18 08:06:58,721 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-18 08:07:05,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=134560.0, ans=0.0 2023-11-18 08:07:34,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.378e+01 1.047e+02 1.173e+02 1.336e+02 3.591e+02, threshold=2.346e+02, percent-clipped=1.0 2023-11-18 08:07:38,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=134760.0, ans=0.0 2023-11-18 08:07:44,604 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:07:45,622 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8200, loss[loss=0.1363, simple_loss=0.1504, pruned_loss=0.05236, audio_tagging_loss=0.008783, over 16606.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1347, pruned_loss=0.04827, audio_tagging_loss=0.01241, over 3047268.39 frames. ], batch size: 60, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:07:47,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=134826.66666666666, ans=0.2 2023-11-18 08:07:48,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0 2023-11-18 08:07:52,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=134826.66666666666, ans=0.125 2023-11-18 08:07:58,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2023-11-18 08:08:01,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=134893.33333333334, ans=15.0 2023-11-18 08:08:12,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=134960.0, ans=0.125 2023-11-18 08:08:12,386 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.62 vs. limit=15.0 2023-11-18 08:08:26,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=10.88 vs. limit=12.0 2023-11-18 08:08:27,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=135026.66666666666, ans=0.0 2023-11-18 08:08:41,585 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8250, loss[loss=0.09192, simple_loss=0.1037, pruned_loss=0.0279, audio_tagging_loss=0.01217, over 14535.00 frames. ], tot_loss[loss=0.127, simple_loss=0.1337, pruned_loss=0.04777, audio_tagging_loss=0.0124, over 3038142.90 frames. ], batch size: 55, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:08:47,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=135160.0, ans=0.1 2023-11-18 08:08:48,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=135160.0, ans=0.2 2023-11-18 08:08:53,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=135226.66666666666, ans=0.125 2023-11-18 08:09:06,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.86 vs. limit=15.0 2023-11-18 08:09:12,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.64 vs. 
limit=15.0 2023-11-18 08:09:14,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=135293.33333333334, ans=0.07 2023-11-18 08:09:16,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135360.0, ans=0.1 2023-11-18 08:09:19,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=135360.0, ans=0.125 2023-11-18 08:09:27,855 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.216e+01 1.092e+02 1.238e+02 1.418e+02 2.138e+02, threshold=2.477e+02, percent-clipped=0.0 2023-11-18 08:09:38,148 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8300, loss[loss=0.1067, simple_loss=0.1146, pruned_loss=0.03416, audio_tagging_loss=0.01524, over 15580.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1343, pruned_loss=0.04785, audio_tagging_loss=0.0125, over 3045661.30 frames. ], batch size: 58, lr: 2.60e-02, grad_scale: 32.0 2023-11-18 08:09:42,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=135493.33333333334, ans=0.125 2023-11-18 08:09:48,228 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.03 vs. limit=12.0 2023-11-18 08:09:49,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-18 08:09:53,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=135560.0, ans=0.0 2023-11-18 08:09:59,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=135560.0, ans=0.1 2023-11-18 08:10:11,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=135693.33333333334, ans=0.0 2023-11-18 08:10:19,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=135693.33333333334, ans=0.125 2023-11-18 08:10:26,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=135760.0, ans=0.125 2023-11-18 08:10:27,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.00 vs. limit=6.0 2023-11-18 08:10:35,050 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8350, loss[loss=0.1117, simple_loss=0.1112, pruned_loss=0.03774, audio_tagging_loss=0.01834, over 15546.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1341, pruned_loss=0.04766, audio_tagging_loss=0.01246, over 3048211.10 frames. ], batch size: 62, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:10:37,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. 
limit=15.0 2023-11-18 08:10:44,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=135826.66666666666, ans=0.0 2023-11-18 08:11:04,612 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:11:05,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=135960.0, ans=0.0 2023-11-18 08:11:14,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=136026.66666666666, ans=0.0 2023-11-18 08:11:18,144 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.56 vs. limit=10.0 2023-11-18 08:11:21,584 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 1.030e+02 1.156e+02 1.318e+02 1.873e+02, threshold=2.311e+02, percent-clipped=0.0 2023-11-18 08:11:30,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=136160.0, ans=0.0 2023-11-18 08:11:31,729 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8400, loss[loss=0.1302, simple_loss=0.1396, pruned_loss=0.04661, audio_tagging_loss=0.01378, over 16001.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1346, pruned_loss=0.04764, audio_tagging_loss=0.01253, over 3046087.25 frames. ], batch size: 58, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:11:36,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.51 vs. limit=22.5 2023-11-18 08:12:08,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=136360.0, ans=0.125 2023-11-18 08:12:23,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136426.66666666666, ans=0.1 2023-11-18 08:12:28,253 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8450, loss[loss=0.1062, simple_loss=0.1073, pruned_loss=0.04015, audio_tagging_loss=0.01244, over 15128.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.1341, pruned_loss=0.0476, audio_tagging_loss=0.01262, over 3049437.32 frames. ], batch size: 57, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:12:29,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=136493.33333333334, ans=0.1 2023-11-18 08:12:40,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=136560.0, ans=0.1 2023-11-18 08:12:44,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=136560.0, ans=0.125 2023-11-18 08:13:14,556 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.166e+01 1.028e+02 1.129e+02 1.256e+02 1.884e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:13:22,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-11-18 08:13:25,408 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8500, loss[loss=0.1328, simple_loss=0.1448, pruned_loss=0.05058, audio_tagging_loss=0.009847, over 15675.00 frames. 
], tot_loss[loss=0.1274, simple_loss=0.1344, pruned_loss=0.04762, audio_tagging_loss=0.01257, over 3050340.57 frames. ], batch size: 57, lr: 2.59e-02, grad_scale: 32.0 2023-11-18 08:13:34,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=136826.66666666666, ans=0.0 2023-11-18 08:13:35,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=136893.33333333334, ans=0.0 2023-11-18 08:13:50,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=136960.0, ans=0.1 2023-11-18 08:14:11,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=137093.33333333334, ans=0.1 2023-11-18 08:14:21,094 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8550, loss[loss=0.1078, simple_loss=0.09892, pruned_loss=0.03996, audio_tagging_loss=0.01842, over 15472.00 frames. ], tot_loss[loss=0.1277, simple_loss=0.1347, pruned_loss=0.04775, audio_tagging_loss=0.01258, over 3053924.83 frames. ], batch size: 61, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:14:32,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=137226.66666666666, ans=0.07 2023-11-18 08:14:35,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=137226.66666666666, ans=0.125 2023-11-18 08:14:45,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=137293.33333333334, ans=0.125 2023-11-18 08:14:48,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=137293.33333333334, ans=0.0 2023-11-18 08:14:50,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-11-18 08:14:57,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=137360.0, ans=10.0 2023-11-18 08:15:04,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=137360.0, ans=0.2 2023-11-18 08:15:07,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 1.016e+02 1.105e+02 1.283e+02 1.880e+02, threshold=2.211e+02, percent-clipped=0.0 2023-11-18 08:15:18,205 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8600, loss[loss=0.1416, simple_loss=0.1509, pruned_loss=0.0534, audio_tagging_loss=0.0127, over 14337.00 frames. ], tot_loss[loss=0.1268, simple_loss=0.1335, pruned_loss=0.04734, audio_tagging_loss=0.01275, over 3055728.51 frames. ], batch size: 56, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:15:42,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.33 vs. limit=15.0 2023-11-18 08:15:47,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.83 vs. limit=15.0 2023-11-18 08:16:14,991 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8650, loss[loss=0.1245, simple_loss=0.1306, pruned_loss=0.0468, audio_tagging_loss=0.01242, over 16544.00 frames. 
], tot_loss[loss=0.1266, simple_loss=0.1332, pruned_loss=0.04713, audio_tagging_loss=0.01287, over 3050575.63 frames. ], batch size: 61, lr: 2.58e-02, grad_scale: 32.0 2023-11-18 08:16:18,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=137826.66666666666, ans=0.0 2023-11-18 08:16:19,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=137826.66666666666, ans=0.1 2023-11-18 08:16:20,576 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:16:54,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.85 vs. limit=15.0 2023-11-18 08:16:57,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=138026.66666666666, ans=0.125 2023-11-18 08:17:01,319 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 1.026e+02 1.125e+02 1.305e+02 1.898e+02, threshold=2.250e+02, percent-clipped=0.0 2023-11-18 08:17:05,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=138093.33333333334, ans=0.0 2023-11-18 08:17:11,115 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8700, loss[loss=0.1235, simple_loss=0.1303, pruned_loss=0.04549, audio_tagging_loss=0.01285, over 16475.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1361, pruned_loss=0.04825, audio_tagging_loss=0.01282, over 3053457.85 frames. ], batch size: 63, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:17:22,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=138226.66666666666, ans=0.125 2023-11-18 08:17:24,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.23 vs. limit=15.0 2023-11-18 08:17:32,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.44 vs. limit=15.0 2023-11-18 08:17:34,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=138293.33333333334, ans=0.0 2023-11-18 08:17:36,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=138293.33333333334, ans=0.125 2023-11-18 08:17:56,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=138426.66666666666, ans=0.0 2023-11-18 08:18:07,690 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8750, loss[loss=0.1434, simple_loss=0.1569, pruned_loss=0.05296, audio_tagging_loss=0.01195, over 14983.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1375, pruned_loss=0.04845, audio_tagging_loss=0.01282, over 3056315.68 frames. ], batch size: 57, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:18:16,082 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.800e+00 2023-11-18 08:18:32,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.53 vs. 
limit=15.0 2023-11-18 08:18:39,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=138626.66666666666, ans=10.0 2023-11-18 08:18:44,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=138693.33333333334, ans=0.0 2023-11-18 08:18:49,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=138693.33333333334, ans=0.0 2023-11-18 08:18:49,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=138693.33333333334, ans=0.07 2023-11-18 08:18:55,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.161e+01 1.039e+02 1.204e+02 1.359e+02 1.963e+02, threshold=2.408e+02, percent-clipped=0.0 2023-11-18 08:18:57,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=138760.0, ans=0.125 2023-11-18 08:19:05,273 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8800, loss[loss=0.1503, simple_loss=0.1629, pruned_loss=0.05869, audio_tagging_loss=0.01012, over 15700.00 frames. ], tot_loss[loss=0.13, simple_loss=0.1376, pruned_loss=0.04835, audio_tagging_loss=0.01286, over 3053437.01 frames. ], batch size: 57, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:19:08,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=138826.66666666666, ans=0.0 2023-11-18 08:19:11,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=138826.66666666666, ans=0.1 2023-11-18 08:19:29,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=138960.0, ans=0.0 2023-11-18 08:19:34,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=138960.0, ans=0.0 2023-11-18 08:19:35,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=138960.0, ans=0.0 2023-11-18 08:19:39,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=139026.66666666666, ans=0.0 2023-11-18 08:19:39,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=139026.66666666666, ans=0.2 2023-11-18 08:19:54,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139093.33333333334, ans=0.1 2023-11-18 08:19:55,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=139093.33333333334, ans=0.05 2023-11-18 08:20:01,574 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8850, loss[loss=0.1324, simple_loss=0.1438, pruned_loss=0.04956, audio_tagging_loss=0.01091, over 15650.00 frames. ], tot_loss[loss=0.1297, simple_loss=0.137, pruned_loss=0.04828, audio_tagging_loss=0.01291, over 3051327.66 frames. 
], batch size: 57, lr: 2.57e-02, grad_scale: 32.0 2023-11-18 08:20:01,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=139160.0, ans=0.125 2023-11-18 08:20:09,068 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:20:20,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=139226.66666666666, ans=0.1 2023-11-18 08:20:34,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=139293.33333333334, ans=0.125 2023-11-18 08:20:36,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=139360.0, ans=0.05 2023-11-18 08:20:47,633 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.898e+01 1.044e+02 1.177e+02 1.340e+02 1.901e+02, threshold=2.354e+02, percent-clipped=0.0 2023-11-18 08:20:57,243 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8900, loss[loss=0.116, simple_loss=0.1231, pruned_loss=0.04516, audio_tagging_loss=0.00935, over 15134.00 frames. ], tot_loss[loss=0.1291, simple_loss=0.1372, pruned_loss=0.04794, audio_tagging_loss=0.01255, over 3056088.16 frames. ], batch size: 56, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:21:04,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=139493.33333333334, ans=0.125 2023-11-18 08:21:24,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=139626.66666666666, ans=0.125 2023-11-18 08:21:27,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=139626.66666666666, ans=0.0 2023-11-18 08:21:28,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=139626.66666666666, ans=0.05 2023-11-18 08:21:34,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=139693.33333333334, ans=0.5 2023-11-18 08:21:40,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=139693.33333333334, ans=0.0 2023-11-18 08:21:49,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.02 vs. limit=10.0 2023-11-18 08:21:54,096 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 8950, loss[loss=0.08622, simple_loss=0.09388, pruned_loss=0.02603, audio_tagging_loss=0.01325, over 13771.00 frames. ], tot_loss[loss=0.1281, simple_loss=0.1361, pruned_loss=0.04769, audio_tagging_loss=0.01235, over 3052776.94 frames. 
], batch size: 54, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:16,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=139960.0, ans=0.1 2023-11-18 08:22:16,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=139960.0, ans=0.5 2023-11-18 08:22:30,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-11-18 08:22:32,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=140026.66666666666, ans=0.125 2023-11-18 08:22:41,390 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.130e+01 1.003e+02 1.129e+02 1.259e+02 1.857e+02, threshold=2.258e+02, percent-clipped=0.0 2023-11-18 08:22:44,829 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:22:51,052 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9000, loss[loss=0.1371, simple_loss=0.146, pruned_loss=0.05206, audio_tagging_loss=0.01199, over 15273.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1359, pruned_loss=0.04789, audio_tagging_loss=0.01222, over 3052922.24 frames. ], batch size: 58, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:22:51,054 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 08:23:09,294 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.7965, 5.6512, 5.1951, 5.4096], device='cuda:0') 2023-11-18 08:23:11,465 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.4317, 2.6472, 2.8053, 2.3571, 2.7293, 2.7513, 2.5935, 2.4791], device='cuda:0') 2023-11-18 08:23:26,300 INFO [train_asr.py:1147] (0/4) Epoch 2, validation: loss=0.08723, simple_loss=0.06802, pruned_loss=0.01417, audio_tagging_loss=0.03906, over 4681554.00 frames. 2023-11-18 08:23:26,301 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 08:23:27,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.65 vs. limit=15.0 2023-11-18 08:23:28,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=140160.0, ans=0.125 2023-11-18 08:23:29,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=140160.0, ans=0.125 2023-11-18 08:23:43,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=140226.66666666666, ans=0.0 2023-11-18 08:23:54,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.38 vs. limit=15.0 2023-11-18 08:23:57,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.99 vs. 
limit=15.0 2023-11-18 08:24:05,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=140360.0, ans=10.0 2023-11-18 08:24:07,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=140360.0, ans=0.125 2023-11-18 08:24:16,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=140426.66666666666, ans=0.0 2023-11-18 08:24:22,615 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9050, loss[loss=0.08732, simple_loss=0.08676, pruned_loss=0.032, audio_tagging_loss=0.01194, over 14655.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.134, pruned_loss=0.0474, audio_tagging_loss=0.01233, over 3042653.35 frames. ], batch size: 56, lr: 2.56e-02, grad_scale: 32.0 2023-11-18 08:24:24,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=140493.33333333334, ans=0.1 2023-11-18 08:24:31,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.65 vs. limit=15.0 2023-11-18 08:24:49,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=140626.66666666666, ans=0.125 2023-11-18 08:25:06,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=140760.0, ans=0.125 2023-11-18 08:25:08,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.490e+01 1.025e+02 1.134e+02 1.283e+02 1.776e+02, threshold=2.268e+02, percent-clipped=0.0 2023-11-18 08:25:15,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=140760.0, ans=0.125 2023-11-18 08:25:17,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=140826.66666666666, ans=0.125 2023-11-18 08:25:18,127 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9100, loss[loss=0.0748, simple_loss=0.0707, pruned_loss=0.02282, audio_tagging_loss=0.01662, over 15228.00 frames. ], tot_loss[loss=0.1272, simple_loss=0.1344, pruned_loss=0.0478, audio_tagging_loss=0.0122, over 3046685.17 frames. ], batch size: 59, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:25:22,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=140826.66666666666, ans=0.125 2023-11-18 08:25:27,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.27 vs. 
limit=15.0 2023-11-18 08:25:43,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=140960.0, ans=0.2 2023-11-18 08:25:46,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_abs, batch_count=140960.0, ans=0.5 2023-11-18 08:25:54,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=141026.66666666666, ans=0.125 2023-11-18 08:26:00,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=141026.66666666666, ans=0.1 2023-11-18 08:26:03,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-18 08:26:05,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.60 vs. limit=15.0 2023-11-18 08:26:05,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=141093.33333333334, ans=0.0 2023-11-18 08:26:09,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=141093.33333333334, ans=0.1 2023-11-18 08:26:15,066 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9150, loss[loss=0.1195, simple_loss=0.1221, pruned_loss=0.04536, audio_tagging_loss=0.01304, over 16021.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1349, pruned_loss=0.04775, audio_tagging_loss=0.01229, over 3050479.66 frames. ], batch size: 61, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:26:31,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=141226.66666666666, ans=0.125 2023-11-18 08:26:40,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.72 vs. limit=15.0 2023-11-18 08:26:58,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten.whitening_limit, batch_count=141360.0, ans=15.0 2023-11-18 08:27:01,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.257e+01 1.062e+02 1.145e+02 1.276e+02 2.030e+02, threshold=2.290e+02, percent-clipped=0.0 2023-11-18 08:27:05,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.22 vs. limit=15.0 2023-11-18 08:27:08,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=141426.66666666666, ans=0.125 2023-11-18 08:27:12,359 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9200, loss[loss=0.09478, simple_loss=0.09917, pruned_loss=0.03173, audio_tagging_loss=0.01347, over 16064.00 frames. ], tot_loss[loss=0.1283, simple_loss=0.1357, pruned_loss=0.0482, audio_tagging_loss=0.01229, over 3055853.16 frames. ], batch size: 60, lr: 2.55e-02, grad_scale: 32.0 2023-11-18 08:27:16,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.76 vs. 
limit=10.0 2023-11-18 08:27:22,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=141560.0, ans=0.0 2023-11-18 08:27:37,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=141626.66666666666, ans=0.0 2023-11-18 08:27:40,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.25 vs. limit=22.5 2023-11-18 08:27:57,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=141760.0, ans=0.125 2023-11-18 08:27:58,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=141760.0, ans=0.125 2023-11-18 08:28:01,493 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 08:28:08,764 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9250, loss[loss=0.1133, simple_loss=0.1171, pruned_loss=0.04111, audio_tagging_loss=0.01367, over 15901.00 frames. ], tot_loss[loss=0.1278, simple_loss=0.1351, pruned_loss=0.04802, audio_tagging_loss=0.01222, over 3057087.67 frames. ], batch size: 61, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:28:16,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.74 vs. limit=15.0 2023-11-18 08:28:20,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=15.0 2023-11-18 08:28:42,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0 2023-11-18 08:28:46,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=142026.66666666666, ans=0.125 2023-11-18 08:28:50,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.88 vs. limit=12.0 2023-11-18 08:28:55,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.910e+01 1.033e+02 1.140e+02 1.302e+02 2.365e+02, threshold=2.281e+02, percent-clipped=1.0 2023-11-18 08:29:01,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=142093.33333333334, ans=0.1 2023-11-18 08:29:02,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=142093.33333333334, ans=0.5 2023-11-18 08:29:04,823 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9300, loss[loss=0.1555, simple_loss=0.1679, pruned_loss=0.05942, audio_tagging_loss=0.01208, over 15801.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1348, pruned_loss=0.04784, audio_tagging_loss=0.01234, over 3055542.80 frames. ], batch size: 57, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:29:47,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.35 vs. 
limit=22.5 2023-11-18 08:30:00,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=142493.33333333334, ans=0.125 2023-11-18 08:30:01,732 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9350, loss[loss=0.107, simple_loss=0.1173, pruned_loss=0.03604, audio_tagging_loss=0.01233, over 15206.00 frames. ], tot_loss[loss=0.1276, simple_loss=0.1345, pruned_loss=0.04774, audio_tagging_loss=0.01256, over 3050960.31 frames. ], batch size: 57, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:30:09,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=142493.33333333334, ans=0.125 2023-11-18 08:30:09,264 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.05 vs. limit=6.0 2023-11-18 08:30:19,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=142560.0, ans=0.2 2023-11-18 08:30:24,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=142626.66666666666, ans=0.125 2023-11-18 08:30:40,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=142693.33333333334, ans=0.07 2023-11-18 08:30:45,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=142693.33333333334, ans=0.1 2023-11-18 08:30:48,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.103e+01 1.054e+02 1.142e+02 1.283e+02 1.990e+02, threshold=2.284e+02, percent-clipped=0.0 2023-11-18 08:30:53,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=142760.0, ans=0.125 2023-11-18 08:30:59,176 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9400, loss[loss=0.1475, simple_loss=0.1432, pruned_loss=0.06112, audio_tagging_loss=0.01479, over 15591.00 frames. ], tot_loss[loss=0.1285, simple_loss=0.136, pruned_loss=0.04809, audio_tagging_loss=0.01241, over 3052467.31 frames. 
], batch size: 58, lr: 2.54e-02, grad_scale: 32.0 2023-11-18 08:31:13,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=142893.33333333334, ans=0.125 2023-11-18 08:31:23,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=142960.0, ans=0.2 2023-11-18 08:31:30,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=142960.0, ans=0.125 2023-11-18 08:31:31,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=142960.0, ans=0.1 2023-11-18 08:31:41,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143026.66666666666, ans=0.1 2023-11-18 08:31:43,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=143093.33333333334, ans=0.125 2023-11-18 08:31:45,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=143093.33333333334, ans=0.2 2023-11-18 08:31:50,651 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:31:53,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=143160.0, ans=0.0 2023-11-18 08:31:54,924 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9450, loss[loss=0.1204, simple_loss=0.131, pruned_loss=0.04144, audio_tagging_loss=0.01345, over 14747.00 frames. ], tot_loss[loss=0.1271, simple_loss=0.1341, pruned_loss=0.04747, audio_tagging_loss=0.01262, over 3052040.06 frames. 
], batch size: 56, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:31:55,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=143160.0, ans=0.0 2023-11-18 08:32:03,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=143160.0, ans=0.125 2023-11-18 08:32:08,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143226.66666666666, ans=0.1 2023-11-18 08:32:18,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=143293.33333333334, ans=0.125 2023-11-18 08:32:28,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143360.0, ans=0.1 2023-11-18 08:32:34,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=143360.0, ans=0.125 2023-11-18 08:32:41,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.457e+01 1.024e+02 1.132e+02 1.318e+02 2.507e+02, threshold=2.264e+02, percent-clipped=1.0 2023-11-18 08:32:48,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143426.66666666666, ans=0.1 2023-11-18 08:32:51,334 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9500, loss[loss=0.1297, simple_loss=0.1345, pruned_loss=0.04661, audio_tagging_loss=0.01589, over 15994.00 frames. ], tot_loss[loss=0.1269, simple_loss=0.1336, pruned_loss=0.04732, audio_tagging_loss=0.01278, over 3052013.02 frames. ], batch size: 58, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:33:05,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=143560.0, ans=0.1 2023-11-18 08:33:23,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=143626.66666666666, ans=0.1 2023-11-18 08:33:24,702 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=143693.33333333334, ans=0.0 2023-11-18 08:33:27,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=143693.33333333334, ans=0.5 2023-11-18 08:33:30,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=143693.33333333334, ans=0.0 2023-11-18 08:33:31,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=143693.33333333334, ans=0.125 2023-11-18 08:33:48,291 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9550, loss[loss=0.1368, simple_loss=0.1425, pruned_loss=0.05527, audio_tagging_loss=0.01031, over 14990.00 frames. ], tot_loss[loss=0.1279, simple_loss=0.1345, pruned_loss=0.04773, audio_tagging_loss=0.01287, over 3045459.72 frames. ], batch size: 55, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:33:48,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.34 vs. 
limit=22.5 2023-11-18 08:34:00,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.88 vs. limit=22.5 2023-11-18 08:34:27,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=144026.66666666666, ans=0.1 2023-11-18 08:34:33,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=144093.33333333334, ans=0.0 2023-11-18 08:34:34,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.739e+01 9.724e+01 1.130e+02 1.324e+02 2.108e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 08:34:44,682 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9600, loss[loss=0.1799, simple_loss=0.1923, pruned_loss=0.07212, audio_tagging_loss=0.01166, over 15368.00 frames. ], tot_loss[loss=0.1273, simple_loss=0.134, pruned_loss=0.04743, audio_tagging_loss=0.0128, over 3047172.87 frames. ], batch size: 56, lr: 2.53e-02, grad_scale: 32.0 2023-11-18 08:34:48,243 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.831e+00 2023-11-18 08:35:13,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=144293.33333333334, ans=0.0 2023-11-18 08:35:22,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=144360.0, ans=0.1 2023-11-18 08:35:41,220 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9650, loss[loss=0.1131, simple_loss=0.1167, pruned_loss=0.0428, audio_tagging_loss=0.01199, over 14350.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1335, pruned_loss=0.04712, audio_tagging_loss=0.01262, over 3041192.48 frames. ], batch size: 54, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:35:41,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=144493.33333333334, ans=0.0 2023-11-18 08:35:59,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2023-11-18 08:36:27,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 1.034e+02 1.159e+02 1.347e+02 1.813e+02, threshold=2.318e+02, percent-clipped=0.0 2023-11-18 08:36:38,027 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9700, loss[loss=0.1043, simple_loss=0.109, pruned_loss=0.03491, audio_tagging_loss=0.01489, over 14546.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1336, pruned_loss=0.04713, audio_tagging_loss=0.01237, over 3036367.87 frames. 
], batch size: 55, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:36:41,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=144826.66666666666, ans=0.125 2023-11-18 08:36:43,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=144826.66666666666, ans=0.0 2023-11-18 08:36:51,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=144893.33333333334, ans=0.0 2023-11-18 08:37:06,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=144960.0, ans=0.125 2023-11-18 08:37:18,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=145026.66666666666, ans=0.1 2023-11-18 08:37:33,995 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9750, loss[loss=0.178, simple_loss=0.187, pruned_loss=0.07442, audio_tagging_loss=0.01005, over 16353.00 frames. ], tot_loss[loss=0.1253, simple_loss=0.1328, pruned_loss=0.04661, audio_tagging_loss=0.01228, over 3041296.86 frames. ], batch size: 61, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:37:36,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=145160.0, ans=0.125 2023-11-18 08:37:44,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=145226.66666666666, ans=0.125 2023-11-18 08:38:04,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=145293.33333333334, ans=0.125 2023-11-18 08:38:04,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=145293.33333333334, ans=0.0 2023-11-18 08:38:07,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=145360.0, ans=0.125 2023-11-18 08:38:14,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=145360.0, ans=0.0 2023-11-18 08:38:21,154 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.307e+01 1.005e+02 1.144e+02 1.318e+02 1.775e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 08:38:31,542 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9800, loss[loss=0.1088, simple_loss=0.1165, pruned_loss=0.0388, audio_tagging_loss=0.01176, over 16227.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1336, pruned_loss=0.04703, audio_tagging_loss=0.0122, over 3046601.26 frames. ], batch size: 63, lr: 2.52e-02, grad_scale: 32.0 2023-11-18 08:38:36,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=145493.33333333334, ans=0.125 2023-11-18 08:38:38,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.34 vs. limit=22.5 2023-11-18 08:38:38,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=145493.33333333334, ans=0.0 2023-11-18 08:38:55,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=11.91 vs. 
limit=15.0 2023-11-18 08:39:14,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=145693.33333333334, ans=15.0 2023-11-18 08:39:19,449 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:39:28,532 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9850, loss[loss=0.1369, simple_loss=0.1501, pruned_loss=0.05196, audio_tagging_loss=0.00995, over 13690.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1334, pruned_loss=0.04711, audio_tagging_loss=0.01221, over 3049816.67 frames. ], batch size: 54, lr: 2.51e-02, grad_scale: 32.0 2023-11-18 08:39:42,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=145893.33333333334, ans=0.1 2023-11-18 08:39:54,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=145960.0, ans=0.125 2023-11-18 08:40:14,769 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 1.021e+02 1.122e+02 1.308e+02 2.084e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 08:40:20,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.37 vs. limit=15.0 2023-11-18 08:40:24,466 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9900, loss[loss=0.1036, simple_loss=0.1188, pruned_loss=0.03524, audio_tagging_loss=0.008948, over 15223.00 frames. ], tot_loss[loss=0.1263, simple_loss=0.1344, pruned_loss=0.04705, audio_tagging_loss=0.0121, over 3049492.54 frames. ], batch size: 55, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:40:46,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=146293.33333333334, ans=0.1 2023-11-18 08:41:01,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=146360.0, ans=0.0 2023-11-18 08:41:20,914 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 9950, loss[loss=0.1006, simple_loss=0.09595, pruned_loss=0.03694, audio_tagging_loss=0.01565, over 16129.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1345, pruned_loss=0.04688, audio_tagging_loss=0.0122, over 3049927.59 frames. ], batch size: 62, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:41:21,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.86 vs. 
limit=12.0 2023-11-18 08:41:37,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=146560.0, ans=0.125 2023-11-18 08:41:50,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=146626.66666666666, ans=0.125 2023-11-18 08:42:07,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.831e+01 1.033e+02 1.176e+02 1.296e+02 1.958e+02, threshold=2.352e+02, percent-clipped=0.0 2023-11-18 08:42:18,273 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10000, loss[loss=0.1308, simple_loss=0.1429, pruned_loss=0.04634, audio_tagging_loss=0.01301, over 16364.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1331, pruned_loss=0.0462, audio_tagging_loss=0.01235, over 3055516.40 frames. ], batch size: 60, lr: 2.51e-02, grad_scale: 64.0 2023-11-18 08:42:22,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=146826.66666666666, ans=0.1 2023-11-18 08:42:25,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=146826.66666666666, ans=0.0 2023-11-18 08:42:53,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=147026.66666666666, ans=0.125 2023-11-18 08:43:03,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=147093.33333333334, ans=0.125 2023-11-18 08:43:07,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=147093.33333333334, ans=0.125 2023-11-18 08:43:14,432 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10050, loss[loss=0.125, simple_loss=0.1321, pruned_loss=0.05049, audio_tagging_loss=0.008442, over 14632.00 frames. ], tot_loss[loss=0.125, simple_loss=0.1327, pruned_loss=0.04629, audio_tagging_loss=0.01233, over 3053120.92 frames. ], batch size: 55, lr: 2.50e-02, grad_scale: 64.0 2023-11-18 08:43:20,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2023-11-18 08:43:22,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=147160.0, ans=0.125 2023-11-18 08:43:32,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=147226.66666666666, ans=0.1 2023-11-18 08:43:40,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=147293.33333333334, ans=0.0 2023-11-18 08:43:50,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.65 vs. limit=22.5 2023-11-18 08:43:56,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.71 vs. limit=15.0 2023-11-18 08:44:01,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.399e+01 9.828e+01 1.108e+02 1.232e+02 2.122e+02, threshold=2.217e+02, percent-clipped=0.0 2023-11-18 08:44:10,754 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10100, loss[loss=0.118, simple_loss=0.1273, pruned_loss=0.04407, audio_tagging_loss=0.01032, over 15166.00 frames. 
], tot_loss[loss=0.1249, simple_loss=0.1327, pruned_loss=0.04615, audio_tagging_loss=0.01238, over 3049523.34 frames. ], batch size: 57, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:44:52,712 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:44:57,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=147760.0, ans=0.1 2023-11-18 08:44:58,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=147760.0, ans=0.125 2023-11-18 08:45:00,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=147760.0, ans=0.125 2023-11-18 08:45:05,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=147760.0, ans=0.2 2023-11-18 08:45:08,349 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10150, loss[loss=0.1097, simple_loss=0.1217, pruned_loss=0.03729, audio_tagging_loss=0.01161, over 14198.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1334, pruned_loss=0.04655, audio_tagging_loss=0.01237, over 3046525.03 frames. ], batch size: 52, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:45:20,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=147893.33333333334, ans=0.125 2023-11-18 08:45:20,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-18 08:45:26,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.55 vs. limit=22.5 2023-11-18 08:45:30,103 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
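[Annotation on the recurring optim.py entries in this stretch: the five logged grad-norm values read as min/25%/median/75%/max quartiles, and the clipping threshold consistently equals Clipping_scale times the median, e.g. threshold=2.217e+02 with median 1.108e+02 and Clipping_scale=2.0 a few entries above. A minimal sketch of that relationship, under the assumption that the threshold is derived only from the median of recently collected norms; the function name and buffering are illustrative, not icefall's actual optim.py.]

```python
import torch

# Minimal sketch, assuming threshold = clipping_scale * median of recent
# gradient norms; consistent with the quartile/threshold pairs in this log.
def clipping_threshold(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0) -> float:
    quartiles = torch.quantile(
        recent_grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0])
    )
    return (clipping_scale * quartiles[2]).item()  # quartiles[2] is the median
```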
Number of tokens: 24 2023-11-18 08:45:32,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=147960.0, ans=0.125 2023-11-18 08:45:44,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=148026.66666666666, ans=0.125 2023-11-18 08:45:49,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=148026.66666666666, ans=0.07 2023-11-18 08:45:53,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=148093.33333333334, ans=0.125 2023-11-18 08:45:55,677 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.369e+01 1.049e+02 1.137e+02 1.279e+02 1.864e+02, threshold=2.275e+02, percent-clipped=0.0 2023-11-18 08:45:56,295 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. limit=15.0 2023-11-18 08:46:04,224 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10200, loss[loss=0.1092, simple_loss=0.1219, pruned_loss=0.03753, audio_tagging_loss=0.01076, over 14868.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1334, pruned_loss=0.04663, audio_tagging_loss=0.01251, over 3052741.31 frames. ], batch size: 57, lr: 2.50e-02, grad_scale: 32.0 2023-11-18 08:46:10,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=148160.0, ans=0.125 2023-11-18 08:46:10,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=148160.0, ans=0.0 2023-11-18 08:46:11,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=148160.0, ans=0.125 2023-11-18 08:46:20,811 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 08:46:49,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=148426.66666666666, ans=0.2 2023-11-18 08:47:00,839 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10250, loss[loss=0.1056, simple_loss=0.11, pruned_loss=0.03864, audio_tagging_loss=0.01199, over 15070.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1319, pruned_loss=0.04605, audio_tagging_loss=0.01268, over 3047078.27 frames. ], batch size: 56, lr: 2.49e-02, grad_scale: 32.0 2023-11-18 08:47:48,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.684e+01 1.009e+02 1.120e+02 1.271e+02 1.895e+02, threshold=2.240e+02, percent-clipped=0.0 2023-11-18 08:47:58,057 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10300, loss[loss=0.1346, simple_loss=0.1449, pruned_loss=0.04856, audio_tagging_loss=0.01357, over 16022.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.1316, pruned_loss=0.04597, audio_tagging_loss=0.01268, over 3045804.48 frames. 
2023-11-18 08:48:00,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=148826.66666666666, ans=0.2
2023-11-18 08:48:07,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.98 vs. limit=6.0
2023-11-18 08:48:12,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=148893.33333333334, ans=0.0
2023-11-18 08:48:28,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 08:48:47,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149093.33333333334, ans=0.1
2023-11-18 08:48:50,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=149093.33333333334, ans=0.125
2023-11-18 08:48:54,299 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10350, loss[loss=0.1205, simple_loss=0.127, pruned_loss=0.04456, audio_tagging_loss=0.01245, over 14579.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1331, pruned_loss=0.0464, audio_tagging_loss=0.01272, over 3048774.12 frames. ], batch size: 54, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 09:49:01,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=149160.0, ans=22.5
2023-11-18 08:49:03,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=149160.0, ans=0.1
2023-11-18 08:49:04,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=149226.66666666666, ans=0.125
2023-11-18 08:49:04,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.18 vs. limit=15.0
2023-11-18 08:49:11,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.08 vs. limit=10.0
2023-11-18 08:49:21,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=149293.33333333334, ans=0.1
2023-11-18 08:49:41,444 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.630e+01 9.879e+01 1.106e+02 1.235e+02 1.803e+02, threshold=2.212e+02, percent-clipped=0.0
2023-11-18 08:49:50,046 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10400, loss[loss=0.09321, simple_loss=0.1091, pruned_loss=0.02607, audio_tagging_loss=0.01259, over 14129.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1341, pruned_loss=0.04666, audio_tagging_loss=0.01271, over 3044673.24 frames. ], batch size: 53, lr: 2.49e-02, grad_scale: 32.0
2023-11-18 08:49:50,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=149493.33333333334, ans=0.0
2023-11-18 08:49:54,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=149493.33333333334, ans=0.125
2023-11-18 08:50:05,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.00 vs. limit=10.0
2023-11-18 08:50:22,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=149626.66666666666, ans=0.09899494936611666
2023-11-18 08:50:24,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=149693.33333333334, ans=0.125
2023-11-18 08:50:34,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=149760.0, ans=0.125
2023-11-18 08:50:47,474 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10450, loss[loss=0.134, simple_loss=0.1332, pruned_loss=0.0576, audio_tagging_loss=0.009783, over 14813.00 frames. ], tot_loss[loss=0.1255, simple_loss=0.1333, pruned_loss=0.04616, audio_tagging_loss=0.01274, over 3042207.34 frames. ], batch size: 58, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:51:00,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=149893.33333333334, ans=0.125
2023-11-18 08:51:04,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=149893.33333333334, ans=0.0
2023-11-18 08:51:07,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=149893.33333333334, ans=0.09899494936611666
2023-11-18 08:51:17,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=149960.0, ans=0.0
2023-11-18 08:51:20,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.13 vs. limit=10.0
2023-11-18 08:51:31,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=150093.33333333334, ans=0.125
2023-11-18 08:51:33,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=150093.33333333334, ans=0.125
2023-11-18 08:51:33,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.58 vs. limit=15.0
2023-11-18 08:51:35,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.869e+01 9.874e+01 1.064e+02 1.233e+02 1.785e+02, threshold=2.128e+02, percent-clipped=0.0
2023-11-18 08:51:44,324 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10500, loss[loss=0.1425, simple_loss=0.1651, pruned_loss=0.05213, audio_tagging_loss=0.007867, over 15357.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1325, pruned_loss=0.04602, audio_tagging_loss=0.01257, over 3033635.57 frames. ], batch size: 55, lr: 2.48e-02, grad_scale: 32.0
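
Note on the [optim.py:476] lines: the five numbers are the min/25%/median/75%/max of recently observed gradient norms, and in every record the clipping threshold is (up to rounding) Clipping_scale times the logged median, e.g. 2.0 x 1.064e+02 = 2.128e+02 just above. A rough sketch of that bookkeeping (names are illustrative, not the actual ScaledAdam internals):

    import torch
    from collections import deque

    class MedianClipper:
        """Clip gradients to clipping_scale * median of recent grad norms."""

        def __init__(self, clipping_scale: float = 2.0, history: int = 128):
            self.clipping_scale = clipping_scale
            self.norms = deque(maxlen=history)
            self.num_steps = 0
            self.num_clipped = 0  # percent-clipped = 100 * num_clipped / num_steps

        def clip_(self, parameters) -> float:
            params = [p for p in parameters if p.grad is not None]
            # Global gradient norm of this step.
            norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
            self.norms.append(norm)
            q = torch.quantile(torch.tensor(list(self.norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()  # 2 * median
            self.num_steps += 1
            if norm > threshold:
                self.num_clipped += 1
                for p in params:
                    p.grad.mul_(threshold / norm)
            return threshold
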
2023-11-18 08:51:52,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=150160.0, ans=0.125
2023-11-18 08:52:07,623 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.017e+00
2023-11-18 08:52:25,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=150360.0, ans=0.125
2023-11-18 08:52:28,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=150426.66666666666, ans=0.125
2023-11-18 08:52:30,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=150426.66666666666, ans=0.125
2023-11-18 08:52:37,354 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.45 vs. limit=15.0
2023-11-18 08:52:39,970 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10550, loss[loss=0.1573, simple_loss=0.1684, pruned_loss=0.06245, audio_tagging_loss=0.01064, over 14999.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1327, pruned_loss=0.04601, audio_tagging_loss=0.01236, over 3029568.24 frames. ], batch size: 57, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:52:40,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=150493.33333333334, ans=0.2
2023-11-18 08:53:03,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=150626.66666666666, ans=0.125
2023-11-18 08:53:11,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.48 vs. limit=6.0
2023-11-18 08:53:13,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0
2023-11-18 08:53:27,567 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.461e+01 9.723e+01 1.093e+02 1.257e+02 1.576e+02, threshold=2.186e+02, percent-clipped=0.0
2023-11-18 08:53:37,320 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10600, loss[loss=0.1412, simple_loss=0.1473, pruned_loss=0.0557, audio_tagging_loss=0.0119, over 15063.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.1332, pruned_loss=0.04607, audio_tagging_loss=0.01225, over 3032259.40 frames. ], batch size: 57, lr: 2.48e-02, grad_scale: 32.0
2023-11-18 08:53:54,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=150893.33333333334, ans=0.125
2023-11-18 08:53:55,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=150893.33333333334, ans=0.04949747468305833
2023-11-18 08:53:57,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=150893.33333333334, ans=0.125
2023-11-18 08:53:59,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=150960.0, ans=0.0
2023-11-18 08:54:26,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=151093.33333333334, ans=0.125
2023-11-18 08:54:33,700 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10650, loss[loss=0.0965, simple_loss=0.1033, pruned_loss=0.03138, audio_tagging_loss=0.01347, over 15320.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.1334, pruned_loss=0.04619, audio_tagging_loss=0.01225, over 3028324.65 frames. ], batch size: 59, lr: 2.47e-02, grad_scale: 32.0
2023-11-18 08:54:38,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=151160.0, ans=0.125
2023-11-18 08:54:51,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=8.29 vs. limit=15.0
2023-11-18 08:54:59,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0
2023-11-18 08:55:05,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=151293.33333333334, ans=0.0
2023-11-18 08:55:21,757 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.938e+01 1.018e+02 1.108e+02 1.279e+02 1.939e+02, threshold=2.217e+02, percent-clipped=0.0
2023-11-18 08:55:21,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=151426.66666666666, ans=0.125
2023-11-18 08:55:30,336 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10700, loss[loss=0.1014, simple_loss=0.1097, pruned_loss=0.03552, audio_tagging_loss=0.01104, over 14402.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.133, pruned_loss=0.04594, audio_tagging_loss=0.01222, over 3026848.39 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0
2023-11-18 08:55:35,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.91 vs. limit=15.0
2023-11-18 08:55:43,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.81 vs. limit=15.0
2023-11-18 08:56:15,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=151760.0, ans=0.035
2023-11-18 08:56:26,873 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10750, loss[loss=0.1412, simple_loss=0.1517, pruned_loss=0.05356, audio_tagging_loss=0.01179, over 15441.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1334, pruned_loss=0.04618, audio_tagging_loss=0.0122, over 3032212.47 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0
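
Note on the [scaling.py:213] lines: each ScheduledFloat is a regularization constant (dropout probability, balancer prob, skip rate, whitening limit, ...) that is a function of the global batch_count rather than a fixed value; ans is its current value. A minimal sketch of such a schedule (the piecewise-linear interpolation and the example breakpoints are assumptions, not a copy of icefall's scaling.py):

    class ScheduledFloat:
        """A float that depends on the global batch count.

        schedule: (batch_count, value) breakpoints; values are linearly
        interpolated between breakpoints and clamped at the ends.
        """

        def __init__(self, *schedule):
            self.schedule = sorted(schedule)

        def value(self, batch_count: float) -> float:
            pts = self.schedule
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    t = (batch_count - x0) / (x1 - x0)
                    return y0 + t * (y1 - y0)
            return pts[-1][1]

    # Hypothetical example: a dropout annealed from 0.3 down to a 0.1 floor.
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(147760.0) == 0.1  # consistent with logged ans=0.1
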
2023-11-18 08:56:39,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.06 vs. limit=15.0
2023-11-18 08:56:43,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.75 vs. limit=15.0
2023-11-18 08:56:44,540 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.12 vs. limit=15.0
2023-11-18 08:56:47,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=151893.33333333334, ans=0.2
2023-11-18 08:56:52,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=151960.0, ans=0.1
2023-11-18 08:57:03,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=152026.66666666666, ans=0.0
2023-11-18 08:57:06,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=152026.66666666666, ans=0.125
2023-11-18 08:57:14,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.096e+01 9.809e+01 1.098e+02 1.227e+02 2.197e+02, threshold=2.197e+02, percent-clipped=0.0
2023-11-18 08:57:15,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.58 vs. limit=10.0
2023-11-18 08:57:24,150 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10800, loss[loss=0.1343, simple_loss=0.1473, pruned_loss=0.04612, audio_tagging_loss=0.01451, over 15296.00 frames. ], tot_loss[loss=0.1257, simple_loss=0.1339, pruned_loss=0.04654, audio_tagging_loss=0.01222, over 3032211.56 frames. ], batch size: 56, lr: 2.47e-02, grad_scale: 32.0
2023-11-18 08:57:40,679 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.55 vs. limit=6.0
2023-11-18 08:58:04,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=152360.0, ans=0.0
2023-11-18 08:58:11,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.89 vs. limit=6.0
2023-11-18 08:58:21,123 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10850, loss[loss=0.09151, simple_loss=0.09215, pruned_loss=0.03385, audio_tagging_loss=0.01159, over 15433.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1334, pruned_loss=0.04657, audio_tagging_loss=0.01234, over 3038640.92 frames. ], batch size: 63, lr: 2.46e-02, grad_scale: 32.0
2023-11-18 08:58:28,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=152493.33333333334, ans=0.125
2023-11-18 08:58:45,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=152626.66666666666, ans=0.0
2023-11-18 08:59:06,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 08:59:08,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.824e+01 1.080e+02 1.224e+02 1.410e+02 3.165e+02, threshold=2.449e+02, percent-clipped=2.0
2023-11-18 08:59:10,631 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 08:59:14,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=152760.0, ans=0.125
2023-11-18 08:59:17,585 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10900, loss[loss=0.1, simple_loss=0.09816, pruned_loss=0.0354, audio_tagging_loss=0.01554, over 15276.00 frames. ], tot_loss[loss=0.1258, simple_loss=0.1338, pruned_loss=0.04647, audio_tagging_loss=0.01244, over 3043978.92 frames. ], batch size: 57, lr: 2.46e-02, grad_scale: 32.0
2023-11-18 08:59:27,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=152826.66666666666, ans=0.125
2023-11-18 08:59:28,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=152893.33333333334, ans=0.0
2023-11-18 08:59:50,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=153026.66666666666, ans=0.125
2023-11-18 08:59:52,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=153026.66666666666, ans=0.125
2023-11-18 09:00:11,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=153093.33333333334, ans=0.2
2023-11-18 09:00:12,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=153093.33333333334, ans=0.0
2023-11-18 09:00:14,335 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 10950, loss[loss=0.09342, simple_loss=0.1052, pruned_loss=0.03131, audio_tagging_loss=0.009507, over 15340.00 frames. ], tot_loss[loss=0.1249, simple_loss=0.133, pruned_loss=0.04593, audio_tagging_loss=0.01252, over 3045226.62 frames. ], batch size: 58, lr: 2.46e-02, grad_scale: 32.0
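
Note on the loss[...] / tot_loss[...] breakdowns: they decompose consistently as loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss, i.e. the pruned-RNN-T simple loss is down-weighted by 0.5 and the audio-tagging distillation loss enters at scale 1.0, matching this run's loss-scale settings. Checking against the batch 10900 record above:

    # Batch 10900 above: loss=0.1, simple_loss=0.09816,
    # pruned_loss=0.0354, audio_tagging_loss=0.01554.
    simple_loss_scale = 0.5          # from the run configuration
    audio_tagging_loss_scale = 1.0

    loss = (simple_loss_scale * 0.09816
            + 0.0354
            + audio_tagging_loss_scale * 0.01554)
    print(round(loss, 4))            # 0.1 -- matches the logged total

The same arithmetic reproduces every other loss[...] entry in this section to the printed precision.
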
2023-11-18 09:00:25,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=153226.66666666666, ans=0.0
2023-11-18 09:00:29,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=153226.66666666666, ans=0.95
2023-11-18 09:00:30,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.93 vs. limit=22.5
2023-11-18 09:00:45,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=153293.33333333334, ans=0.0
2023-11-18 09:01:02,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.744e+01 1.111e+02 1.253e+02 1.675e+02, threshold=2.223e+02, percent-clipped=0.0
2023-11-18 09:01:10,791 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11000, loss[loss=0.1183, simple_loss=0.1205, pruned_loss=0.04466, audio_tagging_loss=0.0134, over 15520.00 frames. ], tot_loss[loss=0.1242, simple_loss=0.1322, pruned_loss=0.0456, audio_tagging_loss=0.01256, over 3047026.15 frames. ], batch size: 59, lr: 2.46e-02, grad_scale: 32.0
2023-11-18 09:01:14,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=153493.33333333334, ans=0.125
2023-11-18 09:01:17,744 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 09:01:28,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153560.0, ans=0.125
2023-11-18 09:01:38,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=153626.66666666666, ans=0.1
2023-11-18 09:01:44,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=153693.33333333334, ans=0.0
2023-11-18 09:01:53,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=153693.33333333334, ans=0.125
2023-11-18 09:02:00,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=153760.0, ans=0.125
2023-11-18 09:02:01,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=17.04 vs. limit=15.0
2023-11-18 09:02:05,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=153760.0, ans=0.125
2023-11-18 09:02:07,783 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11050, loss[loss=0.1343, simple_loss=0.1383, pruned_loss=0.05111, audio_tagging_loss=0.01402, over 15447.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1325, pruned_loss=0.04572, audio_tagging_loss=0.01264, over 3045098.50 frames. ], batch size: 58, lr: 2.45e-02, grad_scale: 32.0
2023-11-18 09:02:17,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=153826.66666666666, ans=0.125
2023-11-18 09:02:18,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=153893.33333333334, ans=0.125
2023-11-18 09:02:26,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=153893.33333333334, ans=0.125
2023-11-18 09:02:32,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=153960.0, ans=0.125
2023-11-18 09:02:40,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=154026.66666666666, ans=0.0
2023-11-18 09:02:42,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=154026.66666666666, ans=0.0
2023-11-18 09:02:52,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=154093.33333333334, ans=0.2
2023-11-18 09:02:55,027 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.060e+01 9.808e+01 1.104e+02 1.219e+02 2.392e+02, threshold=2.208e+02, percent-clipped=1.0
2023-11-18 09:02:56,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=154093.33333333334, ans=0.125
2023-11-18 09:02:56,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=154093.33333333334, ans=0.2
2023-11-18 09:03:02,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=154093.33333333334, ans=0.125
2023-11-18 09:03:04,802 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11100, loss[loss=0.1319, simple_loss=0.1269, pruned_loss=0.05375, audio_tagging_loss=0.01468, over 13928.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1326, pruned_loss=0.04576, audio_tagging_loss=0.01266, over 3047518.86 frames. ], batch size: 53, lr: 2.45e-02, grad_scale: 32.0
2023-11-18 09:03:09,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=154160.0, ans=0.95
2023-11-18 09:03:18,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=154226.66666666666, ans=0.1
2023-11-18 09:04:00,574 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11150, loss[loss=0.1643, simple_loss=0.1758, pruned_loss=0.06574, audio_tagging_loss=0.01064, over 15491.00 frames. ], tot_loss[loss=0.126, simple_loss=0.1336, pruned_loss=0.04645, audio_tagging_loss=0.0127, over 3050859.16 frames. ], batch size: 58, lr: 2.45e-02, grad_scale: 32.0
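
Note on the [scaling.py:1022] Whitening lines: each compares a per-module statistic against a (scheduled) limit; the statistic is large when the channel covariance of the module's activations is far from isotropic, and the module is only pushed back toward "white" features when the limit is exceeded. A plausible reconstruction of such a statistic, assuming a covariance-uniformity definition that equals 1.0 for perfectly whitened features (this is an illustration, not copied from scaling.py):

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        """x: (num_frames, num_channels). Returns >= 1.0; equals 1.0 when
        the covariance within each channel group is a multiple of I."""
        _, num_channels = x.shape
        cpg = num_channels // num_groups                     # channels per group
        xg = x.reshape(-1, num_groups, cpg).transpose(0, 1)  # (groups, N, cpg)
        cov = torch.matmul(xg.transpose(1, 2), xg)           # (groups, cpg, cpg)
        # Ratio of mean squared eigenvalue to squared mean eigenvalue:
        num = (cov ** 2).sum(dim=(1, 2)) * cpg
        den = cov.diagonal(dim1=1, dim2=2).sum(dim=1) ** 2
        return (num / den).mean().item()

    white = torch.randn(10000, 512)
    print(whitening_metric(white))  # close to 1.0 for whitened features
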
2023-11-18 09:04:03,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=154493.33333333334, ans=0.0
2023-11-18 09:04:08,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=154493.33333333334, ans=0.0
2023-11-18 09:04:09,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=154493.33333333334, ans=0.125
2023-11-18 09:04:09,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=154493.33333333334, ans=0.0
2023-11-18 09:04:11,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0
2023-11-18 09:04:19,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=154560.0, ans=0.125
2023-11-18 09:04:48,102 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 1.033e+02 1.136e+02 1.301e+02 2.057e+02, threshold=2.273e+02, percent-clipped=0.0
2023-11-18 09:04:57,171 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11200, loss[loss=0.1064, simple_loss=0.107, pruned_loss=0.03928, audio_tagging_loss=0.01367, over 14269.00 frames. ], tot_loss[loss=0.1252, simple_loss=0.133, pruned_loss=0.04599, audio_tagging_loss=0.01275, over 3050788.76 frames. ], batch size: 57, lr: 2.45e-02, grad_scale: 32.0
2023-11-18 09:05:02,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=154826.66666666666, ans=0.07
2023-11-18 09:05:12,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=154893.33333333334, ans=0.125
2023-11-18 09:05:19,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=154960.0, ans=0.95
2023-11-18 09:05:26,877 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 09:05:30,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.98 vs. limit=15.0
2023-11-18 09:05:38,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5
2023-11-18 09:05:40,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=155026.66666666666, ans=0.2
2023-11-18 09:05:45,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.78 vs. limit=22.5
2023-11-18 09:05:47,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=155093.33333333334, ans=0.125
2023-11-18 09:05:53,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=155160.0, ans=0.125
2023-11-18 09:05:53,932 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11250, loss[loss=0.1552, simple_loss=0.1572, pruned_loss=0.05968, audio_tagging_loss=0.01692, over 15105.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.1314, pruned_loss=0.04548, audio_tagging_loss=0.01266, over 3054632.24 frames. ], batch size: 58, lr: 2.44e-02, grad_scale: 32.0
2023-11-18 09:06:00,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.87 vs. limit=15.0
2023-11-18 09:06:12,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=155226.66666666666, ans=0.125
2023-11-18 09:06:14,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=155293.33333333334, ans=0.125
2023-11-18 09:06:14,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155293.33333333334, ans=0.1
2023-11-18 09:06:19,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=155293.33333333334, ans=0.07
2023-11-18 09:06:29,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155360.0, ans=0.1
2023-11-18 09:06:41,313 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.020e+01 9.682e+01 1.104e+02 1.218e+02 1.906e+02, threshold=2.209e+02, percent-clipped=0.0
2023-11-18 09:06:45,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=155426.66666666666, ans=0.1
2023-11-18 09:06:49,987 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11300, loss[loss=0.1033, simple_loss=0.1204, pruned_loss=0.03453, audio_tagging_loss=0.008575, over 14666.00 frames. ], tot_loss[loss=0.1239, simple_loss=0.1317, pruned_loss=0.04558, audio_tagging_loss=0.01249, over 3048620.03 frames. ], batch size: 57, lr: 2.44e-02, grad_scale: 32.0
2023-11-18 09:07:18,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=155626.66666666666, ans=10.0
2023-11-18 09:07:26,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=155693.33333333334, ans=0.125
2023-11-18 09:07:40,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=155760.0, ans=0.0
2023-11-18 09:07:45,921 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11350, loss[loss=0.1079, simple_loss=0.1138, pruned_loss=0.03994, audio_tagging_loss=0.01104, over 15306.00 frames. ], tot_loss[loss=0.124, simple_loss=0.1318, pruned_loss=0.04571, audio_tagging_loss=0.01236, over 3048641.08 frames. ], batch size: 57, lr: 2.44e-02, grad_scale: 32.0
2023-11-18 09:07:55,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.32 vs. limit=10.0
2023-11-18 09:08:06,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=155893.33333333334, ans=0.125
2023-11-18 09:08:28,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156026.66666666666, ans=0.1
2023-11-18 09:08:33,863 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 1.029e+02 1.095e+02 1.224e+02 1.585e+02, threshold=2.190e+02, percent-clipped=0.0
2023-11-18 09:08:43,600 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11400, loss[loss=0.1004, simple_loss=0.1088, pruned_loss=0.03331, audio_tagging_loss=0.01266, over 14397.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1329, pruned_loss=0.04592, audio_tagging_loss=0.01227, over 3048417.68 frames. ], batch size: 54, lr: 2.44e-02, grad_scale: 32.0
2023-11-18 09:08:47,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=156160.0, ans=0.125
2023-11-18 09:08:59,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=156226.66666666666, ans=0.125
2023-11-18 09:08:59,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=156226.66666666666, ans=0.125
2023-11-18 09:08:59,937 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.23 vs. limit=22.5
2023-11-18 09:09:12,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=156293.33333333334, ans=0.0
2023-11-18 09:09:18,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.43 vs. limit=15.0
2023-11-18 09:09:22,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.88 vs. limit=15.0
2023-11-18 09:09:33,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=156426.66666666666, ans=0.125
2023-11-18 09:09:35,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=156426.66666666666, ans=0.125
2023-11-18 09:09:39,849 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11450, loss[loss=0.1534, simple_loss=0.1661, pruned_loss=0.05709, audio_tagging_loss=0.0133, over 14648.00 frames. ], tot_loss[loss=0.1248, simple_loss=0.1331, pruned_loss=0.04612, audio_tagging_loss=0.01219, over 3053543.04 frames. ], batch size: 54, lr: 2.43e-02, grad_scale: 32.0
2023-11-18 09:09:40,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0
2023-11-18 09:09:54,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=156560.0, ans=0.0
2023-11-18 09:10:05,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=156626.66666666666, ans=10.0
2023-11-18 09:10:05,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=156626.66666666666, ans=0.0
2023-11-18 09:10:06,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=156626.66666666666, ans=0.09899494936611666
2023-11-18 09:10:22,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=156693.33333333334, ans=0.0
2023-11-18 09:10:24,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.13 vs. limit=15.0
2023-11-18 09:10:26,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 9.860e+01 1.075e+02 1.215e+02 1.820e+02, threshold=2.151e+02, percent-clipped=0.0
2023-11-18 09:10:35,116 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11500, loss[loss=0.1066, simple_loss=0.1195, pruned_loss=0.03473, audio_tagging_loss=0.01209, over 14349.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.133, pruned_loss=0.04602, audio_tagging_loss=0.0122, over 3052497.33 frames. ], batch size: 52, lr: 2.43e-02, grad_scale: 32.0
2023-11-18 09:10:44,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=156826.66666666666, ans=0.125
2023-11-18 09:10:58,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=156960.0, ans=0.1
2023-11-18 09:11:07,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2023-11-18 09:11:31,797 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11550, loss[loss=0.1472, simple_loss=0.165, pruned_loss=0.05322, audio_tagging_loss=0.01151, over 15556.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.1346, pruned_loss=0.04652, audio_tagging_loss=0.01213, over 3054056.20 frames. ], batch size: 55, lr: 2.43e-02, grad_scale: 32.0
2023-11-18 09:11:33,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=157160.0, ans=0.0
2023-11-18 09:11:38,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=157160.0, ans=0.2
2023-11-18 09:11:42,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=16.79 vs. limit=15.0
2023-11-18 09:11:46,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=157226.66666666666, ans=0.2
2023-11-18 09:11:47,152 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. limit=10.0
2023-11-18 09:12:01,698 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
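
Note on the Tokens: [...] lists in these warnings: they are the SentencePiece BPE segmentation of the placeholder transcript ('▁' marks a word boundary), and the 24 pieces against 23 post-subsampling frames is exactly what trips the exclusion filter. The segmentation can be reproduced with the run's BPE model (sketch; assumes the sentencepiece package and the bpe.model path used by this recipe):

    import sentencepiece as spm

    sp = spm.SentencePieceProcessor()
    sp.load("data/lang_bpe_500/bpe.model")

    text = "Dummy text added as a place holder. Please ignore this if possible."
    pieces = sp.encode_as_pieces(text)
    print(len(pieces))  # the log reports 24 pieces for this model
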
2023-11-18 09:12:02,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=157293.33333333334, ans=0.125
2023-11-18 09:12:18,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2023-11-18 09:12:19,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.118e+01 1.012e+02 1.136e+02 1.340e+02 1.723e+02, threshold=2.272e+02, percent-clipped=0.0
2023-11-18 09:12:28,352 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11600, loss[loss=0.1033, simple_loss=0.09836, pruned_loss=0.03852, audio_tagging_loss=0.01561, over 15504.00 frames. ], tot_loss[loss=0.1265, simple_loss=0.1354, pruned_loss=0.0468, audio_tagging_loss=0.01198, over 3054786.45 frames. ], batch size: 60, lr: 2.43e-02, grad_scale: 32.0
2023-11-18 09:12:30,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.23 vs. limit=22.5
2023-11-18 09:13:07,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=157693.33333333334, ans=0.2
2023-11-18 09:13:21,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=157760.0, ans=0.125
2023-11-18 09:13:22,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=157826.66666666666, ans=0.125
2023-11-18 09:13:23,778 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11650, loss[loss=0.1354, simple_loss=0.1455, pruned_loss=0.04907, audio_tagging_loss=0.01359, over 15581.00 frames. ], tot_loss[loss=0.1274, simple_loss=0.1364, pruned_loss=0.04713, audio_tagging_loss=0.0121, over 3063303.54 frames. ], batch size: 57, lr: 2.42e-02, grad_scale: 32.0
2023-11-18 09:13:44,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=157893.33333333334, ans=0.0
2023-11-18 09:13:44,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=157893.33333333334, ans=0.04949747468305833
2023-11-18 09:13:52,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=157960.0, ans=0.1
2023-11-18 09:13:56,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=158026.66666666666, ans=0.1
2023-11-18 09:14:10,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 1.030e+02 1.124e+02 1.249e+02 1.579e+02, threshold=2.249e+02, percent-clipped=0.0
2023-11-18 09:14:15,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0
2023-11-18 09:14:15,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=158093.33333333334, ans=0.0
2023-11-18 09:14:18,998 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11700, loss[loss=0.09895, simple_loss=0.09937, pruned_loss=0.03347, audio_tagging_loss=0.0158, over 15273.00 frames. ], tot_loss[loss=0.1264, simple_loss=0.1352, pruned_loss=0.04668, audio_tagging_loss=0.01213, over 3055636.08 frames. ], batch size: 59, lr: 2.42e-02, grad_scale: 32.0
2023-11-18 09:14:31,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=158226.66666666666, ans=0.125
2023-11-18 09:14:39,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=158226.66666666666, ans=0.125
2023-11-18 09:14:44,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=158293.33333333334, ans=0.0
2023-11-18 09:15:01,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=158360.0, ans=0.125
2023-11-18 09:15:04,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=158426.66666666666, ans=0.0
2023-11-18 09:15:15,192 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11750, loss[loss=0.0935, simple_loss=0.09365, pruned_loss=0.03193, audio_tagging_loss=0.01475, over 14207.00 frames. ], tot_loss[loss=0.1254, simple_loss=0.134, pruned_loss=0.04627, audio_tagging_loss=0.01215, over 3047923.62 frames. ], batch size: 57, lr: 2.42e-02, grad_scale: 32.0
2023-11-18 09:15:37,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=158626.66666666666, ans=0.125
2023-11-18 09:15:55,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.91 vs. limit=15.0
2023-11-18 09:16:01,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.907e+01 1.124e+02 1.266e+02 1.981e+02, threshold=2.248e+02, percent-clipped=0.0
2023-11-18 09:16:10,015 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11800, loss[loss=0.1509, simple_loss=0.1619, pruned_loss=0.05835, audio_tagging_loss=0.01162, over 15219.00 frames. ], tot_loss[loss=0.1244, simple_loss=0.1327, pruned_loss=0.04578, audio_tagging_loss=0.01229, over 3043192.47 frames. ], batch size: 56, lr: 2.42e-02, grad_scale: 32.0
2023-11-18 09:16:11,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=158826.66666666666, ans=0.125
2023-11-18 09:16:15,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=158826.66666666666, ans=0.125
2023-11-18 09:16:37,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.98 vs. limit=22.5
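
Note on the assorted *_skip_rate entries (attention_skip_rate, conv_skip_rate, ff2_skip_rate, bypass.skip_rate, ...): each is a scheduled probability of stochastically dropping a sub-module's contribution during training, a layer-dropout style regularizer. A generic sketch of the idea (illustrative only; the actual Zipformer wiring in zipformer.py is more involved):

    import torch
    import torch.nn as nn

    class SkipWrapper(nn.Module):
        """With probability skip_rate (train time only), pass the input
        through unchanged instead of adding the branch's output."""

        def __init__(self, branch: nn.Module, skip_rate: float = 0.07):
            super().__init__()
            self.branch = branch
            self.skip_rate = skip_rate  # in practice a ScheduledFloat

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and torch.rand(()) < self.skip_rate:
                return x               # branch skipped this step
            return x + self.branch(x)  # residual contribution kept
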
2023-11-18 09:17:02,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=159093.33333333334, ans=0.1
2023-11-18 09:17:05,605 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11850, loss[loss=0.1614, simple_loss=0.1805, pruned_loss=0.06229, audio_tagging_loss=0.008865, over 16965.00 frames. ], tot_loss[loss=0.1247, simple_loss=0.1325, pruned_loss=0.04604, audio_tagging_loss=0.01242, over 3045402.77 frames. ], batch size: 59, lr: 2.42e-02, grad_scale: 32.0
2023-11-18 09:17:05,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=159160.0, ans=0.125
2023-11-18 09:17:27,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=159293.33333333334, ans=0.125
2023-11-18 09:17:30,057 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.02 vs. limit=22.5
2023-11-18 09:17:37,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=159293.33333333334, ans=0.125
2023-11-18 09:17:41,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=159360.0, ans=0.04949747468305833
2023-11-18 09:17:52,693 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.196e+01 1.014e+02 1.138e+02 1.282e+02 2.288e+02, threshold=2.275e+02, percent-clipped=1.0
2023-11-18 09:17:53,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=159426.66666666666, ans=0.04949747468305833
2023-11-18 09:17:55,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.05 vs. limit=15.0
2023-11-18 09:18:01,628 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11900, loss[loss=0.1248, simple_loss=0.1245, pruned_loss=0.04827, audio_tagging_loss=0.0143, over 14682.00 frames. ], tot_loss[loss=0.1246, simple_loss=0.1325, pruned_loss=0.0458, audio_tagging_loss=0.01254, over 3047743.05 frames. ], batch size: 55, lr: 2.41e-02, grad_scale: 32.0
2023-11-18 09:18:56,791 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 11950, loss[loss=0.1419, simple_loss=0.1624, pruned_loss=0.05082, audio_tagging_loss=0.00985, over 15615.00 frames. ], tot_loss[loss=0.1251, simple_loss=0.1332, pruned_loss=0.04599, audio_tagging_loss=0.01244, over 3045783.16 frames. ], batch size: 57, lr: 2.41e-02, grad_scale: 32.0
2023-11-18 09:18:58,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=159826.66666666666, ans=0.2
2023-11-18 09:19:00,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=159826.66666666666, ans=0.125
2023-11-18 09:19:00,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.27 vs. limit=12.0
2023-11-18 09:19:06,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=159893.33333333334, ans=0.125
2023-11-18 09:19:07,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=7.21 vs. limit=15.0
2023-11-18 09:19:08,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=159893.33333333334, ans=0.125
2023-11-18 09:19:12,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten.whitening_limit, batch_count=159893.33333333334, ans=15.0
2023-11-18 09:19:24,940 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-24000.pt
2023-11-18 09:19:32,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=160026.66666666666, ans=0.0
2023-11-18 09:19:33,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=160026.66666666666, ans=0.125
2023-11-18 09:19:44,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.182e+01 9.874e+01 1.073e+02 1.187e+02 1.717e+02, threshold=2.145e+02, percent-clipped=0.0
2023-11-18 09:19:46,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=160093.33333333334, ans=0.0
2023-11-18 09:19:47,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=160093.33333333334, ans=0.125
2023-11-18 09:19:52,774 INFO [train_asr.py:1115] (0/4) Epoch 2, batch 12000, loss[loss=0.1466, simple_loss=0.1484, pruned_loss=0.06207, audio_tagging_loss=0.01036, over 15313.00 frames. ], tot_loss[loss=0.1267, simple_loss=0.1349, pruned_loss=0.04683, audio_tagging_loss=0.01244, over 3051704.17 frames. ], batch size: 58, lr: 2.41e-02, grad_scale: 32.0
2023-11-18 09:19:52,776 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-18 09:20:12,968 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8560, 2.8689, 4.7832, 4.2214], device='cuda:0')
2023-11-18 09:20:15,858 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7463, 5.7759, 5.8622, 5.8492], device='cuda:0')
2023-11-18 09:20:22,857 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2041, 2.3078, 5.1633, 2.1510], device='cuda:0')
2023-11-18 09:20:26,781 INFO [train_asr.py:1147] (0/4) Epoch 2, validation: loss=0.08437, simple_loss=0.06733, pruned_loss=0.01363, audio_tagging_loss=0.03708, over 4681554.00 frames.
2023-11-18 09:20:26,782 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-18 09:20:32,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=160160.0, ans=0.0
2023-11-18 09:20:43,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=160226.66666666666, ans=0.2
2023-11-18 09:20:50,363 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-2.pt
2023-11-18 09:21:26,868 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 0, loss[loss=0.128, simple_loss=0.1224, pruned_loss=0.03782, audio_tagging_loss=0.02901, over 14728.00 frames. ], tot_loss[loss=0.128, simple_loss=0.1224, pruned_loss=0.03782, audio_tagging_loss=0.02901, over 14728.00 frames. ], batch size: 57, lr: 2.29e-02, grad_scale: 32.0
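
Note on the [checkpoint.py:75] lines above: two kinds of checkpoints are written, a rolling batch checkpoint (checkpoint-24000.pt, consistent with saving every save_every_n=4000 training batches) and an end-of-epoch checkpoint (epoch-2.pt). A sketch of that policy (hypothetical helper; the real logic lives in icefall's checkpoint.py and also saves sampler and scaler state):

    from pathlib import Path
    import torch

    def maybe_save(model, optimizer, batch_idx_train: int, epoch: int,
                   exp_dir: Path, save_every_n: int = 4000,
                   end_of_epoch: bool = False) -> None:
        state = {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
            "epoch": epoch,
        }
        if batch_idx_train % save_every_n == 0:
            torch.save(state, exp_dir / f"checkpoint-{batch_idx_train}.pt")
        if end_of_epoch:
            torch.save(state, exp_dir / f"epoch-{epoch}.pt")

The [zipformer.py:1873] attn_weights_entropy tensors printed during validation are a diagnostic: one entropy value per attention head, low values (around 2) meaning sharply focused heads and higher values meaning diffuse attention.
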
2023-11-18 09:21:26,870 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-18 09:21:58,063 INFO [train_asr.py:1147] (0/4) Epoch 3, validation: loss=0.08217, simple_loss=0.06725, pruned_loss=0.01375, audio_tagging_loss=0.03479, over 4681554.00 frames.
2023-11-18 09:21:58,064 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-18 09:22:04,080 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 09:22:09,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=160366.66666666666, ans=0.125
2023-11-18 09:22:14,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=160366.66666666666, ans=0.0
2023-11-18 09:22:14,949 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.74 vs. limit=22.5
2023-11-18 09:22:25,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=160433.33333333334, ans=0.125
2023-11-18 09:22:29,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.11 vs. limit=6.0
2023-11-18 09:22:45,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=160566.66666666666, ans=0.125
2023-11-18 09:22:53,018 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 50, loss[loss=0.1539, simple_loss=0.1561, pruned_loss=0.05538, audio_tagging_loss=0.02048, over 15072.00 frames. ], tot_loss[loss=0.1329, simple_loss=0.1298, pruned_loss=0.04398, audio_tagging_loss=0.02404, over 691573.97 frames. ], batch size: 55, lr: 2.29e-02, grad_scale: 32.0
2023-11-18 09:22:54,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=160633.33333333334, ans=0.125
2023-11-18 09:22:55,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=160633.33333333334, ans=0.0
2023-11-18 09:22:57,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=160633.33333333334, ans=0.125
2023-11-18 09:23:16,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.190e+01 1.036e+02 1.137e+02 1.326e+02 1.917e+02, threshold=2.275e+02, percent-clipped=0.0
2023-11-18 09:23:33,147 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.11 vs. limit=15.0
2023-11-18 09:23:45,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=160900.0, ans=0.125
2023-11-18 09:23:47,694 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 100, loss[loss=0.133, simple_loss=0.1427, pruned_loss=0.04217, audio_tagging_loss=0.01948, over 15200.00 frames. ], tot_loss[loss=0.1325, simple_loss=0.131, pruned_loss=0.04397, audio_tagging_loss=0.02303, over 1215563.61 frames. ], batch size: 55, lr: 2.28e-02, grad_scale: 64.0
2023-11-18 09:23:52,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.13 vs. limit=15.0
2023-11-18 09:24:01,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=161033.33333333334, ans=0.125
2023-11-18 09:24:21,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=161166.66666666666, ans=0.0
2023-11-18 09:24:34,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0
2023-11-18 09:24:37,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=161233.33333333334, ans=0.1
2023-11-18 09:24:43,654 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 150, loss[loss=0.1456, simple_loss=0.1627, pruned_loss=0.05139, audio_tagging_loss=0.01288, over 15606.00 frames. ], tot_loss[loss=0.1305, simple_loss=0.1323, pruned_loss=0.04383, audio_tagging_loss=0.0205, over 1619351.25 frames. ], batch size: 60, lr: 2.28e-02, grad_scale: 64.0
2023-11-18 09:24:45,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=25.30 vs. limit=22.5
2023-11-18 09:24:50,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=161300.0, ans=0.2
2023-11-18 09:24:57,770 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=24.48 vs. limit=22.5
2023-11-18 09:25:06,615 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.947e+01 1.007e+02 1.136e+02 1.298e+02 1.875e+02, threshold=2.273e+02, percent-clipped=0.0
2023-11-18 09:25:20,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=161500.0, ans=0.125
2023-11-18 09:25:39,265 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 200, loss[loss=0.152, simple_loss=0.1706, pruned_loss=0.05621, audio_tagging_loss=0.01052, over 14488.00 frames. ], tot_loss[loss=0.1275, simple_loss=0.1318, pruned_loss=0.0437, audio_tagging_loss=0.01793, over 1936902.07 frames. ], batch size: 54, lr: 2.28e-02, grad_scale: 64.0
2023-11-18 09:25:39,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=161633.33333333334, ans=0.125
2023-11-18 09:25:58,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=161700.0, ans=0.125
2023-11-18 09:26:08,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=161766.66666666666, ans=0.0
2023-11-18 09:26:32,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=161900.0, ans=0.2
2023-11-18 09:26:34,568 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 250, loss[loss=0.1314, simple_loss=0.146, pruned_loss=0.04857, audio_tagging_loss=0.009829, over 14327.00 frames. ], tot_loss[loss=0.1245, simple_loss=0.1301, pruned_loss=0.04329, audio_tagging_loss=0.01616, over 2177289.15 frames. ], batch size: 52, lr: 2.28e-02, grad_scale: 64.0
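
Note on grad_scale: this is the fp16 loss scale of the AMP GradScaler used for mixed-precision training. It sits at 32.0 throughout epoch 2 and has grown to 64.0 by Epoch 3, batch 100 above, which is the scaler doubling its scale after a long run of overflow-free steps. The standard pattern looks like this (a generic torch.cuda.amp sketch with illustrative constants, not the train_asr.py code):

    import torch

    scaler = torch.cuda.amp.GradScaler(
        init_scale=16.0,        # illustrative; grows 16 -> 32 -> 64 ...
        growth_factor=2.0,
        backoff_factor=0.5,
        growth_interval=2000,   # double after this many clean steps
    )

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()  # gradients carry the loss scale
        scaler.step(optimizer)         # unscales, skips step on inf/nan
        scaler.update()                # grows or backs off the scale
        return loss.detach()
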
], batch size: 52, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:26:44,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=161966.66666666666, ans=0.2 2023-11-18 09:26:44,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=161966.66666666666, ans=0.125 2023-11-18 09:26:50,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=162033.33333333334, ans=0.125 2023-11-18 09:26:57,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 1.002e+02 1.144e+02 1.310e+02 1.731e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 09:27:02,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=162100.0, ans=0.125 2023-11-18 09:27:05,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.82 vs. limit=15.0 2023-11-18 09:27:09,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=162166.66666666666, ans=0.125 2023-11-18 09:27:10,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=162166.66666666666, ans=0.125 2023-11-18 09:27:26,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=162233.33333333334, ans=0.2 2023-11-18 09:27:27,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=162233.33333333334, ans=0.125 2023-11-18 09:27:30,575 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 300, loss[loss=0.1392, simple_loss=0.1397, pruned_loss=0.05317, audio_tagging_loss=0.01616, over 15892.00 frames. ], tot_loss[loss=0.1259, simple_loss=0.133, pruned_loss=0.04449, audio_tagging_loss=0.01492, over 2375676.66 frames. ], batch size: 59, lr: 2.28e-02, grad_scale: 64.0 2023-11-18 09:27:36,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=162300.0, ans=0.125 2023-11-18 09:27:57,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=162433.33333333334, ans=0.1 2023-11-18 09:28:21,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=162566.66666666666, ans=0.1 2023-11-18 09:28:25,389 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 350, loss[loss=0.1165, simple_loss=0.1308, pruned_loss=0.04131, audio_tagging_loss=0.009772, over 14982.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.1312, pruned_loss=0.04415, audio_tagging_loss=0.01402, over 2517992.88 frames. 
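A consistency check worth noting: in every summary line in this stretch, the leading loss field equals 0.5 * simple_loss + pruned_loss + audio_tagging_loss. For the batch 350 totals just above, 0.5 * 0.1312 + 0.04415 + 0.01402 = 0.1238, matching the printed value. A small helper that reproduces the field under that inferred weighting:

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float) -> float:
    # Weights inferred from the logged numbers themselves: a 0.5 factor on
    # the simple (linear) transducer loss, unit weight on the pruned loss
    # and on the audio tagging term.
    return 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss

assert abs(combined_loss(0.1312, 0.04415, 0.01402) - 0.1238) < 1e-3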
], batch size: 55, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:28:49,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=162766.66666666666, ans=0.1 2023-11-18 09:28:50,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.861e+01 1.085e+02 1.214e+02 1.858e+02, threshold=2.170e+02, percent-clipped=0.0 2023-11-18 09:28:52,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.31 vs. limit=10.0 2023-11-18 09:28:59,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=162833.33333333334, ans=0.125 2023-11-18 09:29:02,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=162833.33333333334, ans=0.0 2023-11-18 09:29:21,421 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 400, loss[loss=0.1211, simple_loss=0.1274, pruned_loss=0.0442, audio_tagging_loss=0.0132, over 15538.00 frames. ], tot_loss[loss=0.1238, simple_loss=0.1317, pruned_loss=0.04442, audio_tagging_loss=0.01355, over 2629992.87 frames. ], batch size: 58, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:29:32,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=163033.33333333334, ans=0.0 2023-11-18 09:29:33,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=163033.33333333334, ans=0.125 2023-11-18 09:29:38,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=163033.33333333334, ans=0.1 2023-11-18 09:29:43,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=163100.0, ans=0.125 2023-11-18 09:30:02,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=163166.66666666666, ans=0.2 2023-11-18 09:30:04,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=163166.66666666666, ans=0.125 2023-11-18 09:30:13,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.29 vs. limit=15.0 2023-11-18 09:30:18,230 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 450, loss[loss=0.1375, simple_loss=0.1566, pruned_loss=0.05048, audio_tagging_loss=0.008728, over 16222.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1299, pruned_loss=0.04349, audio_tagging_loss=0.01329, over 2724182.60 frames. ], batch size: 58, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:30:18,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=163300.0, ans=0.125 2023-11-18 09:30:24,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=163300.0, ans=0.2 2023-11-18 09:30:39,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.91 vs. 
limit=10.0 2023-11-18 09:30:40,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.316e+01 9.836e+01 1.125e+02 1.262e+02 2.640e+02, threshold=2.251e+02, percent-clipped=1.0 2023-11-18 09:30:43,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=163433.33333333334, ans=0.035 2023-11-18 09:31:01,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.44 vs. limit=15.0 2023-11-18 09:31:06,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=163566.66666666666, ans=0.1 2023-11-18 09:31:12,863 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 500, loss[loss=0.1314, simple_loss=0.1412, pruned_loss=0.04796, audio_tagging_loss=0.0128, over 15282.00 frames. ], tot_loss[loss=0.1229, simple_loss=0.1313, pruned_loss=0.0442, audio_tagging_loss=0.01301, over 2798920.04 frames. ], batch size: 56, lr: 2.27e-02, grad_scale: 64.0 2023-11-18 09:31:26,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=163700.0, ans=0.1 2023-11-18 09:32:07,426 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 550, loss[loss=0.1112, simple_loss=0.1245, pruned_loss=0.03566, audio_tagging_loss=0.0133, over 15133.00 frames. ], tot_loss[loss=0.1224, simple_loss=0.1311, pruned_loss=0.04394, audio_tagging_loss=0.01293, over 2851801.21 frames. ], batch size: 56, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:32:11,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=163966.66666666666, ans=0.2 2023-11-18 09:32:14,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.20 vs. limit=15.0 2023-11-18 09:32:31,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.566e+01 1.089e+02 1.252e+02 1.679e+02, threshold=2.177e+02, percent-clipped=0.0 2023-11-18 09:32:41,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.61 vs. limit=15.0 2023-11-18 09:32:53,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=164233.33333333334, ans=0.2 2023-11-18 09:32:54,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=164233.33333333334, ans=0.0 2023-11-18 09:33:03,673 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 600, loss[loss=0.1433, simple_loss=0.1623, pruned_loss=0.05461, audio_tagging_loss=0.007556, over 15265.00 frames. ], tot_loss[loss=0.121, simple_loss=0.1295, pruned_loss=0.04347, audio_tagging_loss=0.01277, over 2890047.70 frames. ], batch size: 55, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:33:37,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=164500.0, ans=0.125 2023-11-18 09:33:44,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.47 vs. 
limit=15.0 2023-11-18 09:33:46,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=164566.66666666666, ans=0.2 2023-11-18 09:33:50,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=164566.66666666666, ans=0.125 2023-11-18 09:33:51,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=164566.66666666666, ans=0.04949747468305833 2023-11-18 09:33:52,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=164566.66666666666, ans=0.2 2023-11-18 09:33:57,687 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 650, loss[loss=0.159, simple_loss=0.1755, pruned_loss=0.06271, audio_tagging_loss=0.008521, over 14802.00 frames. ], tot_loss[loss=0.1214, simple_loss=0.1303, pruned_loss=0.04359, audio_tagging_loss=0.01261, over 2927663.32 frames. ], batch size: 54, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:02,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=164633.33333333334, ans=0.125 2023-11-18 09:34:02,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=164633.33333333334, ans=0.1 2023-11-18 09:34:11,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=164700.0, ans=0.125 2023-11-18 09:34:11,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=164700.0, ans=0.05 2023-11-18 09:34:12,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=164700.0, ans=0.125 2023-11-18 09:34:20,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.536e+01 9.683e+01 1.100e+02 1.220e+02 1.764e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 09:34:28,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=164766.66666666666, ans=0.125 2023-11-18 09:34:31,123 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.574e-01 2023-11-18 09:34:52,271 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 700, loss[loss=0.1287, simple_loss=0.1467, pruned_loss=0.03963, audio_tagging_loss=0.01569, over 15781.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1305, pruned_loss=0.04342, audio_tagging_loss=0.01246, over 2958686.65 frames. 
], batch size: 57, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:34:59,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=164966.66666666666, ans=0.2 2023-11-18 09:35:00,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=164966.66666666666, ans=0.05 2023-11-18 09:35:23,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=165100.0, ans=0.125 2023-11-18 09:35:33,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=165166.66666666666, ans=0.1 2023-11-18 09:35:34,976 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:35:36,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=165233.33333333334, ans=0.125 2023-11-18 09:35:38,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=165233.33333333334, ans=0.125 2023-11-18 09:35:39,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.68 vs. limit=15.0 2023-11-18 09:35:42,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=165233.33333333334, ans=0.09899494936611666 2023-11-18 09:35:49,137 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 750, loss[loss=0.1277, simple_loss=0.1319, pruned_loss=0.04914, audio_tagging_loss=0.01257, over 16197.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1297, pruned_loss=0.04327, audio_tagging_loss=0.01244, over 2979315.17 frames. ], batch size: 62, lr: 2.26e-02, grad_scale: 64.0 2023-11-18 09:35:50,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=165300.0, ans=0.0 2023-11-18 09:36:04,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=165366.66666666666, ans=0.0 2023-11-18 09:36:06,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.56 vs. limit=12.0 2023-11-18 09:36:07,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=165366.66666666666, ans=0.0 2023-11-18 09:36:11,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.593e+01 1.007e+02 1.126e+02 1.277e+02 1.870e+02, threshold=2.252e+02, percent-clipped=0.0 2023-11-18 09:36:15,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=165433.33333333334, ans=0.125 2023-11-18 09:36:21,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=165500.0, ans=0.125 2023-11-18 09:36:39,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.72 vs. 
limit=12.0 2023-11-18 09:36:42,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=165566.66666666666, ans=0.0 2023-11-18 09:36:44,359 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 800, loss[loss=0.1594, simple_loss=0.1833, pruned_loss=0.05865, audio_tagging_loss=0.009142, over 14763.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1301, pruned_loss=0.0436, audio_tagging_loss=0.01246, over 2991435.04 frames. ], batch size: 53, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:36:52,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=165633.33333333334, ans=0.0 2023-11-18 09:37:39,015 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 850, loss[loss=0.1282, simple_loss=0.1421, pruned_loss=0.04572, audio_tagging_loss=0.01145, over 15040.00 frames. ], tot_loss[loss=0.121, simple_loss=0.13, pruned_loss=0.04346, audio_tagging_loss=0.01252, over 3003351.75 frames. ], batch size: 59, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:37:53,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=166033.33333333334, ans=0.1 2023-11-18 09:37:54,653 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:37:58,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=166033.33333333334, ans=0.0 2023-11-18 09:38:03,430 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.295e+01 1.046e+02 1.125e+02 1.279e+02 2.412e+02, threshold=2.250e+02, percent-clipped=1.0 2023-11-18 09:38:17,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=166166.66666666666, ans=0.2 2023-11-18 09:38:18,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=166166.66666666666, ans=0.09899494936611666 2023-11-18 09:38:35,532 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 900, loss[loss=0.1174, simple_loss=0.1292, pruned_loss=0.04157, audio_tagging_loss=0.01119, over 15185.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1297, pruned_loss=0.04342, audio_tagging_loss=0.01261, over 3009828.53 frames. ], batch size: 57, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:39:02,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.91 vs. limit=15.0 2023-11-18 09:39:09,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.13 vs. limit=15.0 2023-11-18 09:39:23,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=166566.66666666666, ans=0.0 2023-11-18 09:39:24,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.20 vs. limit=15.0 2023-11-18 09:39:31,307 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 950, loss[loss=0.1108, simple_loss=0.1138, pruned_loss=0.04098, audio_tagging_loss=0.01291, over 15109.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1298, pruned_loss=0.04342, audio_tagging_loss=0.01252, over 3021353.79 frames. 
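The train_asr.py:1319 warnings (the first appears just above) drop AudioSet placeholder cuts that are too short to train on: 100 input frames leave the subsampling front end as only 23 frames, fewer than the 24 BPE tokens of the dummy transcript, and the transducer loss here evidently requires at least as many encoder frames as tokens. A sketch of the check, assuming the usual ((T - 7) // 2 + 1) // 2 convolutional front-end arithmetic, chosen because it reproduces the logged 100 -> 23:

def is_trainable_cut(num_frames: int, num_tokens: int) -> bool:
    # Subsampled length; the formula is an assumption that maps the logged
    # 100 input frames to the logged 23 output frames.
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

assert not is_trainable_cut(100, 24)  # matches the excluded cuts in this log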
], batch size: 59, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:39:36,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.35 vs. limit=22.5 2023-11-18 09:39:46,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.52 vs. limit=15.0 2023-11-18 09:39:54,127 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 9.509e+01 1.090e+02 1.237e+02 1.820e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 09:39:56,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=166766.66666666666, ans=0.1 2023-11-18 09:40:02,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=166766.66666666666, ans=0.0 2023-11-18 09:40:10,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.65 vs. limit=22.5 2023-11-18 09:40:26,289 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1000, loss[loss=0.1083, simple_loss=0.1147, pruned_loss=0.03762, audio_tagging_loss=0.01339, over 13819.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1298, pruned_loss=0.04345, audio_tagging_loss=0.01225, over 3028782.15 frames. ], batch size: 53, lr: 2.25e-02, grad_scale: 64.0 2023-11-18 09:40:28,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=166966.66666666666, ans=0.1 2023-11-18 09:40:40,225 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=9.514e-01 2023-11-18 09:40:50,186 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:40:58,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=167100.0, ans=0.0 2023-11-18 09:41:22,341 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1050, loss[loss=0.1451, simple_loss=0.1586, pruned_loss=0.05358, audio_tagging_loss=0.01222, over 16771.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1293, pruned_loss=0.04342, audio_tagging_loss=0.01215, over 3034731.88 frames. ], batch size: 61, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:41:30,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=167300.0, ans=0.125 2023-11-18 09:41:36,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.52 vs. 
limit=22.5 2023-11-18 09:41:38,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=167366.66666666666, ans=22.5 2023-11-18 09:41:44,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=167433.33333333334, ans=0.125 2023-11-18 09:41:44,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=167433.33333333334, ans=0.125 2023-11-18 09:41:45,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.835e+01 9.727e+01 1.056e+02 1.215e+02 1.619e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:41:50,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.26 vs. limit=15.0 2023-11-18 09:42:01,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.67 vs. limit=15.0 2023-11-18 09:42:18,380 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1100, loss[loss=0.1358, simple_loss=0.1527, pruned_loss=0.04726, audio_tagging_loss=0.01218, over 15431.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1293, pruned_loss=0.04335, audio_tagging_loss=0.01207, over 3035378.65 frames. ], batch size: 56, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:42:21,534 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 09:42:31,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=167700.0, ans=0.2 2023-11-18 09:42:48,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=167766.66666666666, ans=0.0 2023-11-18 09:42:59,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=167833.33333333334, ans=0.1 2023-11-18 09:43:10,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=167900.0, ans=0.1 2023-11-18 09:43:13,569 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1150, loss[loss=0.1284, simple_loss=0.138, pruned_loss=0.04835, audio_tagging_loss=0.0111, over 15681.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1303, pruned_loss=0.04363, audio_tagging_loss=0.01193, over 3031893.00 frames. ], batch size: 57, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:43:19,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=167966.66666666666, ans=0.09899494936611666 2023-11-18 09:43:22,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.86 vs. limit=15.0 2023-11-18 09:43:23,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.32 vs. 
limit=15.0 2023-11-18 09:43:34,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=168033.33333333334, ans=0.125 2023-11-18 09:43:37,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.862e+01 1.112e+02 1.270e+02 2.649e+02, threshold=2.225e+02, percent-clipped=1.0 2023-11-18 09:43:48,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=168166.66666666666, ans=0.125 2023-11-18 09:44:09,520 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1200, loss[loss=0.1242, simple_loss=0.1342, pruned_loss=0.04599, audio_tagging_loss=0.01106, over 16059.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.131, pruned_loss=0.04403, audio_tagging_loss=0.01194, over 3037672.60 frames. ], batch size: 59, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:44:21,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=168366.66666666666, ans=0.2 2023-11-18 09:44:24,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=168366.66666666666, ans=0.1 2023-11-18 09:44:30,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=168366.66666666666, ans=0.0 2023-11-18 09:44:51,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=168500.0, ans=0.09899494936611666 2023-11-18 09:44:55,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=168566.66666666666, ans=0.0 2023-11-18 09:45:05,696 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1250, loss[loss=0.09273, simple_loss=0.0956, pruned_loss=0.02967, audio_tagging_loss=0.01525, over 14553.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.13, pruned_loss=0.04372, audio_tagging_loss=0.01211, over 3037047.88 frames. ], batch size: 54, lr: 2.24e-02, grad_scale: 64.0 2023-11-18 09:45:06,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.61 vs. limit=22.5 2023-11-18 09:45:26,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.40 vs. limit=6.0 2023-11-18 09:45:26,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2023-11-18 09:45:28,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.825e+01 1.002e+02 1.131e+02 1.253e+02 1.979e+02, threshold=2.263e+02, percent-clipped=0.0 2023-11-18 09:45:41,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.82 vs. limit=22.5 2023-11-18 09:45:45,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=168833.33333333334, ans=0.0 2023-11-18 09:45:56,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=168900.0, ans=0.2 2023-11-18 09:46:00,851 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1300, loss[loss=0.1032, simple_loss=0.1129, pruned_loss=0.03487, audio_tagging_loss=0.01187, over 15382.00 frames. 
], tot_loss[loss=0.1189, simple_loss=0.1279, pruned_loss=0.04281, audio_tagging_loss=0.0121, over 3036972.05 frames. ], batch size: 57, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:05,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=168966.66666666666, ans=0.125 2023-11-18 09:46:05,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=168966.66666666666, ans=0.1 2023-11-18 09:46:19,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=169033.33333333334, ans=0.1 2023-11-18 09:46:27,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=169100.0, ans=0.125 2023-11-18 09:46:28,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.83 vs. limit=6.0 2023-11-18 09:46:48,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=169233.33333333334, ans=0.0 2023-11-18 09:46:49,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=169233.33333333334, ans=0.2 2023-11-18 09:46:55,926 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1350, loss[loss=0.1146, simple_loss=0.127, pruned_loss=0.04014, audio_tagging_loss=0.01089, over 14763.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1287, pruned_loss=0.04309, audio_tagging_loss=0.01221, over 3033540.34 frames. ], batch size: 55, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:46:59,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=169300.0, ans=0.025 2023-11-18 09:47:01,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=169300.0, ans=0.125 2023-11-18 09:47:07,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=169366.66666666666, ans=0.2 2023-11-18 09:47:07,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=169366.66666666666, ans=0.125 2023-11-18 09:47:15,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169366.66666666666, ans=0.1 2023-11-18 09:47:17,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=169366.66666666666, ans=0.1 2023-11-18 09:47:19,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.798e+01 9.482e+01 1.049e+02 1.147e+02 1.889e+02, threshold=2.098e+02, percent-clipped=0.0 2023-11-18 09:47:37,374 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 09:47:52,638 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1400, loss[loss=0.0877, simple_loss=0.0952, pruned_loss=0.02918, audio_tagging_loss=0.01092, over 15076.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1305, pruned_loss=0.04369, audio_tagging_loss=0.01214, over 3040187.37 frames. ], batch size: 59, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:45,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=169900.0, ans=0.125 2023-11-18 09:48:47,403 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1450, loss[loss=0.1001, simple_loss=0.1078, pruned_loss=0.03011, audio_tagging_loss=0.01606, over 15519.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1294, pruned_loss=0.04324, audio_tagging_loss=0.01246, over 3042824.97 frames. ], batch size: 58, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:48:49,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=169966.66666666666, ans=0.125 2023-11-18 09:48:51,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=169966.66666666666, ans=0.2 2023-11-18 09:48:52,428 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:49:11,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.160e+01 9.587e+01 1.090e+02 1.197e+02 1.611e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 09:49:24,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.03 vs. limit=15.0 2023-11-18 09:49:35,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=170233.33333333334, ans=0.125 2023-11-18 09:49:42,819 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1500, loss[loss=0.1477, simple_loss=0.1511, pruned_loss=0.05605, audio_tagging_loss=0.01613, over 15766.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1292, pruned_loss=0.04314, audio_tagging_loss=0.01255, over 3038124.01 frames. ], batch size: 58, lr: 2.23e-02, grad_scale: 64.0 2023-11-18 09:49:44,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.19 vs. 
limit=15.0 2023-11-18 09:50:03,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=170366.66666666666, ans=0.125 2023-11-18 09:50:04,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170433.33333333334, ans=0.1 2023-11-18 09:50:23,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff3.min_abs, batch_count=170500.0, ans=0.2 2023-11-18 09:50:26,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=170566.66666666666, ans=0.125 2023-11-18 09:50:28,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=170566.66666666666, ans=0.125 2023-11-18 09:50:30,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=170566.66666666666, ans=0.125 2023-11-18 09:50:39,364 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1550, loss[loss=0.1412, simple_loss=0.1445, pruned_loss=0.05265, audio_tagging_loss=0.01629, over 14737.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1291, pruned_loss=0.04302, audio_tagging_loss=0.01249, over 3046626.67 frames. ], batch size: 55, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:50:48,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=170633.33333333334, ans=0.07 2023-11-18 09:50:58,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=170700.0, ans=0.0 2023-11-18 09:51:01,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-18 09:51:02,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 1.007e+02 1.092e+02 1.205e+02 1.689e+02, threshold=2.183e+02, percent-clipped=0.0 2023-11-18 09:51:09,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=170766.66666666666, ans=0.1 2023-11-18 09:51:17,597 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.879e-01 2023-11-18 09:51:23,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=170900.0, ans=0.1 2023-11-18 09:51:28,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=170900.0, ans=0.1 2023-11-18 09:51:33,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.77 vs. limit=22.5 2023-11-18 09:51:34,207 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1600, loss[loss=0.1188, simple_loss=0.1302, pruned_loss=0.04085, audio_tagging_loss=0.01291, over 15830.00 frames. ], tot_loss[loss=0.1213, simple_loss=0.1303, pruned_loss=0.04369, audio_tagging_loss=0.01251, over 3048055.03 frames. 
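The scaling.py:213 lines that dominate this log print the momentary value of schedule-controlled hyperparameters (balancer probabilities, skip rates, dropout_p, scale_min) at the current batch_count. A minimal sketch of a piecewise-linear schedule of that kind, with illustrative breakpoints (the actual schedules are defined in the model code and are not shown in the log):

def scheduled_float(batch_count: float, points) -> float:
    # points: sorted (batch_count, value) breakpoints, e.g.
    # [(0.0, 0.3), (20000.0, 0.1)] for a rate that decays to 0.1.
    b0, v0 = points[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in points[1:]:
        if batch_count <= b1:
            # linear interpolation between neighbouring breakpoints
            return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0  # clamped past the last breakpoint

print(scheduled_float(170000.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1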
], batch size: 60, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:51:53,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=171033.33333333334, ans=0.125 2023-11-18 09:52:29,529 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1650, loss[loss=0.131, simple_loss=0.1463, pruned_loss=0.04617, audio_tagging_loss=0.01167, over 15323.00 frames. ], tot_loss[loss=0.1221, simple_loss=0.1314, pruned_loss=0.04403, audio_tagging_loss=0.01233, over 3049468.07 frames. ], batch size: 56, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:52:30,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=171300.0, ans=0.2 2023-11-18 09:52:52,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.81 vs. limit=15.0 2023-11-18 09:52:52,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-11-18 09:52:53,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.843e+01 9.689e+01 1.063e+02 1.242e+02 1.763e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 09:53:08,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.26 vs. limit=12.0 2023-11-18 09:53:10,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=171500.0, ans=0.0 2023-11-18 09:53:17,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=171566.66666666666, ans=0.1 2023-11-18 09:53:26,190 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1700, loss[loss=0.1141, simple_loss=0.122, pruned_loss=0.03764, audio_tagging_loss=0.01546, over 15473.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1297, pruned_loss=0.04316, audio_tagging_loss=0.01254, over 3045859.11 frames. ], batch size: 56, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:53:26,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=171633.33333333334, ans=0.125 2023-11-18 09:53:28,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=171633.33333333334, ans=0.125 2023-11-18 09:53:43,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.89 vs. limit=6.0 2023-11-18 09:53:45,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.04 vs. limit=6.0 2023-11-18 09:53:57,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.25 vs. 
limit=15.0 2023-11-18 09:54:00,123 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.598e+00 2023-11-18 09:54:02,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=171833.33333333334, ans=0.0 2023-11-18 09:54:19,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=171966.66666666666, ans=0.0 2023-11-18 09:54:20,930 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1750, loss[loss=0.1547, simple_loss=0.1629, pruned_loss=0.06475, audio_tagging_loss=0.008513, over 15366.00 frames. ], tot_loss[loss=0.1201, simple_loss=0.1294, pruned_loss=0.04289, audio_tagging_loss=0.01248, over 3048821.72 frames. ], batch size: 54, lr: 2.22e-02, grad_scale: 64.0 2023-11-18 09:54:44,389 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 9.833e+01 1.116e+02 1.265e+02 1.757e+02, threshold=2.232e+02, percent-clipped=0.0 2023-11-18 09:54:44,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=172100.0, ans=0.125 2023-11-18 09:55:02,558 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:55:04,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=172233.33333333334, ans=0.2 2023-11-18 09:55:11,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.51 vs. limit=15.0 2023-11-18 09:55:16,038 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1800, loss[loss=0.1323, simple_loss=0.1437, pruned_loss=0.0474, audio_tagging_loss=0.01301, over 14793.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1295, pruned_loss=0.04281, audio_tagging_loss=0.01234, over 3042912.43 frames. ], batch size: 56, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:55:19,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=172300.0, ans=0.125 2023-11-18 09:55:38,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=172433.33333333334, ans=0.125 2023-11-18 09:55:39,772 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.64 vs. limit=12.0 2023-11-18 09:55:43,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.84 vs. limit=22.5 2023-11-18 09:55:48,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=172500.0, ans=0.0 2023-11-18 09:56:12,367 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1850, loss[loss=0.1146, simple_loss=0.1215, pruned_loss=0.04267, audio_tagging_loss=0.0112, over 15935.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1291, pruned_loss=0.04292, audio_tagging_loss=0.01222, over 3037677.24 frames. 
], batch size: 60, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:56:26,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=172700.0, ans=0.2 2023-11-18 09:56:28,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.41 vs. limit=12.0 2023-11-18 09:56:34,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.476e+01 9.349e+01 1.016e+02 1.150e+02 1.872e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 09:56:56,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=172900.0, ans=0.07 2023-11-18 09:57:01,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=172900.0, ans=0.1 2023-11-18 09:57:07,224 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1900, loss[loss=0.1008, simple_loss=0.1106, pruned_loss=0.03448, audio_tagging_loss=0.01107, over 15004.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1289, pruned_loss=0.04258, audio_tagging_loss=0.01219, over 3045165.79 frames. ], batch size: 54, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:57:11,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=172966.66666666666, ans=0.0 2023-11-18 09:58:02,653 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 1950, loss[loss=0.1321, simple_loss=0.1349, pruned_loss=0.0489, audio_tagging_loss=0.01576, over 15418.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.13, pruned_loss=0.04339, audio_tagging_loss=0.01221, over 3042048.43 frames. ], batch size: 58, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:58:06,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=173300.0, ans=0.07 2023-11-18 09:58:21,712 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 09:58:24,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.52 vs. limit=15.0 2023-11-18 09:58:25,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=173433.33333333334, ans=0.0 2023-11-18 09:58:26,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.049e+01 9.559e+01 1.056e+02 1.197e+02 1.715e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 09:58:39,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=12.0 2023-11-18 09:58:44,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=173500.0, ans=10.0 2023-11-18 09:58:49,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=173566.66666666666, ans=0.125 2023-11-18 09:58:58,537 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2000, loss[loss=0.118, simple_loss=0.1134, pruned_loss=0.04196, audio_tagging_loss=0.01938, over 16294.00 frames. ], tot_loss[loss=0.1204, simple_loss=0.1294, pruned_loss=0.04342, audio_tagging_loss=0.01224, over 3044020.95 frames. 
], batch size: 61, lr: 2.21e-02, grad_scale: 64.0 2023-11-18 09:59:30,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=173833.33333333334, ans=0.125 2023-11-18 09:59:32,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=173833.33333333334, ans=0.125 2023-11-18 09:59:40,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=173833.33333333334, ans=0.0 2023-11-18 09:59:47,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=173900.0, ans=0.125 2023-11-18 09:59:53,999 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2050, loss[loss=0.1098, simple_loss=0.1238, pruned_loss=0.03874, audio_tagging_loss=0.009185, over 15737.00 frames. ], tot_loss[loss=0.1205, simple_loss=0.1301, pruned_loss=0.04337, audio_tagging_loss=0.01207, over 3041530.67 frames. ], batch size: 59, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:00:14,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=174100.0, ans=0.125 2023-11-18 10:00:16,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.215e+01 1.049e+02 1.194e+02 1.365e+02 2.043e+02, threshold=2.387e+02, percent-clipped=0.0 2023-11-18 10:00:19,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=174100.0, ans=0.1 2023-11-18 10:00:25,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=174100.0, ans=0.2 2023-11-18 10:00:42,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=174233.33333333334, ans=0.125 2023-11-18 10:00:48,869 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2100, loss[loss=0.1479, simple_loss=0.1706, pruned_loss=0.04984, audio_tagging_loss=0.0127, over 16348.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.131, pruned_loss=0.04356, audio_tagging_loss=0.01211, over 3045980.32 frames. ], batch size: 56, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:00,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=174366.66666666666, ans=0.125 2023-11-18 10:01:17,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=174433.33333333334, ans=0.0 2023-11-18 10:01:25,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=174500.0, ans=0.0 2023-11-18 10:01:44,264 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2150, loss[loss=0.104, simple_loss=0.1069, pruned_loss=0.03703, audio_tagging_loss=0.01351, over 14295.00 frames. ], tot_loss[loss=0.1217, simple_loss=0.1313, pruned_loss=0.04393, audio_tagging_loss=0.01214, over 3040101.40 frames. 
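The scaling.py:1022 Whitening diagnostics compare a per-module whiteness metric against a scheduled limit (the "metric=X vs. limit=Y" pairs); entries appear to be logged when the metric approaches or exceeds its limit. One metric with the right fixed point, equal to 1.0 exactly when the per-group feature covariance is proportional to the identity and larger otherwise, is sketched below; the exact formula in scaling.py may differ:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> float:
    # x: (num_frames, num_channels). Channels are split into groups; per
    # group the metric is d * sum(cov**2) / trace(cov)**2, which by
    # Cauchy-Schwarz is >= 1, with equality iff cov is a multiple of I.
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
    cov = torch.matmul(xg.transpose(1, 2), xg) / n     # (groups, d, d)
    num = (cov ** 2).sum(dim=(1, 2))
    den = torch.diagonal(cov, dim1=1, dim2=2).sum(dim=1) ** 2
    return (d * num / den).mean().item()

# e.g. whitening_metric(torch.randn(100000, 64), num_groups=1) -> ~1.0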
], batch size: 54, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:01:58,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=174700.0, ans=0.125 2023-11-18 10:02:02,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=174700.0, ans=0.2 2023-11-18 10:02:04,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.53 vs. limit=10.0 2023-11-18 10:02:08,232 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.984e+01 9.849e+01 1.118e+02 1.250e+02 1.648e+02, threshold=2.236e+02, percent-clipped=0.0 2023-11-18 10:02:18,836 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:02:28,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.13 vs. limit=15.0 2023-11-18 10:02:34,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=174900.0, ans=0.07 2023-11-18 10:02:41,110 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2200, loss[loss=0.1151, simple_loss=0.1165, pruned_loss=0.0436, audio_tagging_loss=0.01325, over 15413.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1311, pruned_loss=0.04378, audio_tagging_loss=0.01218, over 3043883.49 frames. ], batch size: 59, lr: 2.20e-02, grad_scale: 128.0 2023-11-18 10:02:41,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=174966.66666666666, ans=0.0 2023-11-18 10:02:42,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-11-18 10:02:43,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=174966.66666666666, ans=0.125 2023-11-18 10:02:48,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=174966.66666666666, ans=0.125 2023-11-18 10:03:15,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=175166.66666666666, ans=0.125 2023-11-18 10:03:26,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=175233.33333333334, ans=0.2 2023-11-18 10:03:36,347 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2250, loss[loss=0.1599, simple_loss=0.1697, pruned_loss=0.06553, audio_tagging_loss=0.009483, over 15170.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1312, pruned_loss=0.04382, audio_tagging_loss=0.01209, over 3040236.40 frames. 
], batch size: 55, lr: 2.20e-02, grad_scale: 64.0 2023-11-18 10:03:42,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=175300.0, ans=0.0 2023-11-18 10:04:00,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=175433.33333333334, ans=0.125 2023-11-18 10:04:01,347 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.717e+01 9.758e+01 1.067e+02 1.178e+02 1.415e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:04:06,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175433.33333333334, ans=0.125 2023-11-18 10:04:23,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.10 vs. limit=15.0 2023-11-18 10:04:24,591 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.39 vs. limit=15.0 2023-11-18 10:04:28,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.82 vs. limit=6.0 2023-11-18 10:04:32,096 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2300, loss[loss=0.1341, simple_loss=0.1399, pruned_loss=0.05084, audio_tagging_loss=0.01336, over 15542.00 frames. ], tot_loss[loss=0.1215, simple_loss=0.1312, pruned_loss=0.04379, audio_tagging_loss=0.01218, over 3038259.10 frames. ], batch size: 57, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:04:33,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=175633.33333333334, ans=0.125 2023-11-18 10:04:34,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=175633.33333333334, ans=0.5 2023-11-18 10:04:36,968 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:04:42,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=175700.0, ans=0.0 2023-11-18 10:05:16,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=175900.0, ans=0.0 2023-11-18 10:05:22,650 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 10:05:23,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=175900.0, ans=0.125 2023-11-18 10:05:27,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=175966.66666666666, ans=0.2 2023-11-18 10:05:27,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=175966.66666666666, ans=0.125 2023-11-18 10:05:27,916 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2350, loss[loss=0.1075, simple_loss=0.111, pruned_loss=0.03707, audio_tagging_loss=0.01496, over 15561.00 frames. ], tot_loss[loss=0.1209, simple_loss=0.1301, pruned_loss=0.04347, audio_tagging_loss=0.01235, over 3040547.61 frames. ], batch size: 63, lr: 2.19e-02, grad_scale: 64.0 2023-11-18 10:05:33,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.98 vs. limit=22.5 2023-11-18 10:05:36,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.19 vs. limit=22.5 2023-11-18 10:05:51,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.821e+01 9.805e+01 1.113e+02 1.261e+02 1.707e+02, threshold=2.226e+02, percent-clipped=0.0 2023-11-18 10:06:11,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.08 vs. limit=12.0 2023-11-18 10:06:16,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=176233.33333333334, ans=0.125 2023-11-18 10:06:22,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=176300.0, ans=0.025 2023-11-18 10:06:23,647 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2400, loss[loss=0.111, simple_loss=0.1214, pruned_loss=0.03889, audio_tagging_loss=0.0114, over 14771.00 frames. ], tot_loss[loss=0.12, simple_loss=0.129, pruned_loss=0.04293, audio_tagging_loss=0.01256, over 3037183.57 frames. ], batch size: 57, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:06:30,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2023-11-18 10:07:19,262 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2450, loss[loss=0.126, simple_loss=0.1356, pruned_loss=0.04525, audio_tagging_loss=0.01296, over 14911.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1291, pruned_loss=0.04285, audio_tagging_loss=0.01255, over 3043377.86 frames. ], batch size: 57, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:07:45,100 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 1.013e+02 1.126e+02 1.298e+02 2.274e+02, threshold=2.253e+02, percent-clipped=1.0 2023-11-18 10:08:15,428 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2500, loss[loss=0.1085, simple_loss=0.1278, pruned_loss=0.03362, audio_tagging_loss=0.01095, over 14885.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1283, pruned_loss=0.04244, audio_tagging_loss=0.01258, over 3037170.89 frames. 
], batch size: 57, lr: 2.19e-02, grad_scale: 32.0 2023-11-18 10:08:17,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=176966.66666666666, ans=0.1 2023-11-18 10:08:55,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=177166.66666666666, ans=0.125 2023-11-18 10:09:00,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=177233.33333333334, ans=0.125 2023-11-18 10:09:06,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=177233.33333333334, ans=0.2 2023-11-18 10:09:10,978 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2550, loss[loss=0.08053, simple_loss=0.08927, pruned_loss=0.02282, audio_tagging_loss=0.01306, over 14786.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1279, pruned_loss=0.04238, audio_tagging_loss=0.01256, over 3040481.97 frames. ], batch size: 56, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:09:19,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=177300.0, ans=0.125 2023-11-18 10:09:28,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=177366.66666666666, ans=0.125 2023-11-18 10:09:37,081 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.896e+01 9.703e+01 1.094e+02 1.267e+02 1.679e+02, threshold=2.187e+02, percent-clipped=0.0 2023-11-18 10:09:44,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=177500.0, ans=0.0 2023-11-18 10:09:44,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=177500.0, ans=0.1 2023-11-18 10:09:45,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=177500.0, ans=0.125 2023-11-18 10:09:52,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=177500.0, ans=10.0 2023-11-18 10:10:06,250 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2600, loss[loss=0.09858, simple_loss=0.1119, pruned_loss=0.03, audio_tagging_loss=0.01263, over 15788.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1279, pruned_loss=0.04221, audio_tagging_loss=0.01242, over 3034118.84 frames. ], batch size: 57, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:10:10,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=177633.33333333334, ans=0.0 2023-11-18 10:10:15,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=177633.33333333334, ans=0.2 2023-11-18 10:10:49,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.07 vs. limit=22.5 2023-11-18 10:10:53,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=177900.0, ans=0.125 2023-11-18 10:11:03,109 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2650, loss[loss=0.1184, simple_loss=0.1233, pruned_loss=0.04264, audio_tagging_loss=0.01411, over 16206.00 frames. 
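[Annotation] The recurring "ScheduledFloat: name=..., batch_count=..., ans=..." entries log hyperparameters (dropout, skip rates, bypass scales) whose values change with the global batch count. A hedged sketch of such a schedule as piecewise-linear interpolation; the breakpoints below are illustrative, not taken from the recipe:

```python
# Piecewise-linear schedule over batch_count, the pattern suggested by the
# "ScheduledFloat ... batch_count=..., ans=..." log lines.

def scheduled_float(batch_count: float, points) -> float:
    """points: list of (batch_count, value) pairs, sorted by batch_count."""
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
    return points[-1][1]

print(scheduled_float(500.0, [(0.0, 0.9), (1000.0, 0.2)]))  # -> 0.55
```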
], tot_loss[loss=0.1189, simple_loss=0.1281, pruned_loss=0.0425, audio_tagging_loss=0.01237, over 3028310.07 frames. ], batch size: 64, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:11:16,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=178033.33333333334, ans=0.0 2023-11-18 10:11:27,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.648e+01 9.789e+01 1.066e+02 1.192e+02 1.496e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 10:11:48,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=178233.33333333334, ans=0.1 2023-11-18 10:11:57,879 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2700, loss[loss=0.1115, simple_loss=0.1203, pruned_loss=0.04083, audio_tagging_loss=0.01051, over 14878.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1291, pruned_loss=0.04271, audio_tagging_loss=0.0121, over 3035901.38 frames. ], batch size: 55, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:12:00,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=178300.0, ans=0.2 2023-11-18 10:12:06,252 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.53 vs. limit=15.0 2023-11-18 10:12:14,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=178366.66666666666, ans=0.0 2023-11-18 10:12:24,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=178433.33333333334, ans=0.015 2023-11-18 10:12:25,607 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.828e+00 2023-11-18 10:12:26,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.31 vs. limit=22.5 2023-11-18 10:12:53,262 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2750, loss[loss=0.1158, simple_loss=0.1347, pruned_loss=0.03868, audio_tagging_loss=0.009778, over 14546.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1282, pruned_loss=0.04251, audio_tagging_loss=0.01217, over 3035966.79 frames. ], batch size: 54, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:05,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.51 vs. limit=15.0 2023-11-18 10:13:07,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=178700.0, ans=0.125 2023-11-18 10:13:10,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=178700.0, ans=0.0 2023-11-18 10:13:19,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 1.008e+02 1.122e+02 1.241e+02 2.001e+02, threshold=2.244e+02, percent-clipped=0.0 2023-11-18 10:13:38,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=178900.0, ans=0.025 2023-11-18 10:13:41,733 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:13:41,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=178900.0, ans=0.125 2023-11-18 10:13:50,107 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2800, loss[loss=0.1007, simple_loss=0.09362, pruned_loss=0.03525, audio_tagging_loss=0.01868, over 16128.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1281, pruned_loss=0.04271, audio_tagging_loss=0.01221, over 3041007.04 frames. ], batch size: 64, lr: 2.18e-02, grad_scale: 32.0 2023-11-18 10:13:52,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=178966.66666666666, ans=0.0 2023-11-18 10:13:53,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=178966.66666666666, ans=0.125 2023-11-18 10:13:59,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.73 vs. limit=15.0 2023-11-18 10:14:02,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=179033.33333333334, ans=0.125 2023-11-18 10:14:19,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0 2023-11-18 10:14:40,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179233.33333333334, ans=0.1 2023-11-18 10:14:44,626 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2850, loss[loss=0.07605, simple_loss=0.07253, pruned_loss=0.02787, audio_tagging_loss=0.01192, over 14416.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.1276, pruned_loss=0.04229, audio_tagging_loss=0.01215, over 3044987.28 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:14:46,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=179300.0, ans=6.0 2023-11-18 10:14:47,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179300.0, ans=0.125 2023-11-18 10:14:48,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.61 vs. 
limit=12.0 2023-11-18 10:15:07,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=179433.33333333334, ans=0.125 2023-11-18 10:15:10,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.435e+01 9.812e+01 1.069e+02 1.186e+02 1.678e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 10:15:13,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=179433.33333333334, ans=0.125 2023-11-18 10:15:16,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=179433.33333333334, ans=0.125 2023-11-18 10:15:22,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=179500.0, ans=0.1 2023-11-18 10:15:27,408 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.32 vs. limit=22.5 2023-11-18 10:15:31,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=179566.66666666666, ans=0.125 2023-11-18 10:15:38,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=179633.33333333334, ans=0.2 2023-11-18 10:15:39,682 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2900, loss[loss=0.1273, simple_loss=0.1394, pruned_loss=0.04845, audio_tagging_loss=0.009101, over 14323.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.1288, pruned_loss=0.04286, audio_tagging_loss=0.0121, over 3040959.42 frames. ], batch size: 58, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:15:50,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=179700.0, ans=0.125 2023-11-18 10:15:51,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=179700.0, ans=0.0 2023-11-18 10:16:11,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=179766.66666666666, ans=0.125 2023-11-18 10:16:13,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=179833.33333333334, ans=0.125 2023-11-18 10:16:20,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.23 vs. limit=22.5 2023-11-18 10:16:23,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=179900.0, ans=0.04949747468305833 2023-11-18 10:16:36,732 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 2950, loss[loss=0.085, simple_loss=0.08765, pruned_loss=0.02691, audio_tagging_loss=0.01427, over 15551.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1285, pruned_loss=0.04279, audio_tagging_loss=0.01221, over 3048318.22 frames. 
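[Annotation] In every "Clipping_scale=2.0, grad-norm quartiles ..." entry, the reported threshold is twice the middle quartile (here 2.0 x 1.069e+02 ~= 2.137e+02), so the clipping threshold appears to track the median of recent gradient norms. A sketch under that assumption:

```python
import torch

def clip_threshold(recent_norms, clipping_scale: float = 2.0):
    """Quartiles (min, 25%, median, 75%, max) of recent gradient norms and a
    clipping threshold of clipping_scale * median, as the log suggests."""
    q = torch.quantile(torch.tensor(recent_norms),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return q, (clipping_scale * q[2]).item()

q, thr = clip_threshold([74.35, 98.12, 106.9, 118.6, 167.8])
print(thr)  # 213.8, cf. "threshold=2.137e+02" above
```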
], batch size: 59, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:16:40,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=179966.66666666666, ans=0.1 2023-11-18 10:17:01,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.805e+01 9.762e+01 1.073e+02 1.254e+02 1.837e+02, threshold=2.146e+02, percent-clipped=0.0 2023-11-18 10:17:12,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=180166.66666666666, ans=0.05 2023-11-18 10:17:29,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=180233.33333333334, ans=0.0 2023-11-18 10:17:32,031 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3000, loss[loss=0.1115, simple_loss=0.1083, pruned_loss=0.04598, audio_tagging_loss=0.01133, over 13727.00 frames. ], tot_loss[loss=0.1195, simple_loss=0.1289, pruned_loss=0.04286, audio_tagging_loss=0.01224, over 3045783.10 frames. ], batch size: 54, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:17:32,033 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 10:18:02,500 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.3714, 1.9018, 2.2601, 2.0291, 2.8311, 3.0452, 3.0074, 2.4616], device='cuda:0') 2023-11-18 10:18:03,209 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.6642, 3.7442, 4.3813, 3.5449], device='cuda:0') 2023-11-18 10:18:05,517 INFO [train_asr.py:1147] (0/4) Epoch 3, validation: loss=0.08163, simple_loss=0.06585, pruned_loss=0.01265, audio_tagging_loss=0.03605, over 4681554.00 frames. 2023-11-18 10:18:05,518 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 10:18:05,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=180300.0, ans=0.0 2023-11-18 10:18:11,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=180300.0, ans=0.125 2023-11-18 10:18:37,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=180500.0, ans=0.1 2023-11-18 10:18:45,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=180500.0, ans=0.125 2023-11-18 10:18:46,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=180500.0, ans=0.0 2023-11-18 10:19:01,891 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3050, loss[loss=0.1576, simple_loss=0.1746, pruned_loss=0.0605, audio_tagging_loss=0.009763, over 15113.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1293, pruned_loss=0.04295, audio_tagging_loss=0.01231, over 3047148.48 frames. ], batch size: 56, lr: 2.17e-02, grad_scale: 32.0 2023-11-18 10:19:09,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.87 vs. 
limit=15.0 2023-11-18 10:19:26,064 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.626e+01 1.059e+02 1.215e+02 1.726e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 10:19:34,001 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:19:43,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=180833.33333333334, ans=0.125 2023-11-18 10:19:47,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=180900.0, ans=0.2 2023-11-18 10:19:56,866 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3100, loss[loss=0.1284, simple_loss=0.142, pruned_loss=0.04633, audio_tagging_loss=0.01111, over 15187.00 frames. ], tot_loss[loss=0.1212, simple_loss=0.1305, pruned_loss=0.04354, audio_tagging_loss=0.01234, over 3046322.33 frames. ], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:20:10,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=181033.33333333334, ans=0.0 2023-11-18 10:20:44,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=181233.33333333334, ans=0.0 2023-11-18 10:20:51,943 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3150, loss[loss=0.1048, simple_loss=0.1144, pruned_loss=0.03303, audio_tagging_loss=0.01456, over 14906.00 frames. ], tot_loss[loss=0.1211, simple_loss=0.1309, pruned_loss=0.04332, audio_tagging_loss=0.0124, over 3043617.54 frames. ], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:20:53,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.99 vs. 
limit=15.0 2023-11-18 10:21:10,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=181366.66666666666, ans=0.125 2023-11-18 10:21:13,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=181366.66666666666, ans=0.025 2023-11-18 10:21:14,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=181433.33333333334, ans=0.1 2023-11-18 10:21:15,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181433.33333333334, ans=0.1 2023-11-18 10:21:17,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=181433.33333333334, ans=0.2 2023-11-18 10:21:18,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.858e+01 9.872e+01 1.154e+02 1.398e+02 2.452e+02, threshold=2.308e+02, percent-clipped=3.0 2023-11-18 10:21:32,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:33,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=181500.0, ans=0.125 2023-11-18 10:21:37,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:40,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.65 vs. limit=15.0 2023-11-18 10:21:42,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=181566.66666666666, ans=0.125 2023-11-18 10:21:43,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=181566.66666666666, ans=0.1 2023-11-18 10:21:47,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=181633.33333333334, ans=0.125 2023-11-18 10:21:48,262 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3200, loss[loss=0.12, simple_loss=0.1322, pruned_loss=0.04354, audio_tagging_loss=0.01039, over 14777.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1301, pruned_loss=0.0427, audio_tagging_loss=0.01247, over 3035036.66 frames. ], batch size: 58, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:21:48,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=181633.33333333334, ans=0.0 2023-11-18 10:21:53,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.96 vs. limit=15.0 2023-11-18 10:22:01,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.16 vs. 
limit=15.0 2023-11-18 10:22:03,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=181700.0, ans=0.0 2023-11-18 10:22:15,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=181766.66666666666, ans=0.1 2023-11-18 10:22:39,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=181900.0, ans=0.09899494936611666 2023-11-18 10:22:41,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=181900.0, ans=0.125 2023-11-18 10:22:42,989 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3250, loss[loss=0.1246, simple_loss=0.1269, pruned_loss=0.04373, audio_tagging_loss=0.01739, over 15083.00 frames. ], tot_loss[loss=0.1208, simple_loss=0.1306, pruned_loss=0.04296, audio_tagging_loss=0.01256, over 3044536.64 frames. ], batch size: 57, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:07,865 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 9.562e+01 1.039e+02 1.190e+02 1.635e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 10:23:36,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=182300.0, ans=0.0 2023-11-18 10:23:37,520 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3300, loss[loss=0.09697, simple_loss=0.09726, pruned_loss=0.03568, audio_tagging_loss=0.01265, over 15474.00 frames. ], tot_loss[loss=0.1206, simple_loss=0.1302, pruned_loss=0.04296, audio_tagging_loss=0.01255, over 3045185.07 frames. ], batch size: 61, lr: 2.16e-02, grad_scale: 32.0 2023-11-18 10:23:50,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=182366.66666666666, ans=0.125 2023-11-18 10:24:01,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=182433.33333333334, ans=0.0 2023-11-18 10:24:03,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:04,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:05,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=182433.33333333334, ans=0.125 2023-11-18 10:24:20,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=182566.66666666666, ans=0.125 2023-11-18 10:24:25,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=182566.66666666666, ans=0.1 2023-11-18 10:24:33,231 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3350, loss[loss=0.1353, simple_loss=0.1536, pruned_loss=0.04642, audio_tagging_loss=0.0121, over 14556.00 frames. ], tot_loss[loss=0.1207, simple_loss=0.1306, pruned_loss=0.04299, audio_tagging_loss=0.0124, over 3048083.90 frames. 
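[Annotation] The "Whitening: ... metric=M vs. limit=L" entries compare a per-module statistic against a (scheduled) limit, with a penalty presumably applied only when the metric exceeds the limit. One plausible metric, assumed here, is the eigenvalue-spread ratio E[eig^2]/E[eig]^2 of the output covariance, which equals 1.0 for perfectly whitened features:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns E[eig^2] / E[eig]^2 of the
    covariance eigenvalues; 1.0 iff the covariance is isotropic."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return ((eigs ** 2).mean() / eigs.mean() ** 2).item()

x = torch.randn(1000, 256) * torch.linspace(0.1, 3.0, 256)  # anisotropic
print(whitening_metric(x))  # well above 1.0; would trip a low limit
```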
], batch size: 54, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:24:35,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=182633.33333333334, ans=0.1 2023-11-18 10:24:44,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=182700.0, ans=0.0 2023-11-18 10:24:58,930 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.629e+01 9.499e+01 1.070e+02 1.220e+02 2.186e+02, threshold=2.139e+02, percent-clipped=1.0 2023-11-18 10:25:15,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.95 vs. limit=22.5 2023-11-18 10:25:25,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.56 vs. limit=6.0 2023-11-18 10:25:25,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=182900.0, ans=0.0 2023-11-18 10:25:29,561 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3400, loss[loss=0.0764, simple_loss=0.08143, pruned_loss=0.02291, audio_tagging_loss=0.01277, over 14247.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1304, pruned_loss=0.04298, audio_tagging_loss=0.01218, over 3045870.27 frames. ], batch size: 56, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:25:33,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.50 vs. limit=22.5 2023-11-18 10:26:03,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.37 vs. limit=22.5 2023-11-18 10:26:15,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=183233.33333333334, ans=0.0 2023-11-18 10:26:22,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=183233.33333333334, ans=0.125 2023-11-18 10:26:23,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.00 vs. limit=22.5 2023-11-18 10:26:24,425 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3450, loss[loss=0.1188, simple_loss=0.1293, pruned_loss=0.04512, audio_tagging_loss=0.009037, over 14046.00 frames. ], tot_loss[loss=0.1202, simple_loss=0.1305, pruned_loss=0.0429, audio_tagging_loss=0.01205, over 3040260.37 frames. ], batch size: 53, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:26:24,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=183300.0, ans=0.0 2023-11-18 10:26:32,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=16.20 vs. 
limit=22.5 2023-11-18 10:26:49,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=183433.33333333334, ans=0.125 2023-11-18 10:26:50,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.538e+01 1.062e+02 1.197e+02 2.158e+02, threshold=2.124e+02, percent-clipped=1.0 2023-11-18 10:26:54,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=183433.33333333334, ans=0.1 2023-11-18 10:26:57,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.07 vs. limit=15.0 2023-11-18 10:27:02,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=183500.0, ans=0.04949747468305833 2023-11-18 10:27:20,071 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3500, loss[loss=0.06255, simple_loss=0.06398, pruned_loss=0.01822, audio_tagging_loss=0.01234, over 15359.00 frames. ], tot_loss[loss=0.12, simple_loss=0.1301, pruned_loss=0.04301, audio_tagging_loss=0.0119, over 3038206.73 frames. ], batch size: 61, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:27:37,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=183700.0, ans=0.125 2023-11-18 10:27:38,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.95 vs. limit=15.0 2023-11-18 10:27:41,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=183766.66666666666, ans=0.125 2023-11-18 10:27:48,626 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:27:48,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=183766.66666666666, ans=0.125 2023-11-18 10:27:59,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=183833.33333333334, ans=0.125 2023-11-18 10:28:10,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183900.0, ans=0.1 2023-11-18 10:28:10,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=183900.0, ans=0.0 2023-11-18 10:28:15,356 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.43 vs. limit=6.0 2023-11-18 10:28:15,713 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3550, loss[loss=0.1209, simple_loss=0.1332, pruned_loss=0.04452, audio_tagging_loss=0.009746, over 15649.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1295, pruned_loss=0.04264, audio_tagging_loss=0.01196, over 3047589.38 frames. 
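[Annotation] tot_loss is reported "over ~3.0e6 frames" while individual batches carry ~15k frames, which is consistent with an exponentially decayed running sum whose steady-state horizon is about 200 batches. A sketch under that assumption (the 1/200 decay is inferred from the frame counts, not read from the code):

```python
def update_tot_loss(tot_sum, tot_frames, batch_sum, batch_frames,
                    decay=1.0 - 1.0 / 200):  # assumed decay constant
    """Exponentially decayed running sums; report tot_sum / tot_frames."""
    return tot_sum * decay + batch_sum, tot_frames * decay + batch_frames

s = f = 0.0
for _ in range(2000):  # ~15k frames per batch, as in the log
    s, f = update_tot_loss(s, f, 0.12 * 15000, 15000)
print(s / f, round(f))  # ~0.12 over ~3.0e6 frames, cf. "over 3047589 frames"
```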
], batch size: 59, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:28:21,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=183966.66666666666, ans=0.1 2023-11-18 10:28:30,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=184033.33333333334, ans=0.125 2023-11-18 10:28:31,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=184033.33333333334, ans=0.125 2023-11-18 10:28:41,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.814e+01 9.739e+01 1.081e+02 1.236e+02 3.784e+02, threshold=2.163e+02, percent-clipped=1.0 2023-11-18 10:28:44,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=184100.0, ans=0.125 2023-11-18 10:28:58,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=184166.66666666666, ans=0.125 2023-11-18 10:29:11,453 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3600, loss[loss=0.1238, simple_loss=0.13, pruned_loss=0.04935, audio_tagging_loss=0.009463, over 15237.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1293, pruned_loss=0.04238, audio_tagging_loss=0.01189, over 3052530.98 frames. ], batch size: 59, lr: 2.15e-02, grad_scale: 32.0 2023-11-18 10:29:13,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=184300.0, ans=0.125 2023-11-18 10:29:16,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184300.0, ans=0.1 2023-11-18 10:29:16,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=184300.0, ans=0.1 2023-11-18 10:29:22,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184366.66666666666, ans=0.125 2023-11-18 10:29:23,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=184366.66666666666, ans=0.125 2023-11-18 10:29:30,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.32 vs. limit=22.5 2023-11-18 10:29:32,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.70 vs. limit=22.5 2023-11-18 10:30:06,996 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3650, loss[loss=0.09289, simple_loss=0.1082, pruned_loss=0.02891, audio_tagging_loss=0.009897, over 16092.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.1289, pruned_loss=0.04239, audio_tagging_loss=0.01186, over 3058408.96 frames. 
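[Annotation] The balancer parameters logged throughout (min_positive=0.025, max_abs=10.0, prob=0.125, ...) point to a module that, with some probability per step, pushes per-channel activation statistics back into a target range. A hedged diagnostic for the two statistics those parameter names imply:

```python
import torch

def balancer_stats(x: torch.Tensor):
    """x: (num_frames, num_channels). Per-channel fraction of positive values
    and mean |x| -- the quantities the min_positive / max_abs limits suggest."""
    return (x > 0).float().mean(dim=0), x.abs().mean(dim=0)

x = torch.randn(1000, 8) - 2.5          # channels biased strongly negative
pos_frac, mean_abs = balancer_stats(x)
print((pos_frac < 0.025).sum().item(), "channels below min_positive=0.025")
```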
], batch size: 61, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:30:07,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=184633.33333333334, ans=0.125 2023-11-18 10:30:14,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=184633.33333333334, ans=0.0 2023-11-18 10:30:16,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=184633.33333333334, ans=0.125 2023-11-18 10:30:28,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=184766.66666666666, ans=0.125 2023-11-18 10:30:31,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=184766.66666666666, ans=0.04949747468305833 2023-11-18 10:30:32,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.231e+01 1.003e+02 1.085e+02 1.205e+02 1.999e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 10:30:50,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=184833.33333333334, ans=0.0 2023-11-18 10:31:00,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=184900.0, ans=0.125 2023-11-18 10:31:02,922 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3700, loss[loss=0.1125, simple_loss=0.1266, pruned_loss=0.03893, audio_tagging_loss=0.01023, over 14648.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1294, pruned_loss=0.04246, audio_tagging_loss=0.01195, over 3052317.45 frames. ], batch size: 56, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:31:20,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=185033.33333333334, ans=0.1 2023-11-18 10:31:42,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=15.10 vs. limit=15.0 2023-11-18 10:31:56,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=185233.33333333334, ans=0.125 2023-11-18 10:31:58,474 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3750, loss[loss=0.1666, simple_loss=0.1869, pruned_loss=0.06328, audio_tagging_loss=0.009896, over 16052.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.13, pruned_loss=0.04271, audio_tagging_loss=0.01195, over 3060111.16 frames. ], batch size: 62, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:32:15,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=185366.66666666666, ans=0.0 2023-11-18 10:32:18,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=185366.66666666666, ans=0.05 2023-11-18 10:32:24,708 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.955e+01 9.670e+01 1.084e+02 1.200e+02 2.427e+02, threshold=2.168e+02, percent-clipped=1.0 2023-11-18 10:32:33,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.97 vs. 
limit=6.0 2023-11-18 10:32:37,374 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:32:48,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.60 vs. limit=22.5 2023-11-18 10:32:54,170 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3800, loss[loss=0.1369, simple_loss=0.1519, pruned_loss=0.04924, audio_tagging_loss=0.01174, over 16134.00 frames. ], tot_loss[loss=0.1199, simple_loss=0.1301, pruned_loss=0.04276, audio_tagging_loss=0.01212, over 3064423.28 frames. ], batch size: 58, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:32:55,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=185633.33333333334, ans=0.125 2023-11-18 10:33:01,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=185633.33333333334, ans=0.125 2023-11-18 10:33:15,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=185766.66666666666, ans=0.125 2023-11-18 10:33:44,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=185900.0, ans=0.125 2023-11-18 10:33:50,166 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3850, loss[loss=0.1378, simple_loss=0.1497, pruned_loss=0.05023, audio_tagging_loss=0.01274, over 15169.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1285, pruned_loss=0.0423, audio_tagging_loss=0.01236, over 3062675.98 frames. ], batch size: 55, lr: 2.14e-02, grad_scale: 32.0 2023-11-18 10:33:52,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=185966.66666666666, ans=0.125 2023-11-18 10:33:56,011 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.37 vs. limit=15.0 2023-11-18 10:34:15,529 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 1.009e+02 1.117e+02 1.269e+02 1.869e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 10:34:45,168 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3900, loss[loss=0.1292, simple_loss=0.1341, pruned_loss=0.04901, audio_tagging_loss=0.01311, over 15353.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1277, pruned_loss=0.04195, audio_tagging_loss=0.01249, over 3061933.04 frames. 
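[Annotation] The per-batch numbers fit loss = 0.5*simple_loss + pruned_loss + audio_tagging_loss throughout this stretch: e.g. for batch 2550 above, 0.5*0.08927 + 0.02282 + 0.01306 = 0.08052 against the logged loss=0.08053. A sketch with those weights, which are read off the printed values rather than from train_asr.py:

```python
def combine_losses(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_scale=0.5, audio_tagging_scale=1.0):
    """Weights inferred from the logged numbers, not from the recipe."""
    return (simple_scale * simple_loss + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# batch 2550 above: logged loss=0.08053
print(combine_losses(0.08927, 0.02282, 0.01306))  # -> 0.080515
```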
], batch size: 55, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:10,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=186433.33333333334, ans=0.0 2023-11-18 10:35:26,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=186500.0, ans=0.125 2023-11-18 10:35:27,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=186500.0, ans=0.125 2023-11-18 10:35:27,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.89 vs. limit=15.0 2023-11-18 10:35:40,991 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 3950, loss[loss=0.1428, simple_loss=0.1454, pruned_loss=0.05667, audio_tagging_loss=0.01346, over 15352.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.1283, pruned_loss=0.04208, audio_tagging_loss=0.01261, over 3058180.43 frames. ], batch size: 58, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:35:42,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=186633.33333333334, ans=0.2 2023-11-18 10:35:46,089 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-28000.pt 2023-11-18 10:36:03,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=186700.0, ans=0.125 2023-11-18 10:36:07,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=186766.66666666666, ans=0.125 2023-11-18 10:36:08,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 9.703e+01 1.079e+02 1.244e+02 1.846e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 10:36:12,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=186766.66666666666, ans=0.1 2023-11-18 10:36:16,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=186833.33333333334, ans=0.125 2023-11-18 10:36:28,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=186900.0, ans=0.04949747468305833 2023-11-18 10:36:37,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.32 vs. limit=15.0 2023-11-18 10:36:39,213 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4000, loss[loss=0.1302, simple_loss=0.1398, pruned_loss=0.04849, audio_tagging_loss=0.01186, over 15804.00 frames. ], tot_loss[loss=0.119, simple_loss=0.1286, pruned_loss=0.04218, audio_tagging_loss=0.01256, over 3055014.07 frames. ], batch size: 59, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:36:56,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=187033.33333333334, ans=0.2 2023-11-18 10:37:01,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.86 vs. 
limit=22.5 2023-11-18 10:37:02,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=187100.0, ans=0.125 2023-11-18 10:37:15,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.03 vs. limit=15.0 2023-11-18 10:37:17,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=187166.66666666666, ans=0.125 2023-11-18 10:37:22,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187233.33333333334, ans=0.1 2023-11-18 10:37:34,009 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4050, loss[loss=0.1201, simple_loss=0.1301, pruned_loss=0.04332, audio_tagging_loss=0.01172, over 15543.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1298, pruned_loss=0.04282, audio_tagging_loss=0.01254, over 3056571.80 frames. ], batch size: 56, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:37:37,193 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:37:45,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=187366.66666666666, ans=0.1 2023-11-18 10:37:45,610 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.74 vs. limit=12.0 2023-11-18 10:38:00,301 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.492e+01 1.077e+02 1.184e+02 1.546e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 10:38:11,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=187500.0, ans=0.05 2023-11-18 10:38:24,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.25 vs. limit=15.0 2023-11-18 10:38:26,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=187566.66666666666, ans=0.0 2023-11-18 10:38:30,034 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4100, loss[loss=0.1041, simple_loss=0.107, pruned_loss=0.04037, audio_tagging_loss=0.01019, over 14525.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.129, pruned_loss=0.04244, audio_tagging_loss=0.0126, over 3054356.90 frames. ], batch size: 57, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:38:33,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=187633.33333333334, ans=0.125 2023-11-18 10:38:40,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=187700.0, ans=0.0 2023-11-18 10:38:55,178 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.98 vs. 
limit=15.0 2023-11-18 10:38:56,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.48 vs. limit=22.5 2023-11-18 10:39:02,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187833.33333333334, ans=0.1 2023-11-18 10:39:06,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.34 vs. limit=15.0 2023-11-18 10:39:15,678 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:39:26,027 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4150, loss[loss=0.08956, simple_loss=0.09765, pruned_loss=0.02695, audio_tagging_loss=0.01379, over 14524.00 frames. ], tot_loss[loss=0.1196, simple_loss=0.1293, pruned_loss=0.04236, audio_tagging_loss=0.01254, over 3048212.06 frames. ], batch size: 56, lr: 2.13e-02, grad_scale: 32.0 2023-11-18 10:39:26,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=187966.66666666666, ans=0.0 2023-11-18 10:39:29,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=187966.66666666666, ans=0.1 2023-11-18 10:39:35,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.75 vs. limit=12.0 2023-11-18 10:39:36,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=188033.33333333334, ans=0.125 2023-11-18 10:39:37,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=188033.33333333334, ans=0.0 2023-11-18 10:39:50,385 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.719e+01 1.039e+02 1.185e+02 1.497e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-18 10:40:02,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=188166.66666666666, ans=0.125 2023-11-18 10:40:07,375 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:40:10,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=188233.33333333334, ans=0.0 2023-11-18 10:40:21,055 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4200, loss[loss=0.1077, simple_loss=0.1033, pruned_loss=0.04186, audio_tagging_loss=0.01417, over 15018.00 frames. ], tot_loss[loss=0.1189, simple_loss=0.129, pruned_loss=0.04209, audio_tagging_loss=0.01233, over 3049095.50 frames. 
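[Annotation] The "WithLoss: name=..., loss-sum=..." entries attach an auxiliary penalty to an intermediate tensor (here self-attention weights): the forward value passes through unchanged while the summed penalty is logged and joins the objective. A minimal sketch of that pattern; the concrete penalty is an assumption:

```python
import torch

aux_losses = {}

def attach_loss(name: str, x: torch.Tensor) -> torch.Tensor:
    """Record a penalty on x and return x unchanged (forward is unaffected)."""
    # Assumed penalty: discourage attention collapsing onto one position.
    aux_losses[name] = (x.max(dim=-1).values ** 2).sum()
    return x

scores = torch.randn(4, 10, 10, requires_grad=True)
attn = attach_loss("layers.1.self_attn_weights", torch.softmax(scores, dim=-1))
main_loss = attn.mean()                         # stand-in for the ASR loss
(main_loss + sum(aux_losses.values())).backward()
print(f"loss-sum={aux_losses['layers.1.self_attn_weights'].item():.3e}")
```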
], batch size: 58, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:40:43,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=188433.33333333334, ans=15.0 2023-11-18 10:40:50,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.07 vs. limit=15.0 2023-11-18 10:41:04,197 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.20 vs. limit=10.0 2023-11-18 10:41:05,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=188566.66666666666, ans=0.125 2023-11-18 10:41:14,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=188633.33333333334, ans=0.125 2023-11-18 10:41:15,116 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4250, loss[loss=0.1228, simple_loss=0.1457, pruned_loss=0.04092, audio_tagging_loss=0.009067, over 15227.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.13, pruned_loss=0.04222, audio_tagging_loss=0.01218, over 3049409.01 frames. ], batch size: 57, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:41:29,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=188700.0, ans=0.0 2023-11-18 10:41:31,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.whiten.whitening_limit, batch_count=188700.0, ans=15.0 2023-11-18 10:41:32,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=188700.0, ans=0.125 2023-11-18 10:41:41,655 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.045e+01 9.813e+01 1.062e+02 1.234e+02 2.396e+02, threshold=2.125e+02, percent-clipped=1.0 2023-11-18 10:42:12,237 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4300, loss[loss=0.1515, simple_loss=0.1724, pruned_loss=0.05341, audio_tagging_loss=0.0119, over 16240.00 frames. ], tot_loss[loss=0.1187, simple_loss=0.129, pruned_loss=0.04196, audio_tagging_loss=0.0122, over 3055280.15 frames. ], batch size: 56, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:42:31,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=189033.33333333334, ans=0.125 2023-11-18 10:42:31,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.95 vs. limit=22.5 2023-11-18 10:42:36,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=189100.0, ans=0.125 2023-11-18 10:42:38,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=189100.0, ans=0.0 2023-11-18 10:43:07,146 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4350, loss[loss=0.1076, simple_loss=0.1193, pruned_loss=0.03699, audio_tagging_loss=0.01091, over 15370.00 frames. ], tot_loss[loss=0.1194, simple_loss=0.13, pruned_loss=0.04229, audio_tagging_loss=0.01212, over 3053450.13 frames. 
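[Annotation] The "Saving checkpoint to .../checkpoint-28000.pt" entry a little earlier names the file after the global batch index, implying fixed-interval saves. A sketch of that pattern; the interval and payload keys are assumptions:

```python
import torch
from pathlib import Path

def maybe_save_checkpoint(model, optimizer, batch_idx: int, exp_dir: Path,
                          every_n: int = 4000):  # interval is an assumption
    """Save a batch-indexed checkpoint every `every_n` training batches."""
    if batch_idx > 0 and batch_idx % every_n == 0:
        path = exp_dir / f"checkpoint-{batch_idx}.pt"
        torch.save({"model": model.state_dict(),
                    "optimizer": optimizer.state_dict(),
                    "batch_idx_train": batch_idx}, path)
        return path  # e.g. exp_dir/checkpoint-28000.pt, as in the log
    return None
```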
], batch size: 59, lr: 2.12e-02, grad_scale: 32.0 2023-11-18 10:43:27,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=189433.33333333334, ans=0.125 2023-11-18 10:43:33,110 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.563e+01 9.769e+01 1.106e+02 1.188e+02 1.814e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 10:43:35,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=189433.33333333334, ans=0.2 2023-11-18 10:43:35,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189433.33333333334, ans=0.125 2023-11-18 10:43:37,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.56 vs. limit=10.0 2023-11-18 10:43:40,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=189500.0, ans=0.125 2023-11-18 10:44:02,007 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4400, loss[loss=0.1222, simple_loss=0.1389, pruned_loss=0.04092, audio_tagging_loss=0.01186, over 14560.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1291, pruned_loss=0.04195, audio_tagging_loss=0.01213, over 3056128.58 frames. ], batch size: 55, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:44:05,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=189633.33333333334, ans=0.125 2023-11-18 10:44:10,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=189633.33333333334, ans=0.125 2023-11-18 10:44:18,254 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.689e-01 2023-11-18 10:44:28,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=189766.66666666666, ans=0.09899494936611666 2023-11-18 10:44:37,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=189833.33333333334, ans=0.015 2023-11-18 10:44:58,516 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4450, loss[loss=0.1298, simple_loss=0.1375, pruned_loss=0.04818, audio_tagging_loss=0.01286, over 14953.00 frames. ], tot_loss[loss=0.1191, simple_loss=0.1296, pruned_loss=0.0422, audio_tagging_loss=0.01203, over 3059334.98 frames. ], batch size: 54, lr: 2.12e-02, grad_scale: 64.0 2023-11-18 10:45:24,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 9.777e+01 1.062e+02 1.165e+02 1.734e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 10:45:53,624 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4500, loss[loss=0.1003, simple_loss=0.1134, pruned_loss=0.03245, audio_tagging_loss=0.01113, over 14880.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1302, pruned_loss=0.04255, audio_tagging_loss=0.01203, over 3054663.78 frames. 
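[editor's note] In every optim.py:476 record the five grad-norm values read like the min/25%/50%/75%/max of recent gradient norms, and the printed threshold equals Clipping_scale times the median (2.0 * 1.106e+02 = 2.212e+02 in the record just above). A sketch under those assumptions; the window size is invented:

import collections
import torch

class GradNormMonitor:
    """Clips at clipping_scale * (running median of recent gradient norms)."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
        self.clipping_scale = clipping_scale
        self.norms = collections.deque(maxlen=window)  # recent total grad norms

    def clip_(self, parameters):
        grads = [p.grad for p in parameters if p.grad is not None]
        total = torch.norm(torch.stack([g.norm() for g in grads]))
        self.norms.append(float(total))
        history = torch.tensor(list(self.norms))
        quartiles = torch.quantile(
            history, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.clipping_scale * float(quartiles[2])  # 2x median
        clipped = float(total) > threshold  # counted toward "percent-clipped"
        if clipped:
            for g in grads:
                g.mul_(threshold / float(total))
        return quartiles, threshold, clipped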
], batch size: 57, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:46:00,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=190300.0, ans=0.125 2023-11-18 10:46:03,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=190366.66666666666, ans=0.0 2023-11-18 10:46:03,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=190366.66666666666, ans=0.0 2023-11-18 10:46:34,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=190500.0, ans=0.0 2023-11-18 10:46:47,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=190633.33333333334, ans=0.125 2023-11-18 10:46:48,216 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4550, loss[loss=0.1284, simple_loss=0.1333, pruned_loss=0.04928, audio_tagging_loss=0.01253, over 16050.00 frames. ], tot_loss[loss=0.1197, simple_loss=0.1301, pruned_loss=0.04253, audio_tagging_loss=0.01211, over 3058674.28 frames. ], batch size: 59, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:03,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=190700.0, ans=0.1 2023-11-18 10:47:15,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.411e+01 1.047e+02 1.182e+02 1.787e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:47:29,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=190833.33333333334, ans=0.125 2023-11-18 10:47:30,615 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 10:47:44,417 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4600, loss[loss=0.0862, simple_loss=0.09098, pruned_loss=0.02769, audio_tagging_loss=0.01302, over 14069.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1282, pruned_loss=0.04178, audio_tagging_loss=0.01224, over 3054926.60 frames. ], batch size: 54, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:47:56,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191033.33333333334, ans=0.125 2023-11-18 10:48:02,588 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=191033.33333333334, ans=0.0 2023-11-18 10:48:10,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.39 vs. 
limit=15.0 2023-11-18 10:48:15,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=191100.0, ans=0.125 2023-11-18 10:48:16,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=191166.66666666666, ans=0.1 2023-11-18 10:48:17,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.56 vs. limit=15.0 2023-11-18 10:48:19,885 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:48:25,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=191166.66666666666, ans=0.0 2023-11-18 10:48:34,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=191233.33333333334, ans=0.2 2023-11-18 10:48:38,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191233.33333333334, ans=0.125 2023-11-18 10:48:40,141 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4650, loss[loss=0.1171, simple_loss=0.1171, pruned_loss=0.04551, audio_tagging_loss=0.01308, over 14677.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1274, pruned_loss=0.04162, audio_tagging_loss=0.01241, over 3051505.72 frames. ], batch size: 54, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:48:42,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=191300.0, ans=0.95 2023-11-18 10:48:50,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=191366.66666666666, ans=0.125 2023-11-18 10:49:06,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.260e+01 9.958e+01 1.111e+02 1.228e+02 2.300e+02, threshold=2.222e+02, percent-clipped=1.0 2023-11-18 10:49:19,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=191500.0, ans=0.125 2023-11-18 10:49:26,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=191566.66666666666, ans=0.0 2023-11-18 10:49:32,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-18 10:49:33,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2023-11-18 10:49:34,886 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4700, loss[loss=0.1072, simple_loss=0.1085, pruned_loss=0.03717, audio_tagging_loss=0.01577, over 14433.00 frames. ], tot_loss[loss=0.1186, simple_loss=0.1279, pruned_loss=0.04219, audio_tagging_loss=0.01251, over 3051470.20 frames. 
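[editor's note] The recurring scaling.py:1022 Whitening records compare a per-module statistic against a limit (metric=13.39 vs. limit=15.0 just above). One statistic with the right behavior is d * ||C||_F^2 / trace(C)^2 for the channel covariance C: it equals 1.0 for perfectly isotropic ("white") activations and grows as the spectrum concentrates. This reconstruction matches the logged magnitudes but is an assumption, not the library's exact code:

import torch

def whitening_metric(x: torch.Tensor) -> float:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.t() @ x) / x.shape[0]          # (d, d) channel covariance
    d = cov.shape[0]
    return float(d * (cov * cov).sum() / cov.trace() ** 2)

white = torch.randn(1000, 256)
print(whitening_metric(white))                                   # ~1.0
print(whitening_metric(white * torch.linspace(0.1, 3.0, 256)))   # much larger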
], batch size: 56, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:49:45,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=191700.0, ans=0.125 2023-11-18 10:49:48,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=191700.0, ans=0.0 2023-11-18 10:49:53,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.78 vs. limit=22.5 2023-11-18 10:50:04,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=191766.66666666666, ans=0.0 2023-11-18 10:50:05,657 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:50:23,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=191900.0, ans=0.125 2023-11-18 10:50:23,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=191900.0, ans=0.1 2023-11-18 10:50:30,216 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4750, loss[loss=0.1112, simple_loss=0.1238, pruned_loss=0.03944, audio_tagging_loss=0.009885, over 15415.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1278, pruned_loss=0.04196, audio_tagging_loss=0.01264, over 3045760.65 frames. ], batch size: 61, lr: 2.11e-02, grad_scale: 32.0 2023-11-18 10:50:30,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=191966.66666666666, ans=0.125 2023-11-18 10:50:57,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.880e+01 1.110e+02 1.323e+02 1.950e+02, threshold=2.220e+02, percent-clipped=0.0 2023-11-18 10:51:03,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=192166.66666666666, ans=0.0 2023-11-18 10:51:05,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=192166.66666666666, ans=0.0 2023-11-18 10:51:26,448 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4800, loss[loss=0.1322, simple_loss=0.1431, pruned_loss=0.04765, audio_tagging_loss=0.01297, over 16706.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1269, pruned_loss=0.04151, audio_tagging_loss=0.01265, over 3045525.90 frames. ], batch size: 62, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:51:43,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=192366.66666666666, ans=0.0 2023-11-18 10:51:51,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=192433.33333333334, ans=0.1 2023-11-18 10:51:56,021 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 10:52:00,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=192500.0, ans=0.1 2023-11-18 10:52:01,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.12 vs. 
limit=22.5 2023-11-18 10:52:05,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=192500.0, ans=0.1 2023-11-18 10:52:19,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=192566.66666666666, ans=0.0 2023-11-18 10:52:21,057 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4850, loss[loss=0.1115, simple_loss=0.1296, pruned_loss=0.03577, audio_tagging_loss=0.01093, over 14954.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1275, pruned_loss=0.0416, audio_tagging_loss=0.01274, over 3048391.95 frames. ], batch size: 58, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:52:25,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=192633.33333333334, ans=0.0 2023-11-18 10:52:38,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=15.0 2023-11-18 10:52:42,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=192766.66666666666, ans=0.025 2023-11-18 10:52:42,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=192766.66666666666, ans=0.125 2023-11-18 10:52:42,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.01 vs. limit=15.0 2023-11-18 10:52:43,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=192766.66666666666, ans=0.125 2023-11-18 10:52:47,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.311e+01 9.557e+01 1.060e+02 1.196e+02 2.281e+02, threshold=2.120e+02, percent-clipped=1.0 2023-11-18 10:53:15,993 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4900, loss[loss=0.1371, simple_loss=0.1503, pruned_loss=0.05194, audio_tagging_loss=0.009998, over 15935.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1273, pruned_loss=0.04153, audio_tagging_loss=0.01259, over 3051225.04 frames. ], batch size: 56, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:53:21,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=8.23 vs. limit=8.0 2023-11-18 10:53:39,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=193100.0, ans=0.125 2023-11-18 10:53:56,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.16 vs. limit=15.0 2023-11-18 10:54:02,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=193233.33333333334, ans=0.125 2023-11-18 10:54:11,446 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 4950, loss[loss=0.09277, simple_loss=0.09372, pruned_loss=0.03064, audio_tagging_loss=0.01527, over 15376.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1282, pruned_loss=0.04186, audio_tagging_loss=0.01231, over 3049944.29 frames. 
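[editor's note] The grad_scale field in the batch records moves in powers of two (32.0 up to 64.0 at batch 4400, back to 32.0 by batch 4500, and up again later in this log), which is the signature of dynamic loss scaling for mixed-precision training: grow the scale after a streak of finite steps, halve it when an overflow is detected. A minimal sketch of that policy; the growth interval and factors are assumptions in the spirit of torch.cuda.amp.GradScaler:

class DynamicLossScaler:
    """Grows the fp16 loss scale on a streak of finite steps; backs off on overflow."""
    def __init__(self, scale: float = 32.0, growth_interval: int = 1000,
                 growth_factor: float = 2.0, backoff_factor: float = 0.5):
        self.scale = scale
        self.growth_interval = growth_interval
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self._good_steps = 0

    def update(self, found_inf: bool) -> float:
        if found_inf:                      # overflow: halve and restart streak
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps == self.growth_interval:
                self.scale *= self.growth_factor   # e.g. 32.0 -> 64.0
                self._good_steps = 0
        return self.scale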
], batch size: 59, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:54:13,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=193300.0, ans=0.125 2023-11-18 10:54:35,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.06 vs. limit=10.0 2023-11-18 10:54:37,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 9.494e+01 1.131e+02 1.249e+02 1.755e+02, threshold=2.261e+02, percent-clipped=0.0 2023-11-18 10:54:50,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=193500.0, ans=0.125 2023-11-18 10:54:56,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=193566.66666666666, ans=0.0 2023-11-18 10:55:06,964 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5000, loss[loss=0.1655, simple_loss=0.1793, pruned_loss=0.06765, audio_tagging_loss=0.008177, over 15429.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1281, pruned_loss=0.04186, audio_tagging_loss=0.012, over 3048466.67 frames. ], batch size: 57, lr: 2.10e-02, grad_scale: 32.0 2023-11-18 10:55:19,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=193700.0, ans=0.0 2023-11-18 10:55:21,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.64 vs. limit=5.0 2023-11-18 10:56:02,101 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5050, loss[loss=0.1057, simple_loss=0.1202, pruned_loss=0.034, audio_tagging_loss=0.01154, over 16078.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1281, pruned_loss=0.04174, audio_tagging_loss=0.01189, over 3045375.98 frames. ], batch size: 59, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:56:02,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=193966.66666666666, ans=0.0 2023-11-18 10:56:03,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.whiten.whitening_limit, batch_count=193966.66666666666, ans=12.0 2023-11-18 10:56:10,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=193966.66666666666, ans=0.125 2023-11-18 10:56:11,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=193966.66666666666, ans=0.1 2023-11-18 10:56:28,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 1.006e+02 1.111e+02 1.230e+02 2.145e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 10:56:29,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=194100.0, ans=0.0 2023-11-18 10:56:57,803 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5100, loss[loss=0.117, simple_loss=0.1179, pruned_loss=0.04427, audio_tagging_loss=0.01372, over 13959.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1283, pruned_loss=0.04207, audio_tagging_loss=0.01191, over 3048784.31 frames. ], batch size: 53, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:17,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.97 vs. 
limit=10.0 2023-11-18 10:57:22,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.52 vs. limit=15.0 2023-11-18 10:57:32,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=194500.0, ans=0.09899494936611666 2023-11-18 10:57:51,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=12.0 2023-11-18 10:57:52,356 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5150, loss[loss=0.1057, simple_loss=0.1095, pruned_loss=0.03729, audio_tagging_loss=0.01362, over 15612.00 frames. ], tot_loss[loss=0.1182, simple_loss=0.1286, pruned_loss=0.04211, audio_tagging_loss=0.01179, over 3047423.72 frames. ], batch size: 61, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:57:56,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=194633.33333333334, ans=0.125 2023-11-18 10:58:07,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=194700.0, ans=0.1 2023-11-18 10:58:20,054 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.587e+01 9.533e+01 1.047e+02 1.145e+02 1.744e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 10:58:20,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=194766.66666666666, ans=0.125 2023-11-18 10:58:23,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=194766.66666666666, ans=0.2 2023-11-18 10:58:24,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=5.27 vs. limit=12.0 2023-11-18 10:58:25,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=194833.33333333334, ans=0.0 2023-11-18 10:58:48,382 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5200, loss[loss=0.09133, simple_loss=0.09661, pruned_loss=0.03277, audio_tagging_loss=0.01025, over 15182.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1282, pruned_loss=0.04204, audio_tagging_loss=0.01182, over 3045917.78 frames. ], batch size: 58, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:06,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-18 10:59:07,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=195033.33333333334, ans=0.0 2023-11-18 10:59:42,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.42 vs. limit=22.5 2023-11-18 10:59:44,057 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5250, loss[loss=0.1149, simple_loss=0.1248, pruned_loss=0.03751, audio_tagging_loss=0.01498, over 15070.00 frames. ], tot_loss[loss=0.1188, simple_loss=0.1289, pruned_loss=0.04253, audio_tagging_loss=0.01177, over 3049561.64 frames. 
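[editor's note] Each tot_loss[...] is reported "over" roughly 3.0e6 frames even though a single batch carries only ~15k frames, so the totals behave like a frame-weighted average with an effective window of about 200 batches. A sketch of such an aggregate; the decay constant 0.995 is an assumption chosen so the steady-state frame count (15000 / 0.005 = 3.0e6) matches the log:

class RunningMetrics:
    """Frame-weighted, exponentially decayed averages, as in tot_loss[...] (sketch)."""
    def __init__(self, decay: float = 0.995):   # assumed decay constant
        self.decay = decay
        self.sums = {}       # metric name -> decayed frame-weighted sum
        self.frames = 0.0    # decayed frame count, the "over N frames" figure

    def update(self, batch_metrics: dict, num_frames: float) -> None:
        self.frames = self.frames * self.decay + num_frames
        for name, per_frame_value in batch_metrics.items():
            self.sums[name] = (self.sums.get(name, 0.0) * self.decay
                               + per_frame_value * num_frames)

    def averages(self) -> dict:
        # Per-frame averages "over self.frames frames", as printed in the log.
        return {name: s / self.frames for name, s in self.sums.items()}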
], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 10:59:46,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=195300.0, ans=0.125 2023-11-18 10:59:46,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.39 vs. limit=15.0 2023-11-18 11:00:09,747 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.706e+01 1.086e+02 1.165e+02 1.723e+02, threshold=2.171e+02, percent-clipped=0.0 2023-11-18 11:00:29,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=195566.66666666666, ans=0.1 2023-11-18 11:00:32,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=195566.66666666666, ans=0.125 2023-11-18 11:00:37,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=195633.33333333334, ans=0.2 2023-11-18 11:00:38,723 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5300, loss[loss=0.1123, simple_loss=0.1318, pruned_loss=0.03456, audio_tagging_loss=0.01188, over 15889.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.1277, pruned_loss=0.04209, audio_tagging_loss=0.01178, over 3048796.33 frames. ], batch size: 57, lr: 2.09e-02, grad_scale: 32.0 2023-11-18 11:00:46,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=195633.33333333334, ans=0.0 2023-11-18 11:00:52,526 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.99 vs. limit=22.5 2023-11-18 11:00:53,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=195700.0, ans=0.125 2023-11-18 11:00:57,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=195700.0, ans=0.0 2023-11-18 11:00:59,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=195766.66666666666, ans=0.125 2023-11-18 11:01:07,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=195766.66666666666, ans=0.125 2023-11-18 11:01:12,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=195833.33333333334, ans=0.0 2023-11-18 11:01:18,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=195833.33333333334, ans=22.5 2023-11-18 11:01:18,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.86 vs. limit=22.5 2023-11-18 11:01:24,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=15.0 2023-11-18 11:01:27,148 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:01:27,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. 
limit=15.0 2023-11-18 11:01:33,825 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5350, loss[loss=0.08734, simple_loss=0.08835, pruned_loss=0.02478, audio_tagging_loss=0.01839, over 15440.00 frames. ], tot_loss[loss=0.1166, simple_loss=0.1264, pruned_loss=0.04144, audio_tagging_loss=0.01192, over 3045904.64 frames. ], batch size: 59, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:01:51,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=196033.33333333334, ans=0.04949747468305833 2023-11-18 11:02:00,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.505e+01 9.740e+01 1.103e+02 1.236e+02 1.942e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:02:01,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=196100.0, ans=0.125 2023-11-18 11:02:04,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=196100.0, ans=0.125 2023-11-18 11:02:23,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=196233.33333333334, ans=0.125 2023-11-18 11:02:27,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=196233.33333333334, ans=0.125 2023-11-18 11:02:30,253 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5400, loss[loss=0.13, simple_loss=0.1421, pruned_loss=0.04468, audio_tagging_loss=0.01432, over 15556.00 frames. ], tot_loss[loss=0.1172, simple_loss=0.1272, pruned_loss=0.04156, audio_tagging_loss=0.01206, over 3045446.11 frames. ], batch size: 55, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:02:43,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=196366.66666666666, ans=0.5 2023-11-18 11:03:24,824 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5450, loss[loss=0.1354, simple_loss=0.1518, pruned_loss=0.04847, audio_tagging_loss=0.01101, over 14615.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1279, pruned_loss=0.04167, audio_tagging_loss=0.01215, over 3044069.95 frames. ], batch size: 55, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:03:51,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.469e+01 9.475e+01 1.043e+02 1.232e+02 1.692e+02, threshold=2.085e+02, percent-clipped=0.0 2023-11-18 11:04:05,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=196833.33333333334, ans=0.125 2023-11-18 11:04:09,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=196900.0, ans=0.125 2023-11-18 11:04:11,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=196900.0, ans=0.1 2023-11-18 11:04:19,145 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5500, loss[loss=0.1289, simple_loss=0.1516, pruned_loss=0.04165, audio_tagging_loss=0.01139, over 15603.00 frames. ], tot_loss[loss=0.1181, simple_loss=0.1283, pruned_loss=0.04183, audio_tagging_loss=0.01218, over 3044617.72 frames. 
], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:04:22,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=196966.66666666666, ans=0.125 2023-11-18 11:04:55,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=197166.66666666666, ans=0.1 2023-11-18 11:05:15,099 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5550, loss[loss=0.1184, simple_loss=0.1241, pruned_loss=0.04266, audio_tagging_loss=0.01367, over 15277.00 frames. ], tot_loss[loss=0.1179, simple_loss=0.1281, pruned_loss=0.04154, audio_tagging_loss=0.01231, over 3052160.22 frames. ], batch size: 57, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:05:32,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=197366.66666666666, ans=0.125 2023-11-18 11:05:41,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.765e+01 9.422e+01 1.021e+02 1.116e+02 1.524e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 11:05:49,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197500.0, ans=0.1 2023-11-18 11:05:57,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=197500.0, ans=10.0 2023-11-18 11:05:58,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=15.0 2023-11-18 11:06:06,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=197566.66666666666, ans=0.1 2023-11-18 11:06:06,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=197566.66666666666, ans=0.0 2023-11-18 11:06:10,798 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5600, loss[loss=0.132, simple_loss=0.1483, pruned_loss=0.04638, audio_tagging_loss=0.01149, over 14955.00 frames. ], tot_loss[loss=0.1193, simple_loss=0.1296, pruned_loss=0.04202, audio_tagging_loss=0.01241, over 3054872.15 frames. ], batch size: 56, lr: 2.08e-02, grad_scale: 32.0 2023-11-18 11:06:21,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=197700.0, ans=0.04949747468305833 2023-11-18 11:06:21,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2023-11-18 11:06:25,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=197700.0, ans=0.125 2023-11-18 11:06:33,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.80 vs. limit=15.0 2023-11-18 11:06:35,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=197766.66666666666, ans=0.125 2023-11-18 11:06:35,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. 
limit=15.0 2023-11-18 11:06:43,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=197833.33333333334, ans=0.1 2023-11-18 11:06:49,881 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:06:53,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.18 vs. limit=8.0 2023-11-18 11:07:03,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=197900.0, ans=0.0 2023-11-18 11:07:03,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=197900.0, ans=0.0 2023-11-18 11:07:05,553 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5650, loss[loss=0.1338, simple_loss=0.1535, pruned_loss=0.04837, audio_tagging_loss=0.008701, over 15360.00 frames. ], tot_loss[loss=0.1183, simple_loss=0.1281, pruned_loss=0.04169, audio_tagging_loss=0.0125, over 3050593.35 frames. ], batch size: 57, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:07:16,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=198033.33333333334, ans=10.0 2023-11-18 11:07:32,366 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.586e+01 9.472e+01 1.054e+02 1.179e+02 1.784e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:07:33,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=198100.0, ans=0.0 2023-11-18 11:07:36,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.37 vs. limit=15.0 2023-11-18 11:07:41,279 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=14.09 vs. limit=15.0 2023-11-18 11:07:48,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=198233.33333333334, ans=0.0 2023-11-18 11:07:50,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=198233.33333333334, ans=0.125 2023-11-18 11:07:50,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=198233.33333333334, ans=0.125 2023-11-18 11:08:01,317 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5700, loss[loss=0.125, simple_loss=0.1396, pruned_loss=0.04606, audio_tagging_loss=0.009135, over 14506.00 frames. ], tot_loss[loss=0.1184, simple_loss=0.1281, pruned_loss=0.04193, audio_tagging_loss=0.01241, over 3047174.98 frames. 
], batch size: 54, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:08:13,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=198366.66666666666, ans=0.125 2023-11-18 11:08:28,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198433.33333333334, ans=0.1 2023-11-18 11:08:43,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.03 vs. limit=15.0 2023-11-18 11:08:47,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198566.66666666666, ans=0.1 2023-11-18 11:08:47,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=198566.66666666666, ans=0.2 2023-11-18 11:08:50,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=198566.66666666666, ans=0.125 2023-11-18 11:08:56,301 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5750, loss[loss=0.1166, simple_loss=0.1346, pruned_loss=0.03696, audio_tagging_loss=0.01238, over 15531.00 frames. ], tot_loss[loss=0.1185, simple_loss=0.1287, pruned_loss=0.04192, audio_tagging_loss=0.0122, over 3044635.81 frames. ], batch size: 57, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:15,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.16 vs. limit=15.0 2023-11-18 11:09:15,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=198700.0, ans=0.125 2023-11-18 11:09:22,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.260e+01 9.937e+01 1.145e+02 1.295e+02 2.386e+02, threshold=2.290e+02, percent-clipped=2.0 2023-11-18 11:09:24,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=198766.66666666666, ans=0.125 2023-11-18 11:09:29,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=198833.33333333334, ans=0.125 2023-11-18 11:09:34,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=198833.33333333334, ans=0.1 2023-11-18 11:09:40,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=198900.0, ans=0.1 2023-11-18 11:09:40,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=198900.0, ans=0.125 2023-11-18 11:09:42,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=198900.0, ans=0.035 2023-11-18 11:09:50,851 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5800, loss[loss=0.1334, simple_loss=0.1524, pruned_loss=0.04514, audio_tagging_loss=0.01209, over 14881.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1281, pruned_loss=0.04166, audio_tagging_loss=0.0121, over 3053558.67 frames. 
], batch size: 55, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:09:51,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=198966.66666666666, ans=0.025 2023-11-18 11:09:55,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 11:09:56,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=198966.66666666666, ans=0.0 2023-11-18 11:10:08,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.05 vs. limit=15.0 2023-11-18 11:10:43,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=199233.33333333334, ans=0.125 2023-11-18 11:10:45,926 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5850, loss[loss=0.1228, simple_loss=0.132, pruned_loss=0.04278, audio_tagging_loss=0.01403, over 15019.00 frames. ], tot_loss[loss=0.1176, simple_loss=0.1279, pruned_loss=0.04155, audio_tagging_loss=0.01212, over 3052188.75 frames. ], batch size: 58, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:10:57,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=199366.66666666666, ans=0.5 2023-11-18 11:11:01,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=199366.66666666666, ans=0.2 2023-11-18 11:11:12,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.939e+01 1.135e+02 1.295e+02 1.954e+02, threshold=2.270e+02, percent-clipped=0.0 2023-11-18 11:11:13,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.32 vs. limit=10.0 2023-11-18 11:11:16,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=199433.33333333334, ans=0.2 2023-11-18 11:11:23,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.12 vs. limit=10.0 2023-11-18 11:11:24,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.67 vs. limit=22.5 2023-11-18 11:11:42,582 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5900, loss[loss=0.07865, simple_loss=0.0739, pruned_loss=0.02753, audio_tagging_loss=0.01417, over 15919.00 frames. ], tot_loss[loss=0.1173, simple_loss=0.1277, pruned_loss=0.04147, audio_tagging_loss=0.01196, over 3046223.48 frames. 
], batch size: 63, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:11:47,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=199633.33333333334, ans=0.1 2023-11-18 11:12:03,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=199766.66666666666, ans=0.1 2023-11-18 11:12:09,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=199766.66666666666, ans=0.0 2023-11-18 11:12:23,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=199833.33333333334, ans=0.125 2023-11-18 11:12:23,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=199833.33333333334, ans=0.125 2023-11-18 11:12:32,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=199900.0, ans=0.125 2023-11-18 11:12:37,047 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 5950, loss[loss=0.09226, simple_loss=0.09624, pruned_loss=0.03108, audio_tagging_loss=0.01307, over 14435.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1266, pruned_loss=0.04069, audio_tagging_loss=0.01191, over 3045241.08 frames. ], batch size: 56, lr: 2.07e-02, grad_scale: 32.0 2023-11-18 11:12:37,311 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=3.337e+00 2023-11-18 11:12:42,003 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.84 vs. limit=15.0 2023-11-18 11:12:48,914 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.01 vs. limit=15.0 2023-11-18 11:13:00,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=200100.0, ans=0.125 2023-11-18 11:13:04,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.491e+01 1.040e+02 1.180e+02 1.802e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:13:09,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=200100.0, ans=0.0 2023-11-18 11:13:19,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=200166.66666666666, ans=0.125 2023-11-18 11:13:32,553 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6000, loss[loss=0.09927, simple_loss=0.1015, pruned_loss=0.03077, audio_tagging_loss=0.01775, over 14870.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1269, pruned_loss=0.04093, audio_tagging_loss=0.01197, over 3043773.19 frames. ], batch size: 58, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:13:32,556 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 11:13:58,141 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9353, 3.0536, 4.8300, 4.3772], device='cuda:0') 2023-11-18 11:14:05,630 INFO [train_asr.py:1147] (0/4) Epoch 3, validation: loss=0.08054, simple_loss=0.06533, pruned_loss=0.01225, audio_tagging_loss=0.03562, over 4681554.00 frames. 
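[editor's note] During the validation pass just above, the zipformer.py:1873 lines dump attn_weights_entropy with one value per attention head (four entries for a four-headed layer). The quantity is presumably the Shannon entropy of each head's attention distribution, averaged over query positions; a sketch under that assumption, with shapes and normalization invented for illustration:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    # attn: (num_heads, num_queries, num_keys); each row is a probability
    # distribution over keys. Returns one entropy value per head.
    p = attn.clamp_min(1e-20)
    return -(p * p.log()).sum(dim=-1).mean(dim=-1)   # (num_heads,)

attn = torch.softmax(torch.randn(4, 50, 50), dim=-1)
print(attn_weights_entropy(attn))  # near log(50) ~ 3.9 for diffuse attention;
                                   # lower values indicate sharper attention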
2023-11-18 11:14:05,631 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 11:14:43,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=200500.0, ans=0.0 2023-11-18 11:14:45,159 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:14:56,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=200566.66666666666, ans=0.0 2023-11-18 11:15:00,399 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6050, loss[loss=0.1363, simple_loss=0.1538, pruned_loss=0.04774, audio_tagging_loss=0.01165, over 14753.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1274, pruned_loss=0.04104, audio_tagging_loss=0.01207, over 3047615.93 frames. ], batch size: 52, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:03,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=200633.33333333334, ans=0.125 2023-11-18 11:15:27,197 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.571e+01 9.505e+01 1.054e+02 1.196e+02 1.657e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 11:15:38,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=200833.33333333334, ans=0.125 2023-11-18 11:15:46,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=200900.0, ans=0.125 2023-11-18 11:15:55,689 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6100, loss[loss=0.1147, simple_loss=0.1163, pruned_loss=0.03661, audio_tagging_loss=0.01993, over 15189.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1265, pruned_loss=0.04089, audio_tagging_loss=0.01206, over 3043585.89 frames. ], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:15:58,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.58 vs. limit=15.0 2023-11-18 11:16:09,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=201033.33333333334, ans=0.125 2023-11-18 11:16:28,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=201166.66666666666, ans=0.0 2023-11-18 11:16:51,861 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6150, loss[loss=0.1028, simple_loss=0.1089, pruned_loss=0.03622, audio_tagging_loss=0.01214, over 13407.00 frames. ], tot_loss[loss=0.117, simple_loss=0.1278, pruned_loss=0.04108, audio_tagging_loss=0.01204, over 3038982.83 frames. 
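[editor's note] The recurring "Exclude cut" warnings (another appears just above) drop AudioSet placeholder cuts whose transcript cannot be aligned: 100 input frames shrink to 23 after frontend subsampling, fewer than the 24 BPE tokens, and a transducer needs at least one frame per output token. The subsampling formula below reproduces the logged 100 -> 23 but is an inference from those numbers; the recipe's exact margin may differ:

def frames_after_subsampling(t: int) -> int:
    # Assumed conv-frontend length formula; it reproduces the logged 100 -> 23.
    return ((t - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least one frame per emitted token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # -> 23
print(keep_cut(100, 24))              # -> False: excluded, as in the warning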
], batch size: 52, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:17:18,532 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.824e+01 1.100e+02 1.227e+02 1.879e+02, threshold=2.200e+02, percent-clipped=0.0 2023-11-18 11:17:30,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=201500.0, ans=0.035 2023-11-18 11:17:30,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=201500.0, ans=0.0 2023-11-18 11:17:43,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_na.min_abs, batch_count=201566.66666666666, ans=0.02 2023-11-18 11:17:47,684 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6200, loss[loss=0.1338, simple_loss=0.1473, pruned_loss=0.04983, audio_tagging_loss=0.0103, over 15152.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1266, pruned_loss=0.04093, audio_tagging_loss=0.01214, over 3047294.00 frames. ], batch size: 57, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:17:54,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=201633.33333333334, ans=0.125 2023-11-18 11:18:00,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=201700.0, ans=0.025 2023-11-18 11:18:10,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=201766.66666666666, ans=0.1 2023-11-18 11:18:14,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=201766.66666666666, ans=0.125 2023-11-18 11:18:29,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=201833.33333333334, ans=0.125 2023-11-18 11:18:41,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=201900.0, ans=0.125 2023-11-18 11:18:42,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=201966.66666666666, ans=0.1 2023-11-18 11:18:43,332 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6250, loss[loss=0.1277, simple_loss=0.1412, pruned_loss=0.04525, audio_tagging_loss=0.01184, over 14650.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1265, pruned_loss=0.041, audio_tagging_loss=0.01222, over 3048390.89 frames. 
], batch size: 56, lr: 2.06e-02, grad_scale: 32.0 2023-11-18 11:18:49,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=201966.66666666666, ans=0.0 2023-11-18 11:19:08,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=202100.0, ans=0.0 2023-11-18 11:19:10,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.583e+01 9.428e+01 1.017e+02 1.154e+02 1.739e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 11:19:14,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=202100.0, ans=0.0 2023-11-18 11:19:23,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=202166.66666666666, ans=0.125 2023-11-18 11:19:25,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=202166.66666666666, ans=0.125 2023-11-18 11:19:39,067 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6300, loss[loss=0.111, simple_loss=0.1222, pruned_loss=0.03813, audio_tagging_loss=0.01177, over 15566.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.126, pruned_loss=0.0408, audio_tagging_loss=0.01235, over 3050961.24 frames. ], batch size: 57, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:19:42,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=202300.0, ans=0.0 2023-11-18 11:19:45,640 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.786e+00 2023-11-18 11:19:56,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=202366.66666666666, ans=0.125 2023-11-18 11:20:16,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=202500.0, ans=0.025 2023-11-18 11:20:20,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=202500.0, ans=0.0 2023-11-18 11:20:34,519 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6350, loss[loss=0.09436, simple_loss=0.09713, pruned_loss=0.03127, audio_tagging_loss=0.01452, over 14873.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1251, pruned_loss=0.04039, audio_tagging_loss=0.01243, over 3050076.82 frames. 
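[editor's note] The scaling.py:1118 WithLoss records report the auxiliary penalty accumulated on a tensor (attention weights here) since the last report: loss-sum=0.000e+00 means the penalty never activated in the interval, while non-zero sums such as 1.786e+00 above mean it did. A sketch of that bookkeeping; the trigger statistic is purely illustrative, not the library's:

import torch

class PenaltyTap:
    """Adds an auxiliary penalty on a tensor and accumulates it for logging (sketch)."""
    def __init__(self, limit: float):
        self.limit = limit
        self.loss_sum = 0.0

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        # Illustrative statistic: penalize mean-square magnitude above `limit`.
        excess = (x.pow(2).mean() - self.limit).clamp(min=0.0)
        self.loss_sum += float(excess)
        return excess          # added into the training loss

    def report_and_reset(self) -> float:
        total, self.loss_sum = self.loss_sum, 0.0
        return total           # printed as "loss-sum=..."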
], batch size: 58, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:20:36,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=202633.33333333334, ans=0.125 2023-11-18 11:20:55,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=202766.66666666666, ans=0.125 2023-11-18 11:21:01,569 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.831e+01 1.084e+02 1.220e+02 1.699e+02, threshold=2.169e+02, percent-clipped=0.0 2023-11-18 11:21:02,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=202766.66666666666, ans=0.0 2023-11-18 11:21:04,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=202766.66666666666, ans=0.0 2023-11-18 11:21:10,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=202833.33333333334, ans=0.125 2023-11-18 11:21:14,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=202833.33333333334, ans=0.1 2023-11-18 11:21:25,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=202900.0, ans=0.125 2023-11-18 11:21:29,931 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6400, loss[loss=0.1328, simple_loss=0.1383, pruned_loss=0.0531, audio_tagging_loss=0.01058, over 14952.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1246, pruned_loss=0.0405, audio_tagging_loss=0.01258, over 3044678.79 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 32.0 2023-11-18 11:21:33,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=202966.66666666666, ans=0.125 2023-11-18 11:21:36,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=202966.66666666666, ans=0.125 2023-11-18 11:21:53,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=203100.0, ans=0.025 2023-11-18 11:21:56,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203100.0, ans=0.1 2023-11-18 11:22:04,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=203166.66666666666, ans=0.125 2023-11-18 11:22:12,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=203166.66666666666, ans=0.125 2023-11-18 11:22:13,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=203233.33333333334, ans=0.125 2023-11-18 11:22:26,001 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6450, loss[loss=0.1191, simple_loss=0.1157, pruned_loss=0.04793, audio_tagging_loss=0.01331, over 15032.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1258, pruned_loss=0.04075, audio_tagging_loss=0.01256, over 3043580.31 frames. 
2023-11-18 11:22:26,001 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6450, loss[loss=0.1191, simple_loss=0.1157, pruned_loss=0.04793, audio_tagging_loss=0.01331, over 15032.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1258, pruned_loss=0.04075, audio_tagging_loss=0.01256, over 3043580.31 frames. ], batch size: 57, lr: 2.05e-02, grad_scale: 32.0
2023-11-18 11:22:28,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=203300.0, ans=0.2
2023-11-18 11:22:46,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=203433.33333333334, ans=0.1
2023-11-18 11:22:52,367 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.763e+01 1.082e+02 1.171e+02 1.453e+02, threshold=2.164e+02, percent-clipped=0.0
2023-11-18 11:22:58,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.73 vs. limit=15.0
2023-11-18 11:23:05,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=203500.0, ans=0.125
2023-11-18 11:23:11,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=203566.66666666666, ans=0.125
2023-11-18 11:23:21,039 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6500, loss[loss=0.136, simple_loss=0.1535, pruned_loss=0.05067, audio_tagging_loss=0.008537, over 15131.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1261, pruned_loss=0.04069, audio_tagging_loss=0.01256, over 3041731.81 frames. ], batch size: 54, lr: 2.05e-02, grad_scale: 64.0
2023-11-18 11:23:30,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=203633.33333333334, ans=0.0
2023-11-18 11:23:32,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=203700.0, ans=0.125
2023-11-18 11:23:37,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=203700.0, ans=0.125
2023-11-18 11:23:38,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=17.31 vs. limit=22.5
2023-11-18 11:23:42,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=203766.66666666666, ans=0.2
2023-11-18 11:24:17,213 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6550, loss[loss=0.09545, simple_loss=0.1025, pruned_loss=0.03382, audio_tagging_loss=0.01037, over 13878.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1262, pruned_loss=0.04066, audio_tagging_loss=0.01239, over 3040779.53 frames. ], batch size: 56, lr: 2.05e-02, grad_scale: 64.0
2023-11-18 11:24:36,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=204033.33333333334, ans=0.0
2023-11-18 11:24:43,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 9.621e+01 1.067e+02 1.227e+02 1.729e+02, threshold=2.134e+02, percent-clipped=0.0
2023-11-18 11:24:59,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=204166.66666666666, ans=0.0
2023-11-18 11:25:13,377 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6600, loss[loss=0.1129, simple_loss=0.1182, pruned_loss=0.04307, audio_tagging_loss=0.01079, over 16304.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1263, pruned_loss=0.04068, audio_tagging_loss=0.01211, over 3039617.82 frames.
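Aside on the [scaling.py:213] lines: each prints the current value (ans) of a ScheduledFloat hyperparameter (skip rates, balancer probs, dropout rates) at the current batch_count, i.e. these knobs follow a schedule over training batches rather than staying fixed. A minimal sketch of a piecewise-linear schedule of that kind, assuming (batch_count, value) breakpoints, is below; it is a stand-in for the idea, not icefall's actual ScheduledFloat class.

```python
# Hedged sketch of the idea behind the logged ScheduledFloat values: a
# float hyperparameter that is piecewise-linear in the global batch count
# and clamped at the ends. Illustrative only.
from bisect import bisect_right

class ScheduledFloat:
    def __init__(self, *schedule: tuple):
        # schedule: (batch_count, value) pairs, sorted by batch_count.
        self.schedule = sorted(schedule)

    def value(self, batch_count: float) -> float:
        points = self.schedule
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        i = bisect_right([b for b, _ in points], batch_count)
        (b0, v0), (b1, v1) = points[i - 1], points[i]
        t = (batch_count - b0) / (b1 - b0)
        return v0 + t * (v1 - v0)

# e.g. a skip rate that decays from 0.2 to 0.0 over the first 4000 batches
# would print ans=0.0 at the batch counts above, long past the ramp.
skip_rate = ScheduledFloat((0.0, 0.2), (4000.0, 0.0))
assert skip_rate.value(203300.0) == 0.0
```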
], batch size: 62, lr: 2.04e-02, grad_scale: 64.0
2023-11-18 11:25:23,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=204366.66666666666, ans=0.1
2023-11-18 11:25:28,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=204366.66666666666, ans=0.2
2023-11-18 11:25:33,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=204433.33333333334, ans=0.2
2023-11-18 11:25:42,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=204433.33333333334, ans=0.125
2023-11-18 11:25:46,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.80 vs. limit=8.0
2023-11-18 11:25:50,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=204500.0, ans=0.0
2023-11-18 11:25:52,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=204500.0, ans=0.1
2023-11-18 11:26:03,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=204566.66666666666, ans=0.0
2023-11-18 11:26:03,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=204566.66666666666, ans=0.09899494936611666
2023-11-18 11:26:07,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0
2023-11-18 11:26:08,413 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6650, loss[loss=0.04834, simple_loss=0.04749, pruned_loss=0.01054, audio_tagging_loss=0.01405, over 13655.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1255, pruned_loss=0.04047, audio_tagging_loss=0.01211, over 3033428.86 frames. ], batch size: 54, lr: 2.04e-02, grad_scale: 64.0
2023-11-18 11:26:17,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=204633.33333333334, ans=0.125
2023-11-18 11:26:35,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.502e+01 9.475e+01 1.025e+02 1.163e+02 1.869e+02, threshold=2.050e+02, percent-clipped=0.0
2023-11-18 11:26:38,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=204766.66666666666, ans=0.5
2023-11-18 11:26:42,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0
2023-11-18 11:26:56,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=204900.0, ans=0.09899494936611666
2023-11-18 11:26:59,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=204900.0, ans=0.0
2023-11-18 11:27:00,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=204900.0, ans=0.0
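Aside on the [scaling.py:1022] Whitening lines: each compares a whitening metric of some activation's covariance against a limit; entries such as metric=13.79 vs. limit=15.0 above stay below the limit, while ones that exceed it (e.g. metric=16.59 vs. limit=15.0 further down) would incur a penalty. One plausible formulation of such a metric, equal to 1.0 when the covariance is a multiple of the identity and growing with eigenvalue spread, is sketched below; the exact expression in the real scaling.py may differ in details.

```python
# Hedged sketch of a "whitening" metric of the kind logged above:
#   d * trace(C @ C) / trace(C)**2
# where C is the per-group channel covariance and d the group size; this is
# 1.0 for perfectly "white" (isotropic) activations. Illustrative only.
import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
    """x: (num_frames, num_channels); returns mean metric over groups."""
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    x = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (G, N, d)
    x = x - x.mean(dim=1, keepdim=True)
    cov = torch.matmul(x.transpose(1, 2), x) / num_frames      # (G, d, d)
    trace_c = cov.diagonal(dim1=1, dim2=2).sum(-1)
    trace_c2 = torch.matmul(cov, cov).diagonal(dim1=1, dim2=2).sum(-1)
    return (d * trace_c2 / trace_c.clamp(min=1e-20) ** 2).mean()
```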
2023-11-18 11:27:03,167 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6700, loss[loss=0.1485, simple_loss=0.1544, pruned_loss=0.05555, audio_tagging_loss=0.01569, over 15053.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1264, pruned_loss=0.04054, audio_tagging_loss=0.01204, over 3039611.33 frames. ], batch size: 58, lr: 2.04e-02, grad_scale: 64.0
2023-11-18 11:27:06,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=204966.66666666666, ans=0.2
2023-11-18 11:27:09,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=204966.66666666666, ans=0.0
2023-11-18 11:27:15,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_abs, batch_count=205033.33333333334, ans=0.5
2023-11-18 11:27:26,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=205100.0, ans=0.125
2023-11-18 11:27:38,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.83 vs. limit=22.5
2023-11-18 11:27:39,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=205166.66666666666, ans=0.2
2023-11-18 11:27:59,109 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6750, loss[loss=0.08864, simple_loss=0.1018, pruned_loss=0.02843, audio_tagging_loss=0.009306, over 14750.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.126, pruned_loss=0.0405, audio_tagging_loss=0.01207, over 3039313.19 frames. ], batch size: 56, lr: 2.04e-02, grad_scale: 64.0
2023-11-18 11:28:05,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0
2023-11-18 11:28:11,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=205366.66666666666, ans=0.015
2023-11-18 11:28:25,172 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.754e+01 9.647e+01 1.086e+02 1.295e+02 2.076e+02, threshold=2.172e+02, percent-clipped=1.0
2023-11-18 11:28:25,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=205433.33333333334, ans=0.125
2023-11-18 11:28:26,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=205433.33333333334, ans=0.125
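Aside on the WARNING entries that appear a little further down in this log (e.g. at 11:31:27): AudioSet cuts carrying a dummy placeholder transcript are excluded whenever the encoder would emit fewer frames than there are BPE tokens, since the transducer loss is undefined in that case. The logged arithmetic is consistent with the frontend turning 100 input frames into ((100 - 7) // 2 + 1) // 2 = 23 subsampled frames, fewer than the 24 tokens printed in the same WARNING. A hedged sketch of such a filter follows; the function name and frame formula are inferred from the logged numbers, not taken from the repo.

```python
# Hedged sketch of the cut filter behind the "Exclude cut with ID ..."
# WARNINGs below: drop a cut when the number of encoder output frames
# after subsampling is smaller than the number of tokens. The formula
# reproduces the logged 100 -> 23 frame count; names are illustrative.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    if frames_after_subsampling < num_tokens:
        return False  # logged as "Exclude cut with ID ... from training."
    return True

# The WARNING below: 100 frames before subsampling, 23 after, 24 tokens.
assert keep_cut(num_frames=100, num_tokens=24) is False
```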
2023-11-18 11:28:28,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.77 vs. limit=15.0
2023-11-18 11:28:30,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205433.33333333334, ans=0.1
2023-11-18 11:28:33,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=205500.0, ans=0.125
2023-11-18 11:28:39,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=205500.0, ans=0.1
2023-11-18 11:28:46,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=205566.66666666666, ans=0.125
2023-11-18 11:28:50,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=205566.66666666666, ans=0.0
2023-11-18 11:28:52,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=205566.66666666666, ans=0.1
2023-11-18 11:28:54,666 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6800, loss[loss=0.1429, simple_loss=0.1575, pruned_loss=0.04832, audio_tagging_loss=0.01591, over 16126.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1266, pruned_loss=0.04076, audio_tagging_loss=0.01206, over 3036208.31 frames. ], batch size: 61, lr: 2.04e-02, grad_scale: 32.0
2023-11-18 11:29:00,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.12 vs. limit=15.0
2023-11-18 11:29:05,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=205700.0, ans=0.125
2023-11-18 11:29:30,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.96 vs. limit=10.0
2023-11-18 11:29:31,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=205833.33333333334, ans=0.1
2023-11-18 11:29:40,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=205900.0, ans=0.0
2023-11-18 11:29:44,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=205900.0, ans=0.125
2023-11-18 11:29:49,795 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6850, loss[loss=0.1511, simple_loss=0.1584, pruned_loss=0.06255, audio_tagging_loss=0.009312, over 15895.00 frames. ], tot_loss[loss=0.1164, simple_loss=0.1272, pruned_loss=0.04088, audio_tagging_loss=0.01197, over 3039648.94 frames. ], batch size: 59, lr: 2.04e-02, grad_scale: 32.0
2023-11-18 11:29:54,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=205966.66666666666, ans=0.125
2023-11-18 11:29:57,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.76 vs. limit=6.0
2023-11-18 11:29:59,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=13.18 vs. limit=15.0
2023-11-18 11:30:06,185 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.03 vs.
limit=22.5 2023-11-18 11:30:14,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=206100.0, ans=0.125 2023-11-18 11:30:17,590 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.358e+01 9.332e+01 1.055e+02 1.143e+02 1.752e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 11:30:19,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=206100.0, ans=0.125 2023-11-18 11:30:42,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=206233.33333333334, ans=0.125 2023-11-18 11:30:45,649 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6900, loss[loss=0.1151, simple_loss=0.1303, pruned_loss=0.03773, audio_tagging_loss=0.01222, over 15086.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.126, pruned_loss=0.04028, audio_tagging_loss=0.0121, over 3039159.64 frames. ], batch size: 55, lr: 2.04e-02, grad_scale: 32.0 2023-11-18 11:30:45,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=206300.0, ans=0.1 2023-11-18 11:30:46,080 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.55 vs. limit=22.5 2023-11-18 11:30:55,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=206366.66666666666, ans=0.0 2023-11-18 11:31:13,130 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.52 vs. limit=22.5 2023-11-18 11:31:19,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206500.0, ans=0.1 2023-11-18 11:31:20,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2023-11-18 11:31:21,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.48 vs. limit=22.5 2023-11-18 11:31:26,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=206500.0, ans=0.0 2023-11-18 11:31:27,819 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:31:37,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=206566.66666666666, ans=0.2 2023-11-18 11:31:40,914 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 6950, loss[loss=0.1067, simple_loss=0.1193, pruned_loss=0.03581, audio_tagging_loss=0.01127, over 15321.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1261, pruned_loss=0.04012, audio_tagging_loss=0.01203, over 3032258.07 frames. 
], batch size: 60, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:31:55,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=206700.0, ans=0.1 2023-11-18 11:32:08,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 9.386e+01 1.046e+02 1.149e+02 1.697e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 11:32:28,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.43 vs. limit=15.0 2023-11-18 11:32:30,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=206900.0, ans=0.125 2023-11-18 11:32:32,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.75 vs. limit=5.0 2023-11-18 11:32:35,654 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7000, loss[loss=0.1307, simple_loss=0.133, pruned_loss=0.04448, audio_tagging_loss=0.01971, over 14450.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1263, pruned_loss=0.04016, audio_tagging_loss=0.01208, over 3032016.37 frames. ], batch size: 53, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:32:43,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=206966.66666666666, ans=0.125 2023-11-18 11:32:47,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=207033.33333333334, ans=0.1 2023-11-18 11:32:49,549 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.837e-02 2023-11-18 11:32:58,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=207100.0, ans=0.0 2023-11-18 11:33:09,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-18 11:33:09,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=207166.66666666666, ans=0.125 2023-11-18 11:33:10,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=207166.66666666666, ans=0.125 2023-11-18 11:33:15,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-11-18 11:33:17,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=207166.66666666666, ans=0.125 2023-11-18 11:33:31,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-18 11:33:31,929 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7050, loss[loss=0.1242, simple_loss=0.1406, pruned_loss=0.04146, audio_tagging_loss=0.01242, over 15260.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1255, pruned_loss=0.04005, audio_tagging_loss=0.01219, over 3036506.30 frames. 
], batch size: 55, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:33:50,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=207366.66666666666, ans=0.125 2023-11-18 11:33:58,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.932e+01 9.545e+01 1.049e+02 1.197e+02 1.734e+02, threshold=2.097e+02, percent-clipped=0.0 2023-11-18 11:34:27,345 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7100, loss[loss=0.1098, simple_loss=0.1184, pruned_loss=0.03911, audio_tagging_loss=0.01152, over 14647.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1249, pruned_loss=0.03965, audio_tagging_loss=0.01226, over 3035942.96 frames. ], batch size: 57, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:34:30,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=207633.33333333334, ans=0.1 2023-11-18 11:34:32,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten.whitening_limit, batch_count=207633.33333333334, ans=15.0 2023-11-18 11:34:47,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=207700.0, ans=0.0 2023-11-18 11:34:59,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-18 11:35:09,400 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-18 11:35:22,611 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7150, loss[loss=0.1113, simple_loss=0.1246, pruned_loss=0.03728, audio_tagging_loss=0.01172, over 16060.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1261, pruned_loss=0.04018, audio_tagging_loss=0.01236, over 3041010.87 frames. ], batch size: 59, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:35:48,073 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.22 vs. limit=6.0 2023-11-18 11:35:51,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.725e+01 9.617e+01 1.079e+02 1.252e+02 1.872e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 11:35:51,838 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=4.268e+00 2023-11-18 11:35:56,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=208166.66666666666, ans=0.04949747468305833 2023-11-18 11:35:57,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=208166.66666666666, ans=10.0 2023-11-18 11:35:57,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.03 vs. limit=10.0 2023-11-18 11:36:07,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=208233.33333333334, ans=0.125 2023-11-18 11:36:11,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.59 vs. 
limit=15.0 2023-11-18 11:36:19,130 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7200, loss[loss=0.1367, simple_loss=0.1533, pruned_loss=0.04984, audio_tagging_loss=0.01019, over 15905.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1265, pruned_loss=0.04025, audio_tagging_loss=0.01232, over 3041690.58 frames. ], batch size: 57, lr: 2.03e-02, grad_scale: 32.0 2023-11-18 11:36:38,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=208366.66666666666, ans=0.125 2023-11-18 11:37:08,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 11:37:15,133 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7250, loss[loss=0.08413, simple_loss=0.09089, pruned_loss=0.0253, audio_tagging_loss=0.01339, over 15779.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.126, pruned_loss=0.04017, audio_tagging_loss=0.01233, over 3044799.72 frames. ], batch size: 62, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:37:22,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=208633.33333333334, ans=0.125 2023-11-18 11:37:39,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=208766.66666666666, ans=0.125 2023-11-18 11:37:40,966 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:37:41,783 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.722e+01 9.494e+01 1.040e+02 1.201e+02 1.952e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 11:38:09,853 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7300, loss[loss=0.1319, simple_loss=0.1484, pruned_loss=0.04728, audio_tagging_loss=0.0104, over 15925.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1268, pruned_loss=0.04052, audio_tagging_loss=0.01217, over 3044289.40 frames. ], batch size: 60, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:38:11,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=208966.66666666666, ans=0.5 2023-11-18 11:38:30,116 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:38:37,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=209100.0, ans=0.2 2023-11-18 11:38:58,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=209233.33333333334, ans=0.2 2023-11-18 11:39:05,507 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7350, loss[loss=0.07469, simple_loss=0.07905, pruned_loss=0.02076, audio_tagging_loss=0.01441, over 15551.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.126, pruned_loss=0.04042, audio_tagging_loss=0.0121, over 3047994.59 frames. 
], batch size: 59, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:39:20,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=209366.66666666666, ans=0.0 2023-11-18 11:39:23,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=209366.66666666666, ans=0.2 2023-11-18 11:39:26,604 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:39:33,660 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.784e+01 9.639e+01 1.055e+02 1.233e+02 1.941e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 11:39:41,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.51 vs. limit=15.0 2023-11-18 11:39:50,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.56 vs. limit=15.0 2023-11-18 11:40:01,574 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7400, loss[loss=0.1526, simple_loss=0.164, pruned_loss=0.05703, audio_tagging_loss=0.01363, over 15269.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1264, pruned_loss=0.04047, audio_tagging_loss=0.01187, over 3040561.26 frames. ], batch size: 56, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:40:02,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=15.0 2023-11-18 11:40:11,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=209700.0, ans=0.95 2023-11-18 11:40:15,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=5.57 vs. limit=15.0 2023-11-18 11:40:38,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=209833.33333333334, ans=0.125 2023-11-18 11:40:56,856 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7450, loss[loss=0.1076, simple_loss=0.1261, pruned_loss=0.03346, audio_tagging_loss=0.01111, over 14859.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1269, pruned_loss=0.04068, audio_tagging_loss=0.01183, over 3043641.48 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:41:03,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=209966.66666666666, ans=0.0 2023-11-18 11:41:12,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.65 vs. limit=15.0 2023-11-18 11:41:19,749 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.77 vs. 
limit=22.5 2023-11-18 11:41:24,896 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.797e+01 9.734e+01 1.062e+02 1.217e+02 1.649e+02, threshold=2.124e+02, percent-clipped=0.0 2023-11-18 11:41:43,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=210233.33333333334, ans=0.125 2023-11-18 11:41:49,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=210233.33333333334, ans=0.0 2023-11-18 11:41:52,348 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7500, loss[loss=0.1296, simple_loss=0.136, pruned_loss=0.0502, audio_tagging_loss=0.01144, over 14914.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1267, pruned_loss=0.04054, audio_tagging_loss=0.01182, over 3045035.17 frames. ], batch size: 54, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:42:08,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=210366.66666666666, ans=0.025 2023-11-18 11:42:13,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=210433.33333333334, ans=0.125 2023-11-18 11:42:22,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=210433.33333333334, ans=0.125 2023-11-18 11:42:32,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=210500.0, ans=0.5 2023-11-18 11:42:40,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0 2023-11-18 11:42:48,182 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7550, loss[loss=0.1454, simple_loss=0.1668, pruned_loss=0.05165, audio_tagging_loss=0.01036, over 15860.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1261, pruned_loss=0.04044, audio_tagging_loss=0.01184, over 3048933.23 frames. ], batch size: 57, lr: 2.02e-02, grad_scale: 32.0 2023-11-18 11:43:00,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=210700.0, ans=0.0 2023-11-18 11:43:15,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 1.017e+02 1.103e+02 1.286e+02 2.062e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 11:43:38,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=210900.0, ans=0.2 2023-11-18 11:43:39,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=210900.0, ans=0.0 2023-11-18 11:43:43,337 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7600, loss[loss=0.07598, simple_loss=0.07613, pruned_loss=0.02184, audio_tagging_loss=0.01607, over 13942.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1254, pruned_loss=0.04008, audio_tagging_loss=0.01197, over 3043169.31 frames. 
], batch size: 53, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:44:10,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=211100.0, ans=0.2 2023-11-18 11:44:11,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=211100.0, ans=0.125 2023-11-18 11:44:11,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=211100.0, ans=0.95 2023-11-18 11:44:27,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=211233.33333333334, ans=0.125 2023-11-18 11:44:28,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-11-18 11:44:39,627 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7650, loss[loss=0.1132, simple_loss=0.1273, pruned_loss=0.04056, audio_tagging_loss=0.009004, over 14247.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1249, pruned_loss=0.03999, audio_tagging_loss=0.0119, over 3037907.82 frames. ], batch size: 53, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:45:07,188 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.868e+01 1.071e+02 1.213e+02 1.962e+02, threshold=2.142e+02, percent-clipped=0.0 2023-11-18 11:45:08,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=211433.33333333334, ans=0.125 2023-11-18 11:45:13,257 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.67 vs. limit=15.0 2023-11-18 11:45:35,752 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7700, loss[loss=0.09706, simple_loss=0.1056, pruned_loss=0.02744, audio_tagging_loss=0.01681, over 15150.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1269, pruned_loss=0.04052, audio_tagging_loss=0.01194, over 3044305.12 frames. ], batch size: 56, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:45:47,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=211700.0, ans=0.04949747468305833 2023-11-18 11:45:50,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=211700.0, ans=0.125 2023-11-18 11:45:57,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=211766.66666666666, ans=0.125 2023-11-18 11:46:04,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=211766.66666666666, ans=0.0 2023-11-18 11:46:04,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=211766.66666666666, ans=0.04949747468305833 2023-11-18 11:46:06,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.80 vs. 
limit=15.0 2023-11-18 11:46:10,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=211833.33333333334, ans=0.125 2023-11-18 11:46:12,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.00 vs. limit=22.5 2023-11-18 11:46:30,613 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7750, loss[loss=0.1212, simple_loss=0.1353, pruned_loss=0.0404, audio_tagging_loss=0.01317, over 14958.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.127, pruned_loss=0.04067, audio_tagging_loss=0.01197, over 3034267.18 frames. ], batch size: 57, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:46:44,465 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.32 vs. limit=10.0 2023-11-18 11:46:56,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.97 vs. limit=15.0 2023-11-18 11:46:59,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.217e+01 9.456e+01 1.068e+02 1.204e+02 1.685e+02, threshold=2.136e+02, percent-clipped=0.0 2023-11-18 11:47:13,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=14.66 vs. limit=22.5 2023-11-18 11:47:14,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=212233.33333333334, ans=0.05 2023-11-18 11:47:16,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=212233.33333333334, ans=0.0 2023-11-18 11:47:17,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=212233.33333333334, ans=0.1 2023-11-18 11:47:26,654 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7800, loss[loss=0.08155, simple_loss=0.08599, pruned_loss=0.02808, audio_tagging_loss=0.01047, over 15781.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1271, pruned_loss=0.04064, audio_tagging_loss=0.0119, over 3035861.13 frames. ], batch size: 59, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:47:38,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=212366.66666666666, ans=0.125 2023-11-18 11:47:50,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=212433.33333333334, ans=0.2 2023-11-18 11:48:22,974 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7850, loss[loss=0.1308, simple_loss=0.1483, pruned_loss=0.04756, audio_tagging_loss=0.009101, over 15052.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1273, pruned_loss=0.04068, audio_tagging_loss=0.01199, over 3036482.83 frames. 
], batch size: 54, lr: 2.01e-02, grad_scale: 32.0 2023-11-18 11:48:35,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=212700.0, ans=0.025 2023-11-18 11:48:39,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=212700.0, ans=0.125 2023-11-18 11:48:49,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.949e+01 1.011e+02 1.139e+02 1.309e+02 2.076e+02, threshold=2.278e+02, percent-clipped=0.0 2023-11-18 11:49:00,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=212833.33333333334, ans=0.125 2023-11-18 11:49:01,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=212833.33333333334, ans=0.125 2023-11-18 11:49:13,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=212900.0, ans=0.5 2023-11-18 11:49:14,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=212900.0, ans=0.125 2023-11-18 11:49:17,819 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7900, loss[loss=0.09616, simple_loss=0.1035, pruned_loss=0.03188, audio_tagging_loss=0.01251, over 16463.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1276, pruned_loss=0.04082, audio_tagging_loss=0.01207, over 3043749.61 frames. ], batch size: 62, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:49:17,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=212966.66666666666, ans=0.0 2023-11-18 11:49:20,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=212966.66666666666, ans=0.125 2023-11-18 11:49:26,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=212966.66666666666, ans=10.0 2023-11-18 11:49:29,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.06 vs. limit=22.5 2023-11-18 11:49:38,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=213033.33333333334, ans=0.0 2023-11-18 11:50:07,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.82 vs. limit=10.0 2023-11-18 11:50:07,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=213233.33333333334, ans=0.125 2023-11-18 11:50:11,967 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 7950, loss[loss=0.1263, simple_loss=0.1381, pruned_loss=0.04458, audio_tagging_loss=0.01265, over 14873.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1273, pruned_loss=0.04103, audio_tagging_loss=0.01216, over 3039037.37 frames. ], batch size: 56, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:50:15,255 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.18 vs. 
limit=15.0 2023-11-18 11:50:17,009 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-32000.pt 2023-11-18 11:50:25,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=213366.66666666666, ans=0.125 2023-11-18 11:50:26,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213366.66666666666, ans=0.1 2023-11-18 11:50:26,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=213366.66666666666, ans=0.0 2023-11-18 11:50:28,564 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:50:30,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=213366.66666666666, ans=0.1 2023-11-18 11:50:35,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=213366.66666666666, ans=0.1 2023-11-18 11:50:42,609 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.234e+01 9.499e+01 1.075e+02 1.220e+02 1.746e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 11:51:05,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.49 vs. limit=22.5 2023-11-18 11:51:11,124 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8000, loss[loss=0.1371, simple_loss=0.1535, pruned_loss=0.0461, audio_tagging_loss=0.01425, over 15028.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1257, pruned_loss=0.04038, audio_tagging_loss=0.01231, over 3039743.09 frames. ], batch size: 55, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:51:11,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213633.33333333334, ans=0.1 2023-11-18 11:51:13,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=213633.33333333334, ans=0.1 2023-11-18 11:51:17,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=213633.33333333334, ans=0.2 2023-11-18 11:51:19,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.00 vs. limit=6.0 2023-11-18 11:51:30,688 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.13 vs. 
limit=15.0 2023-11-18 11:51:37,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=213766.66666666666, ans=0.0 2023-11-18 11:51:55,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=213900.0, ans=0.2 2023-11-18 11:51:57,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.38 vs. limit=22.5 2023-11-18 11:52:05,882 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8050, loss[loss=0.1497, simple_loss=0.1633, pruned_loss=0.05749, audio_tagging_loss=0.01052, over 14866.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1262, pruned_loss=0.0407, audio_tagging_loss=0.01235, over 3042691.30 frames. ], batch size: 54, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:52:09,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=213966.66666666666, ans=0.0 2023-11-18 11:52:15,587 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:52:17,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214033.33333333334, ans=0.0 2023-11-18 11:52:22,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=214033.33333333334, ans=0.0 2023-11-18 11:52:22,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=214033.33333333334, ans=0.0 2023-11-18 11:52:24,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.58 vs. limit=15.0 2023-11-18 11:52:25,525 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=3.989e-01 2023-11-18 11:52:26,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=214100.0, ans=0.1 2023-11-18 11:52:28,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=214100.0, ans=0.2 2023-11-18 11:52:30,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=214100.0, ans=0.0 2023-11-18 11:52:33,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 9.594e+01 1.075e+02 1.227e+02 1.823e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 11:52:37,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=214100.0, ans=0.125 2023-11-18 11:52:41,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=214166.66666666666, ans=0.0 2023-11-18 11:53:00,857 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8100, loss[loss=0.1118, simple_loss=0.1187, pruned_loss=0.03893, audio_tagging_loss=0.01353, over 15522.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1259, pruned_loss=0.04048, audio_tagging_loss=0.01229, over 3039405.39 frames. 
], batch size: 59, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:53:04,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=214300.0, ans=0.0 2023-11-18 11:53:10,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=214300.0, ans=0.1 2023-11-18 11:53:30,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=214433.33333333334, ans=0.0 2023-11-18 11:53:35,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=214500.0, ans=0.2 2023-11-18 11:53:38,374 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=16.98 vs. limit=15.0 2023-11-18 11:53:56,998 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8150, loss[loss=0.1091, simple_loss=0.1229, pruned_loss=0.03607, audio_tagging_loss=0.01159, over 14888.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1259, pruned_loss=0.04024, audio_tagging_loss=0.01209, over 3039354.50 frames. ], batch size: 57, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:00,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=214633.33333333334, ans=0.125 2023-11-18 11:54:05,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=214633.33333333334, ans=0.0 2023-11-18 11:54:18,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=214766.66666666666, ans=6.0 2023-11-18 11:54:23,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=214766.66666666666, ans=0.125 2023-11-18 11:54:24,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 9.659e+01 1.081e+02 1.221e+02 1.815e+02, threshold=2.163e+02, percent-clipped=0.0 2023-11-18 11:54:30,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=214833.33333333334, ans=0.125 2023-11-18 11:54:33,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=214833.33333333334, ans=0.0 2023-11-18 11:54:36,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=214833.33333333334, ans=0.125 2023-11-18 11:54:43,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=214900.0, ans=0.125 2023-11-18 11:54:49,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=214900.0, ans=0.5 2023-11-18 11:54:53,069 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8200, loss[loss=0.1303, simple_loss=0.1603, pruned_loss=0.04122, audio_tagging_loss=0.008878, over 16507.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1267, pruned_loss=0.04053, audio_tagging_loss=0.01197, over 3042947.70 frames. ], batch size: 57, lr: 2.00e-02, grad_scale: 32.0 2023-11-18 11:54:53,096 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 11:54:55,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=214966.66666666666, ans=0.05 2023-11-18 11:55:00,695 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:55:04,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=215033.33333333334, ans=0.0 2023-11-18 11:55:15,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=215100.0, ans=0.2 2023-11-18 11:55:23,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.87 vs. limit=15.0 2023-11-18 11:55:33,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=215166.66666666666, ans=0.125 2023-11-18 11:55:37,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=215233.33333333334, ans=0.09899494936611666 2023-11-18 11:55:39,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=215233.33333333334, ans=0.125 2023-11-18 11:55:43,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=215233.33333333334, ans=0.125 2023-11-18 11:55:44,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=215233.33333333334, ans=0.0 2023-11-18 11:55:48,003 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8250, loss[loss=0.1038, simple_loss=0.1085, pruned_loss=0.03501, audio_tagging_loss=0.01455, over 16203.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1264, pruned_loss=0.04045, audio_tagging_loss=0.01194, over 3049407.11 frames. ], batch size: 60, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:55:49,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.34 vs. limit=22.5 2023-11-18 11:56:14,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=215433.33333333334, ans=0.125 2023-11-18 11:56:16,461 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.670e+01 9.340e+01 1.070e+02 1.193e+02 1.705e+02, threshold=2.140e+02, percent-clipped=0.0 2023-11-18 11:56:18,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.36 vs. 
limit=15.0 2023-11-18 11:56:18,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=215433.33333333334, ans=0.125 2023-11-18 11:56:25,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=215500.0, ans=0.125 2023-11-18 11:56:36,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=215566.66666666666, ans=0.0 2023-11-18 11:56:38,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215566.66666666666, ans=0.1 2023-11-18 11:56:40,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=215566.66666666666, ans=0.05 2023-11-18 11:56:43,854 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8300, loss[loss=0.1081, simple_loss=0.1121, pruned_loss=0.03781, audio_tagging_loss=0.01422, over 14467.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1254, pruned_loss=0.03995, audio_tagging_loss=0.01192, over 3044444.84 frames. ], batch size: 58, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:56:49,079 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-18 11:56:55,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=215700.0, ans=0.1 2023-11-18 11:56:59,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=215700.0, ans=0.0 2023-11-18 11:57:06,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=215766.66666666666, ans=0.1 2023-11-18 11:57:15,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=215833.33333333334, ans=0.2 2023-11-18 11:57:21,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=215833.33333333334, ans=0.125 2023-11-18 11:57:26,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=215833.33333333334, ans=0.125 2023-11-18 11:57:39,802 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8350, loss[loss=0.117, simple_loss=0.1351, pruned_loss=0.04076, audio_tagging_loss=0.008713, over 14685.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1256, pruned_loss=0.0399, audio_tagging_loss=0.01188, over 3044888.54 frames. ], batch size: 54, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:57:42,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=215966.66666666666, ans=0.125 2023-11-18 11:57:48,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=215966.66666666666, ans=0.0 2023-11-18 11:58:05,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.74 vs. 
limit=12.0 2023-11-18 11:58:07,653 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.666e+01 9.952e+01 1.113e+02 1.251e+02 3.254e+02, threshold=2.227e+02, percent-clipped=1.0 2023-11-18 11:58:08,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=216100.0, ans=0.0 2023-11-18 11:58:16,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=216166.66666666666, ans=0.125 2023-11-18 11:58:26,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=216233.33333333334, ans=0.125 2023-11-18 11:58:35,201 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8400, loss[loss=0.132, simple_loss=0.1478, pruned_loss=0.04681, audio_tagging_loss=0.01125, over 14701.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.126, pruned_loss=0.03973, audio_tagging_loss=0.01179, over 3047904.03 frames. ], batch size: 57, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:58:52,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=216366.66666666666, ans=0.125 2023-11-18 11:58:53,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=216366.66666666666, ans=0.2 2023-11-18 11:58:54,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=216366.66666666666, ans=0.125 2023-11-18 11:58:57,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=216433.33333333334, ans=0.1 2023-11-18 11:58:59,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=216433.33333333334, ans=0.05 2023-11-18 11:59:21,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=216566.66666666666, ans=0.125 2023-11-18 11:59:28,412 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:59:30,810 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8450, loss[loss=0.1421, simple_loss=0.1533, pruned_loss=0.05289, audio_tagging_loss=0.01252, over 16472.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1262, pruned_loss=0.04029, audio_tagging_loss=0.01184, over 3048706.92 frames. ], batch size: 58, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 11:59:33,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=216633.33333333334, ans=0.05 2023-11-18 11:59:33,107 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 11:59:45,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=216700.0, ans=0.2 2023-11-18 11:59:57,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.34 vs. 
limit=15.0 2023-11-18 11:59:58,324 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.579e+01 9.450e+01 1.064e+02 1.181e+02 2.171e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 12:00:10,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.28 vs. limit=22.5 2023-11-18 12:00:16,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=216900.0, ans=15.0 2023-11-18 12:00:17,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=216900.0, ans=0.0 2023-11-18 12:00:26,219 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8500, loss[loss=0.1104, simple_loss=0.1175, pruned_loss=0.03882, audio_tagging_loss=0.01288, over 15096.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1271, pruned_loss=0.04031, audio_tagging_loss=0.01187, over 3049817.98 frames. ], batch size: 56, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 12:00:43,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=217033.33333333334, ans=0.2 2023-11-18 12:00:56,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=217100.0, ans=0.125 2023-11-18 12:00:59,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=217166.66666666666, ans=0.125 2023-11-18 12:01:09,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=217233.33333333334, ans=0.2 2023-11-18 12:01:09,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten.whitening_limit, batch_count=217233.33333333334, ans=15.0 2023-11-18 12:01:17,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=217233.33333333334, ans=0.1 2023-11-18 12:01:21,575 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8550, loss[loss=0.1432, simple_loss=0.1541, pruned_loss=0.05047, audio_tagging_loss=0.01563, over 15818.00 frames. ], tot_loss[loss=0.1151, simple_loss=0.126, pruned_loss=0.03999, audio_tagging_loss=0.01211, over 3057735.58 frames. 
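The scaling.py:213 records that dominate this log are ScheduledFloat reports: module hyperparameters (skip rates, balancer probabilities, dropout p) whose current value "ans" is recomputed from batch_count on a piecewise-linear schedule instead of being fixed. The sketch below shows that mechanism with an illustrative schedule; the class is a simplification and not the actual icefall scaling.py implementation, which adds defaults and arithmetic operators.

import bisect

class ScheduledFloat:
    """A float hyperparameter interpolated piecewise-linearly in batch_count.

    Sketch only; breakpoints below are illustrative, not from this run.
    """
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# E.g. a skip rate annealed from 0.3 to 0.0 over the first 20k batches:
p = ScheduledFloat((0.0, 0.3), (20000.0, 0.0))
print(p.value(215433.33))  # 0.0 -- flat by this stage, like many "ans=0.0" above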
], batch size: 58, lr: 1.99e-02, grad_scale: 32.0 2023-11-18 12:01:26,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217300.0, ans=0.1 2023-11-18 12:01:47,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=217433.33333333334, ans=0.0 2023-11-18 12:01:49,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.522e+01 1.000e+02 1.079e+02 1.274e+02 1.597e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 12:01:52,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=217433.33333333334, ans=0.1 2023-11-18 12:01:58,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=217500.0, ans=0.125 2023-11-18 12:02:15,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=217566.66666666666, ans=0.125 2023-11-18 12:02:17,140 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8600, loss[loss=0.1393, simple_loss=0.152, pruned_loss=0.05141, audio_tagging_loss=0.01192, over 15596.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.126, pruned_loss=0.03979, audio_tagging_loss=0.01211, over 3059169.43 frames. ], batch size: 56, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:02:22,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217633.33333333334, ans=0.1 2023-11-18 12:02:27,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=217633.33333333334, ans=0.125 2023-11-18 12:02:51,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=217833.33333333334, ans=0.1 2023-11-18 12:02:55,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=217833.33333333334, ans=0.1 2023-11-18 12:02:55,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=217833.33333333334, ans=10.0 2023-11-18 12:02:57,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=217833.33333333334, ans=0.125 2023-11-18 12:03:07,000 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2023-11-18 12:03:13,323 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8650, loss[loss=0.1318, simple_loss=0.1444, pruned_loss=0.04623, audio_tagging_loss=0.01339, over 14906.00 frames. ], tot_loss[loss=0.1163, simple_loss=0.1275, pruned_loss=0.04039, audio_tagging_loss=0.0121, over 3059693.14 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:03:17,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.28 vs. limit=22.5 2023-11-18 12:03:20,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.79 vs. 
limit=15.0 2023-11-18 12:03:34,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=218100.0, ans=0.0 2023-11-18 12:03:39,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=218100.0, ans=0.0 2023-11-18 12:03:40,756 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.473e+01 1.061e+02 1.180e+02 2.111e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 12:03:55,123 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.01 vs. limit=15.0 2023-11-18 12:03:55,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=218166.66666666666, ans=0.2 2023-11-18 12:04:02,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=218233.33333333334, ans=0.0 2023-11-18 12:04:08,813 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8700, loss[loss=0.1501, simple_loss=0.1639, pruned_loss=0.05671, audio_tagging_loss=0.0115, over 15677.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1291, pruned_loss=0.04103, audio_tagging_loss=0.01221, over 3058426.02 frames. ], batch size: 58, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:04:32,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=218433.33333333334, ans=0.0 2023-11-18 12:04:52,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=218566.66666666666, ans=0.04949747468305833 2023-11-18 12:04:54,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=218566.66666666666, ans=0.0 2023-11-18 12:05:04,353 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8750, loss[loss=0.1409, simple_loss=0.1611, pruned_loss=0.04944, audio_tagging_loss=0.01089, over 15114.00 frames. ], tot_loss[loss=0.1177, simple_loss=0.129, pruned_loss=0.041, audio_tagging_loss=0.01219, over 3051447.87 frames. ], batch size: 55, lr: 1.98e-02, grad_scale: 32.0 2023-11-18 12:05:27,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0 2023-11-18 12:05:27,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.81 vs. limit=10.0 2023-11-18 12:05:32,026 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.716e+01 9.349e+01 1.069e+02 1.188e+02 1.662e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 12:05:34,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0 2023-11-18 12:05:40,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=218833.33333333334, ans=0.0 2023-11-18 12:05:46,466 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.48 vs. 
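Each train_asr.py:1115 record reports the total loss next to its three components, and the logged totals are consistent with a weighted sum in which the simple (linear-boundary) transducer loss enters at weight 0.5 while the pruned transducer and audio-tagging losses enter at weight 1.0: for batch 8250 above, 0.5 * 0.1264 + 0.04045 + 0.01194 = 0.1156, the logged tot_loss. A minimal sketch of that combination follows; the function name is illustrative, and the real recipe may also warm these scales up early in training.

import torch

def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
               simple_loss_scale: float = 0.5,
               audio_tagging_loss_scale: float = 1.0):
    # Weighted sum inferred from the logged numbers above.
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

print(total_loss(torch.tensor(0.1264),
                 torch.tensor(0.04045),
                 torch.tensor(0.01194)))  # tensor(0.1156), matching batch 8250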
limit=12.0 2023-11-18 12:05:55,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=218900.0, ans=0.125 2023-11-18 12:06:00,709 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8800, loss[loss=0.1322, simple_loss=0.1395, pruned_loss=0.048, audio_tagging_loss=0.01442, over 15517.00 frames. ], tot_loss[loss=0.1178, simple_loss=0.1289, pruned_loss=0.0411, audio_tagging_loss=0.01229, over 3046789.29 frames. ], batch size: 57, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:06:00,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=218966.66666666666, ans=0.07 2023-11-18 12:06:03,106 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=218966.66666666666, ans=0.2 2023-11-18 12:06:10,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=219033.33333333334, ans=0.125 2023-11-18 12:06:20,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219033.33333333334, ans=0.1 2023-11-18 12:06:38,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=219166.66666666666, ans=0.125 2023-11-18 12:06:42,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.05 vs. limit=22.5 2023-11-18 12:06:55,585 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8850, loss[loss=0.09594, simple_loss=0.101, pruned_loss=0.03231, audio_tagging_loss=0.01311, over 15479.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1279, pruned_loss=0.0406, audio_tagging_loss=0.0123, over 3046774.07 frames. ], batch size: 61, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:07:05,667 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:07:13,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=219366.66666666666, ans=0.1 2023-11-18 12:07:14,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=18.08 vs. limit=15.0 2023-11-18 12:07:24,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.378e+01 1.055e+02 1.190e+02 1.653e+02, threshold=2.110e+02, percent-clipped=0.0 2023-11-18 12:07:26,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2023-11-18 12:07:46,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=219566.66666666666, ans=0.09899494936611666 2023-11-18 12:07:50,938 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8900, loss[loss=0.08908, simple_loss=0.1008, pruned_loss=0.02796, audio_tagging_loss=0.01072, over 14826.00 frames. 
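The WARNING above shows the guard that drops degenerate cuts: this 1-second AudioSet clip has 100 feature frames, only 23 frames after the encoder's 4x subsampling, but 24 BPE tokens, and a transducer cannot align more symbols than it has output frames. A sketch of the check, assuming a convolutional front-end that maps T to roughly (T - 7) // 4 (which reproduces the logged 100 -> 23); the exact formula and names are assumptions, not the actual icefall code.

def keep_cut(num_frames_before: int, num_tokens: int) -> bool:
    """Return False for cuts whose token count exceeds the subsampled frame count."""
    # Assumed front-end: conv subsampling mapping T -> (T - 7) // 4.
    num_frames_after = (num_frames_before - 7) // 4
    return num_frames_after >= num_tokens

# The excluded cut from the warning: 100 frames -> 23 frames < 24 tokens.
print(keep_cut(100, 24))   # False -> "Exclude cut with ID unbalanced/..."
print(keep_cut(160, 24))   # True  -> a longer clip with the same text would pass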
], tot_loss[loss=0.1169, simple_loss=0.1281, pruned_loss=0.04074, audio_tagging_loss=0.01217, over 3044607.58 frames. ], batch size: 58, lr: 1.98e-02, grad_scale: 64.0 2023-11-18 12:08:04,554 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:08:06,899 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.89 vs. limit=15.0 2023-11-18 12:08:11,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=219700.0, ans=0.125 2023-11-18 12:08:17,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=219766.66666666666, ans=0.1 2023-11-18 12:08:45,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=219900.0, ans=0.0 2023-11-18 12:08:47,596 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 8950, loss[loss=0.1332, simple_loss=0.1481, pruned_loss=0.04812, audio_tagging_loss=0.011, over 16600.00 frames. ], tot_loss[loss=0.1169, simple_loss=0.1285, pruned_loss=0.04068, audio_tagging_loss=0.01191, over 3049813.38 frames. ], batch size: 61, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:08:49,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.85 vs. limit=22.5 2023-11-18 12:08:53,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=219966.66666666666, ans=0.125 2023-11-18 12:09:10,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_na.min_abs, batch_count=220100.0, ans=0.02 2023-11-18 12:09:12,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=220100.0, ans=0.1 2023-11-18 12:09:13,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=220100.0, ans=0.1 2023-11-18 12:09:14,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.540e+01 9.669e+01 1.057e+02 1.154e+02 1.635e+02, threshold=2.114e+02, percent-clipped=0.0 2023-11-18 12:09:14,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=220100.0, ans=0.0 2023-11-18 12:09:14,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=220100.0, ans=0.0 2023-11-18 12:09:14,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=220100.0, ans=0.2 2023-11-18 12:09:31,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-18 12:09:37,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=220233.33333333334, ans=0.0 2023-11-18 12:09:41,980 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9000, loss[loss=0.09582, simple_loss=0.1042, pruned_loss=0.02877, audio_tagging_loss=0.01493, over 15108.00 frames. ], tot_loss[loss=0.1168, simple_loss=0.1283, pruned_loss=0.04082, audio_tagging_loss=0.01177, over 3056442.62 frames. 
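grad_scale in the batch records is the loss-scaling factor of mixed-precision training; it sits at 32.0 through batch 8750 and doubles to 64.0 at batch 8800. That is standard dynamic loss scaling: after a long enough run of overflow-free steps, the scaler multiplies its scale by a growth factor of 2. A minimal PyTorch AMP sketch with a dummy model; the interval and scales here are illustrative, not this run's settings.

import torch

use_cuda = torch.cuda.is_available()
device = "cuda" if use_cuda else "cpu"
model = torch.nn.Linear(80, 500).to(device)
optim = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000,   # overflow-free steps before the scale doubles
    enabled=use_cuda)

for step in range(3):
    optim.zero_grad()
    x = torch.randn(4, 80, device=device)
    with torch.autocast("cuda", enabled=use_cuda):
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optim)             # unscales grads; skips the step on inf/nan
    scaler.update()                # doubles or halves the scale as needed
print(scaler.get_scale())          # 32.0 -> 64.0 after a long clean run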
], batch size: 59, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:09:41,982 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 12:10:06,069 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7526, 5.7361, 5.8293, 5.8785], device='cuda:0') 2023-11-18 12:10:14,817 INFO [train_asr.py:1147] (0/4) Epoch 3, validation: loss=0.07901, simple_loss=0.06429, pruned_loss=0.01152, audio_tagging_loss=0.03534, over 4681554.00 frames. 2023-11-18 12:10:14,817 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 12:10:19,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=220300.0, ans=0.0 2023-11-18 12:10:25,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=220366.66666666666, ans=0.0 2023-11-18 12:10:26,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.64 vs. limit=15.0 2023-11-18 12:10:27,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=220366.66666666666, ans=0.0 2023-11-18 12:10:37,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=220433.33333333334, ans=0.125 2023-11-18 12:10:38,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.86 vs. limit=22.5 2023-11-18 12:10:47,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.02 vs. limit=15.0 2023-11-18 12:10:50,159 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:10:53,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=220500.0, ans=0.125 2023-11-18 12:10:57,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.95 vs. limit=15.0 2023-11-18 12:10:58,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=220566.66666666666, ans=0.025 2023-11-18 12:11:09,387 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9050, loss[loss=0.1319, simple_loss=0.1514, pruned_loss=0.04542, audio_tagging_loss=0.01082, over 16534.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1274, pruned_loss=0.0405, audio_tagging_loss=0.01169, over 3063304.94 frames. ], batch size: 59, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:11:10,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=220633.33333333334, ans=0.125 2023-11-18 12:11:13,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=220633.33333333334, ans=0.125 2023-11-18 12:11:15,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.21 vs. 
limit=10.0 2023-11-18 12:11:21,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=220700.0, ans=0.125 2023-11-18 12:11:29,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=220766.66666666666, ans=0.025 2023-11-18 12:11:36,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.611e+01 9.522e+01 1.061e+02 1.198e+02 2.427e+02, threshold=2.123e+02, percent-clipped=1.0 2023-11-18 12:11:39,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=220766.66666666666, ans=0.0 2023-11-18 12:11:40,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=220766.66666666666, ans=0.0 2023-11-18 12:12:04,411 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9100, loss[loss=0.1198, simple_loss=0.1328, pruned_loss=0.04337, audio_tagging_loss=0.01005, over 16559.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1282, pruned_loss=0.04078, audio_tagging_loss=0.01159, over 3063435.80 frames. ], batch size: 61, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:12:05,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-18 12:12:06,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=220966.66666666666, ans=0.125 2023-11-18 12:12:14,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=221033.33333333334, ans=0.0 2023-11-18 12:12:31,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.98 vs. limit=15.0 2023-11-18 12:12:36,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=221100.0, ans=0.125 2023-11-18 12:12:45,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=221166.66666666666, ans=0.125 2023-11-18 12:13:00,119 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9150, loss[loss=0.08105, simple_loss=0.09349, pruned_loss=0.0215, audio_tagging_loss=0.0128, over 15955.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1272, pruned_loss=0.04036, audio_tagging_loss=0.01174, over 3060691.29 frames. ], batch size: 61, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:13:03,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=221300.0, ans=0.125 2023-11-18 12:13:05,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=221300.0, ans=0.0 2023-11-18 12:13:19,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. 
limit=15.0 2023-11-18 12:13:28,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.044e+01 9.509e+01 1.025e+02 1.123e+02 1.698e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 12:13:31,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=221433.33333333334, ans=0.2 2023-11-18 12:13:57,068 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9200, loss[loss=0.09962, simple_loss=0.1207, pruned_loss=0.02913, audio_tagging_loss=0.01012, over 15981.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1268, pruned_loss=0.04042, audio_tagging_loss=0.01169, over 3063840.16 frames. ], batch size: 60, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:13:59,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=221633.33333333334, ans=0.07 2023-11-18 12:13:59,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=14.44 vs. limit=15.0 2023-11-18 12:13:59,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-18 12:14:06,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=221700.0, ans=0.0 2023-11-18 12:14:09,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=221700.0, ans=0.125 2023-11-18 12:14:36,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=221833.33333333334, ans=0.2 2023-11-18 12:14:51,677 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9250, loss[loss=0.08093, simple_loss=0.08383, pruned_loss=0.02413, audio_tagging_loss=0.01489, over 14287.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1271, pruned_loss=0.04049, audio_tagging_loss=0.01172, over 3060176.89 frames. ], batch size: 56, lr: 1.97e-02, grad_scale: 64.0 2023-11-18 12:14:56,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.50 vs. limit=22.5 2023-11-18 12:15:11,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=222033.33333333334, ans=0.0 2023-11-18 12:15:20,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.922e+01 9.718e+01 1.095e+02 1.245e+02 2.428e+02, threshold=2.190e+02, percent-clipped=1.0 2023-11-18 12:15:21,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=222100.0, ans=0.125 2023-11-18 12:15:30,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=222166.66666666666, ans=0.125 2023-11-18 12:15:35,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=222233.33333333334, ans=0.125 2023-11-18 12:15:44,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222233.33333333334, ans=0.1 2023-11-18 12:15:46,850 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9300, loss[loss=0.129, simple_loss=0.1368, pruned_loss=0.04708, audio_tagging_loss=0.01355, over 16076.00 frames. 
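During the validation pass above, zipformer.py:1873 dumps attn_weights_entropy per self-attention module, e.g. tensor([5.7526, 5.7361, 5.8293, 5.8785]) for encoder.encoders.0.layers.1: one value per head, high when attention is spread out and low when it is peaky. A sketch of that diagnostic, assuming it is the mean entropy of the attention distribution over keys; the exact icefall computation may differ.

import torch

def attn_weights_entropy(attn_weights: torch.Tensor) -> torch.Tensor:
    """Mean entropy (nats) of attention distributions, one value per head.

    attn_weights: (num_heads, num_queries, num_keys), rows summing to 1.
    """
    p = attn_weights.clamp(min=1e-20)
    ent = -(p * p.log()).sum(dim=-1)     # (num_heads, num_queries)
    return ent.mean(dim=-1)              # average over query positions

# A uniform distribution over 340 keys gives log(340) ~= 5.83 nats, the same
# ballpark as the values logged for encoder.encoders.0.layers.1 above.
uniform = torch.full((4, 10, 340), 1 / 340.0)
print(attn_weights_entropy(uniform))     # ~5.83 for every head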
], tot_loss[loss=0.1159, simple_loss=0.1273, pruned_loss=0.04045, audio_tagging_loss=0.01184, over 3061683.10 frames. ], batch size: 60, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:15:49,791 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:15:53,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=222300.0, ans=0.125 2023-11-18 12:15:54,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=222300.0, ans=0.0 2023-11-18 12:15:58,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2023-11-18 12:15:59,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=222366.66666666666, ans=0.0 2023-11-18 12:16:02,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.82 vs. limit=22.5 2023-11-18 12:16:03,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=222366.66666666666, ans=12.0 2023-11-18 12:16:17,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=222433.33333333334, ans=0.125 2023-11-18 12:16:22,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=222500.0, ans=0.125 2023-11-18 12:16:23,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222500.0, ans=0.1 2023-11-18 12:16:29,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.32 vs. limit=15.0 2023-11-18 12:16:36,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=222566.66666666666, ans=0.07 2023-11-18 12:16:36,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=222566.66666666666, ans=0.125 2023-11-18 12:16:37,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.94 vs. limit=15.0 2023-11-18 12:16:43,428 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9350, loss[loss=0.1174, simple_loss=0.1271, pruned_loss=0.04013, audio_tagging_loss=0.01376, over 14225.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1269, pruned_loss=0.0404, audio_tagging_loss=0.01192, over 3060439.07 frames. ], batch size: 55, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:16:54,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.56 vs. 
limit=22.5 2023-11-18 12:16:55,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=222700.0, ans=0.0 2023-11-18 12:17:06,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=222766.66666666666, ans=0.1 2023-11-18 12:17:10,516 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.296e+01 9.866e+01 1.128e+02 1.276e+02 1.788e+02, threshold=2.257e+02, percent-clipped=0.0 2023-11-18 12:17:13,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=222766.66666666666, ans=0.1 2023-11-18 12:17:25,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=222833.33333333334, ans=0.0 2023-11-18 12:17:39,409 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9400, loss[loss=0.12, simple_loss=0.1256, pruned_loss=0.04335, audio_tagging_loss=0.01387, over 15664.00 frames. ], tot_loss[loss=0.1162, simple_loss=0.1272, pruned_loss=0.0407, audio_tagging_loss=0.01188, over 3056939.91 frames. ], batch size: 56, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:17:44,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=222966.66666666666, ans=0.0 2023-11-18 12:18:32,267 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:18:34,404 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9450, loss[loss=0.1275, simple_loss=0.1499, pruned_loss=0.04372, audio_tagging_loss=0.008859, over 15169.00 frames. ], tot_loss[loss=0.116, simple_loss=0.1271, pruned_loss=0.04062, audio_tagging_loss=0.01188, over 3058388.12 frames. ], batch size: 53, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:18:38,386 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:18:45,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223366.66666666666, ans=0.125 2023-11-18 12:18:47,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=223366.66666666666, ans=0.125 2023-11-18 12:19:00,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=223433.33333333334, ans=0.2 2023-11-18 12:19:03,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.898e+01 1.046e+02 1.175e+02 1.737e+02, threshold=2.092e+02, percent-clipped=0.0 2023-11-18 12:19:10,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.26 vs. 
limit=15.0 2023-11-18 12:19:14,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=223500.0, ans=0.2 2023-11-18 12:19:31,343 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9500, loss[loss=0.0847, simple_loss=0.08973, pruned_loss=0.02445, audio_tagging_loss=0.01538, over 14377.00 frames. ], tot_loss[loss=0.1157, simple_loss=0.1269, pruned_loss=0.04043, audio_tagging_loss=0.01182, over 3056621.71 frames. ], batch size: 55, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:19:32,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=223633.33333333334, ans=0.0 2023-11-18 12:19:46,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=223700.0, ans=0.125 2023-11-18 12:20:02,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=223766.66666666666, ans=0.125 2023-11-18 12:20:20,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=223900.0, ans=0.125 2023-11-18 12:20:27,318 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9550, loss[loss=0.1158, simple_loss=0.1163, pruned_loss=0.04473, audio_tagging_loss=0.01288, over 14980.00 frames. ], tot_loss[loss=0.1165, simple_loss=0.1278, pruned_loss=0.04073, audio_tagging_loss=0.01189, over 3050999.47 frames. ], batch size: 55, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:20:34,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=223966.66666666666, ans=0.0 2023-11-18 12:20:35,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=223966.66666666666, ans=0.0 2023-11-18 12:20:55,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.492e+01 1.006e+02 1.117e+02 1.248e+02 1.898e+02, threshold=2.233e+02, percent-clipped=0.0 2023-11-18 12:21:02,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=224166.66666666666, ans=0.125 2023-11-18 12:21:08,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=224166.66666666666, ans=0.125 2023-11-18 12:21:10,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2023-11-18 12:21:18,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.19 vs. limit=6.0 2023-11-18 12:21:22,538 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9600, loss[loss=0.08409, simple_loss=0.0859, pruned_loss=0.02758, audio_tagging_loss=0.01356, over 15730.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1269, pruned_loss=0.04035, audio_tagging_loss=0.01199, over 3051618.56 frames. ], batch size: 61, lr: 1.96e-02, grad_scale: 64.0 2023-11-18 12:22:18,508 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9650, loss[loss=0.1089, simple_loss=0.1143, pruned_loss=0.04122, audio_tagging_loss=0.01059, over 15023.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1266, pruned_loss=0.0402, audio_tagging_loss=0.01207, over 3048634.70 frames. 
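The optim.py:476 records make the clipping rule explicit: the threshold is Clipping_scale times the running median of recent gradient norms. Just above, threshold=2.233e+02 is 2.0 x the median quartile 1.117e+02, and percent-clipped reports how often the norm exceeded the threshold. A sketch of median-based clipping under those assumptions; this is illustrative, not the actual icefall optimizer code.

import statistics
from collections import deque

import torch

class MedianGradClipper:
    """Clip the global grad norm at scale * median of the last `window` norms."""
    def __init__(self, scale: float = 2.0, window: int = 1000):
        self.scale = scale
        self.norms = deque(maxlen=window)
        self.clipped = 0
        self.seen = 0

    def clip_(self, parameters) -> float:
        params = [p for p in parameters if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        threshold = self.scale * statistics.median(self.norms)
        self.seen += 1
        if norm > threshold:
            self.clipped += 1
            for p in params:
                p.grad.mul_(threshold / norm)  # rescale grads down to threshold
        return threshold

# percent-clipped as in the log: 100.0 * clipper.clipped / clipper.seen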
], batch size: 58, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:22:30,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=224700.0, ans=0.125 2023-11-18 12:22:45,985 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.807e+01 9.215e+01 1.048e+02 1.170e+02 1.955e+02, threshold=2.095e+02, percent-clipped=0.0 2023-11-18 12:22:46,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=224766.66666666666, ans=0.125 2023-11-18 12:22:54,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=224833.33333333334, ans=0.0 2023-11-18 12:22:59,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=224833.33333333334, ans=0.0 2023-11-18 12:23:08,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=224900.0, ans=0.0 2023-11-18 12:23:14,156 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9700, loss[loss=0.08714, simple_loss=0.09467, pruned_loss=0.03089, audio_tagging_loss=0.008916, over 14237.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1267, pruned_loss=0.04003, audio_tagging_loss=0.01185, over 3043714.47 frames. ], batch size: 57, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:23:25,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=225033.33333333334, ans=0.1 2023-11-18 12:23:37,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=225100.0, ans=0.025 2023-11-18 12:23:42,616 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.53 vs. limit=15.0 2023-11-18 12:23:52,692 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. limit=15.0 2023-11-18 12:24:09,672 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9750, loss[loss=0.0931, simple_loss=0.1017, pruned_loss=0.02997, audio_tagging_loss=0.01228, over 16162.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1255, pruned_loss=0.03952, audio_tagging_loss=0.01174, over 3042938.87 frames. ], batch size: 61, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:24:15,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=4.36 vs. 
limit=15.0 2023-11-18 12:24:31,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=225433.33333333334, ans=0.1 2023-11-18 12:24:38,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.639e+01 9.876e+01 1.096e+02 1.307e+02 1.863e+02, threshold=2.192e+02, percent-clipped=0.0 2023-11-18 12:24:50,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=225500.0, ans=0.04949747468305833 2023-11-18 12:24:57,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=225566.66666666666, ans=0.0 2023-11-18 12:25:03,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.06 vs. limit=15.0 2023-11-18 12:25:04,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=225566.66666666666, ans=0.125 2023-11-18 12:25:06,223 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9800, loss[loss=0.09893, simple_loss=0.114, pruned_loss=0.03028, audio_tagging_loss=0.01164, over 15487.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1253, pruned_loss=0.0395, audio_tagging_loss=0.01183, over 3042410.06 frames. ], batch size: 58, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:25:07,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=225633.33333333334, ans=0.125 2023-11-18 12:25:08,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=225633.33333333334, ans=0.125 2023-11-18 12:25:09,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225633.33333333334, ans=0.1 2023-11-18 12:25:11,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-18 12:25:30,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=225766.66666666666, ans=0.0 2023-11-18 12:25:46,014 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.30 vs. limit=15.0 2023-11-18 12:25:46,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.10 vs. limit=15.0 2023-11-18 12:25:55,525 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:26:01,875 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9850, loss[loss=0.1333, simple_loss=0.1422, pruned_loss=0.0462, audio_tagging_loss=0.01598, over 14486.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1258, pruned_loss=0.03972, audio_tagging_loss=0.01177, over 3040871.39 frames. 
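The scaling.py:1022 records compare a per-module "whitening metric" against a limit (metric=4.36 vs. limit=15.0 just above); when the metric exceeds the limit, a corrective gradient pushes the module's output covariance back toward white. One plausible definition consistent with that behavior: with C the covariance of the activations in each channel group of dimension d, metric = d * trace(C @ C) / trace(C)**2, which is 1.0 when all eigenvalues are equal and grows as a few directions dominate. This is a sketch; the real icefall Whiten module applies the penalty through a custom autograd function.

import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """x: (num_frames, num_channels). Returns >= 1.0; 1.0 means white covariance."""
    num_frames, num_channels = x.shape
    d = num_channels // num_groups
    xg = x.reshape(num_frames, num_groups, d).transpose(0, 1)  # (groups, T, d)
    xg = xg - xg.mean(dim=1, keepdim=True)
    cov = xg.transpose(1, 2) @ xg / num_frames                 # (groups, d, d)
    num = d * (cov @ cov).diagonal(dim1=1, dim2=2).sum(-1)     # d * trace(C C)
    den = cov.diagonal(dim1=1, dim2=2).sum(-1) ** 2            # trace(C)^2
    return (num / den).mean()

white = torch.randn(10000, 256)
print(whitening_metric(white, num_groups=1))        # ~1.0
low_rank = torch.randn(10000, 1) * torch.randn(1, 256)
print(whitening_metric(low_rank, num_groups=1))     # >> 1.0, would trip the limit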
], batch size: 55, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:26:08,450 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:26:10,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=225966.66666666666, ans=0.1 2023-11-18 12:26:16,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=226033.33333333334, ans=0.125 2023-11-18 12:26:19,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=226033.33333333334, ans=0.125 2023-11-18 12:26:21,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=226033.33333333334, ans=0.125 2023-11-18 12:26:21,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=226033.33333333334, ans=0.125 2023-11-18 12:26:27,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=226100.0, ans=0.2 2023-11-18 12:26:29,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=22.5 2023-11-18 12:26:30,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.673e+01 9.358e+01 1.054e+02 1.143e+02 1.553e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 12:26:41,680 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.59 vs. limit=15.0 2023-11-18 12:26:47,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.13 vs. limit=15.0 2023-11-18 12:26:49,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=226233.33333333334, ans=0.125 2023-11-18 12:26:57,549 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9900, loss[loss=0.0756, simple_loss=0.08297, pruned_loss=0.01758, audio_tagging_loss=0.01654, over 14606.00 frames. ], tot_loss[loss=0.1143, simple_loss=0.1258, pruned_loss=0.03969, audio_tagging_loss=0.01176, over 3042806.39 frames. ], batch size: 56, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:27:13,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=226366.66666666666, ans=0.125 2023-11-18 12:27:23,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.79 vs. limit=15.0 2023-11-18 12:27:44,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=226566.66666666666, ans=0.1 2023-11-18 12:27:53,592 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 9950, loss[loss=0.1148, simple_loss=0.1248, pruned_loss=0.03925, audio_tagging_loss=0.01318, over 15626.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1257, pruned_loss=0.03955, audio_tagging_loss=0.01178, over 3045594.13 frames. 
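The balancer parameters threaded through these records (min_positive ans=0.025, max_abs ans=10.0, prob ans=0.125) belong to activation balancers: modules that are the identity in the forward pass but, with probability prob, adjust gradients when a channel's statistics drift out of bounds, e.g. fewer than min_positive of its values positive, or mean absolute value above max_abs. A simplified sketch of the detection half only; icefall's real Balancer applies the correction inside a custom backward, so this is an assumption-laden illustration.

import torch

def balancer_violations(x: torch.Tensor,
                        min_positive: float = 0.05,
                        max_abs: float = 10.0):
    """x: (num_frames, num_channels). Flags channels a balancer would correct."""
    frac_positive = (x > 0).float().mean(dim=0)
    mean_abs = x.abs().mean(dim=0)
    too_negative = frac_positive < min_positive   # channel almost never fires
    too_large = mean_abs > max_abs                # channel magnitude exploding
    return too_negative, too_large

x = torch.randn(1000, 8)
x[:, 0] = -x[:, 0].abs()        # channel 0: never positive
x[:, 1] *= 100.0                # channel 1: huge magnitude
neg, big = balancer_violations(x)
print(neg.nonzero().flatten())  # tensor([0])
print(big.nonzero().flatten())  # tensor([1])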
], batch size: 56, lr: 1.95e-02, grad_scale: 64.0 2023-11-18 12:27:59,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=226633.33333333334, ans=0.0 2023-11-18 12:28:14,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=226766.66666666666, ans=0.125 2023-11-18 12:28:20,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=226766.66666666666, ans=0.2 2023-11-18 12:28:20,833 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.751e+01 9.736e+01 1.077e+02 1.172e+02 1.969e+02, threshold=2.153e+02, percent-clipped=0.0 2023-11-18 12:28:23,818 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:28:36,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=226833.33333333334, ans=0.125 2023-11-18 12:28:37,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=226900.0, ans=0.125 2023-11-18 12:28:37,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=226900.0, ans=15.0 2023-11-18 12:28:44,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=226900.0, ans=0.07 2023-11-18 12:28:49,510 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10000, loss[loss=0.08315, simple_loss=0.09578, pruned_loss=0.02643, audio_tagging_loss=0.008822, over 14473.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1248, pruned_loss=0.03905, audio_tagging_loss=0.01185, over 3056002.51 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:28:51,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=226966.66666666666, ans=0.125 2023-11-18 12:28:52,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.86 vs. limit=15.0 2023-11-18 12:29:11,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=227100.0, ans=0.125 2023-11-18 12:29:15,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=227100.0, ans=0.125 2023-11-18 12:29:17,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=24.87 vs. limit=22.5 2023-11-18 12:29:19,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=227100.0, ans=15.0 2023-11-18 12:29:38,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=227233.33333333334, ans=0.125 2023-11-18 12:29:38,870 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.95 vs. limit=12.0 2023-11-18 12:29:44,583 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10050, loss[loss=0.1094, simple_loss=0.1254, pruned_loss=0.03554, audio_tagging_loss=0.01116, over 15021.00 frames. 
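The learning rate in the batch records drifts down slowly, from 1.99e-02 around batch 8250 to 1.94e-02 by batch 10050. This is consistent with an Eden-style schedule, as used by icefall's Zipformer recipes, where the base LR is damped by both the batch index and the (possibly fractional) epoch. A sketch with illustrative base_lr, lr_batches, and lr_epochs values, not necessarily the ones used in this run:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden LR schedule (sketch): decays in both batch count and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# LR shrinks gradually with both counters, matching the slow drift logged above
# (arguments illustrative):
print(eden_lr(0.045, batch=20000, epoch=2.5))
print(eden_lr(0.045, batch=22000, epoch=2.7))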
], tot_loss[loss=0.1135, simple_loss=0.1251, pruned_loss=0.0391, audio_tagging_loss=0.01189, over 3057296.60 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:29:56,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=227366.66666666666, ans=0.125 2023-11-18 12:29:57,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=227366.66666666666, ans=0.0 2023-11-18 12:30:02,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=227366.66666666666, ans=0.04949747468305833 2023-11-18 12:30:13,277 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.408e+01 1.046e+02 1.122e+02 1.934e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 12:30:19,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=227500.0, ans=0.0 2023-11-18 12:30:21,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=227500.0, ans=0.1 2023-11-18 12:30:23,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.52 vs. limit=10.0 2023-11-18 12:30:41,434 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10100, loss[loss=0.1035, simple_loss=0.09952, pruned_loss=0.03539, audio_tagging_loss=0.01835, over 14605.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1258, pruned_loss=0.03954, audio_tagging_loss=0.01194, over 3055034.91 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:31:11,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=227766.66666666666, ans=0.125 2023-11-18 12:31:19,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=227833.33333333334, ans=0.0 2023-11-18 12:31:25,793 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:31:36,817 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10150, loss[loss=0.1005, simple_loss=0.1043, pruned_loss=0.03504, audio_tagging_loss=0.01331, over 15421.00 frames. ], tot_loss[loss=0.1159, simple_loss=0.1272, pruned_loss=0.0403, audio_tagging_loss=0.01202, over 3046758.20 frames. ], batch size: 57, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:31:40,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=227966.66666666666, ans=10.0 2023-11-18 12:32:01,844 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 12:32:04,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.369e+01 9.918e+01 1.075e+02 1.229e+02 2.012e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 12:32:15,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=228166.66666666666, ans=0.2 2023-11-18 12:32:30,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=228233.33333333334, ans=0.1 2023-11-18 12:32:32,043 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10200, loss[loss=0.1243, simple_loss=0.1446, pruned_loss=0.04305, audio_tagging_loss=0.008988, over 16178.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1263, pruned_loss=0.03992, audio_tagging_loss=0.01212, over 3048994.50 frames. ], batch size: 60, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:32:52,189 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:32:58,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=228433.33333333334, ans=0.125 2023-11-18 12:33:06,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.00 vs. limit=15.0 2023-11-18 12:33:12,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.07 vs. limit=22.5 2023-11-18 12:33:27,635 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10250, loss[loss=0.09893, simple_loss=0.1035, pruned_loss=0.03446, audio_tagging_loss=0.01272, over 15192.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1266, pruned_loss=0.03992, audio_tagging_loss=0.01219, over 3052138.75 frames. ], batch size: 58, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:33:35,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=228633.33333333334, ans=0.125 2023-11-18 12:33:48,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2023-11-18 12:33:51,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=15.0 2023-11-18 12:33:55,204 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.485e+01 9.506e+01 1.026e+02 1.188e+02 1.534e+02, threshold=2.052e+02, percent-clipped=0.0 2023-11-18 12:34:23,495 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10300, loss[loss=0.1166, simple_loss=0.1334, pruned_loss=0.03718, audio_tagging_loss=0.01274, over 15259.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1262, pruned_loss=0.03986, audio_tagging_loss=0.01224, over 3050868.68 frames. 
], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:34:24,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.14 vs. limit=15.0 2023-11-18 12:34:37,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=229033.33333333334, ans=0.125 2023-11-18 12:34:41,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=229033.33333333334, ans=0.125 2023-11-18 12:34:45,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.26 vs. limit=12.0 2023-11-18 12:34:47,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=229100.0, ans=0.125 2023-11-18 12:34:50,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=229100.0, ans=0.125 2023-11-18 12:35:02,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.71 vs. limit=10.0 2023-11-18 12:35:07,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=229233.33333333334, ans=0.2 2023-11-18 12:35:13,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=229233.33333333334, ans=0.0 2023-11-18 12:35:15,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=229233.33333333334, ans=0.1 2023-11-18 12:35:18,586 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10350, loss[loss=0.1306, simple_loss=0.1404, pruned_loss=0.04856, audio_tagging_loss=0.0118, over 14896.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1259, pruned_loss=0.03969, audio_tagging_loss=0.01233, over 3048418.45 frames. ], batch size: 56, lr: 1.94e-02, grad_scale: 64.0 2023-11-18 12:35:20,933 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=1.041e-02 2023-11-18 12:35:28,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=229366.66666666666, ans=0.0 2023-11-18 12:35:35,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=229366.66666666666, ans=0.125 2023-11-18 12:35:47,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.929e+01 9.749e+01 1.144e+02 1.287e+02 1.806e+02, threshold=2.288e+02, percent-clipped=0.0 2023-11-18 12:36:06,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=229566.66666666666, ans=0.0 2023-11-18 12:36:11,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.14 vs. limit=15.0 2023-11-18 12:36:14,267 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10400, loss[loss=0.109, simple_loss=0.124, pruned_loss=0.03563, audio_tagging_loss=0.01138, over 14023.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.126, pruned_loss=0.03997, audio_tagging_loss=0.01232, over 3042868.89 frames. 
], batch size: 53, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:36:19,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.26 vs. limit=22.5 2023-11-18 12:36:20,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=229633.33333333334, ans=0.2 2023-11-18 12:36:50,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=229833.33333333334, ans=0.125 2023-11-18 12:36:59,004 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.62 vs. limit=15.0 2023-11-18 12:37:00,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=15.0 2023-11-18 12:37:06,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.56 vs. limit=15.0 2023-11-18 12:37:07,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=229900.0, ans=0.2 2023-11-18 12:37:10,531 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10450, loss[loss=0.143, simple_loss=0.1667, pruned_loss=0.05109, audio_tagging_loss=0.00857, over 16045.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1252, pruned_loss=0.03956, audio_tagging_loss=0.0123, over 3036972.72 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:37:21,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=230033.33333333334, ans=0.2 2023-11-18 12:37:23,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=230033.33333333334, ans=0.0 2023-11-18 12:37:24,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=230033.33333333334, ans=0.0 2023-11-18 12:37:37,310 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.707e+01 9.254e+01 9.863e+01 1.141e+02 1.786e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 12:37:50,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=230166.66666666666, ans=0.2 2023-11-18 12:38:04,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.59 vs. limit=22.5 2023-11-18 12:38:05,268 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10500, loss[loss=0.1086, simple_loss=0.1328, pruned_loss=0.03451, audio_tagging_loss=0.007693, over 16019.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1264, pruned_loss=0.0401, audio_tagging_loss=0.01204, over 3038569.94 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:38:10,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=230300.0, ans=0.125 2023-11-18 12:38:11,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.03 vs. 
limit=6.0 2023-11-18 12:38:16,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=230366.66666666666, ans=0.125 2023-11-18 12:38:49,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=230566.66666666666, ans=0.125 2023-11-18 12:39:00,317 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10550, loss[loss=0.1072, simple_loss=0.1183, pruned_loss=0.03513, audio_tagging_loss=0.01295, over 15185.00 frames. ], tot_loss[loss=0.1149, simple_loss=0.1261, pruned_loss=0.03988, audio_tagging_loss=0.01194, over 3041735.31 frames. ], batch size: 56, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:39:13,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230700.0, ans=0.1 2023-11-18 12:39:22,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=230700.0, ans=0.125 2023-11-18 12:39:29,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.555e+01 1.067e+02 1.229e+02 1.948e+02, threshold=2.135e+02, percent-clipped=0.0 2023-11-18 12:39:52,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.28 vs. limit=15.0 2023-11-18 12:39:56,905 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10600, loss[loss=0.1471, simple_loss=0.1719, pruned_loss=0.05171, audio_tagging_loss=0.009404, over 14476.00 frames. ], tot_loss[loss=0.1148, simple_loss=0.1265, pruned_loss=0.03973, audio_tagging_loss=0.01184, over 3038053.73 frames. ], batch size: 52, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:40:01,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=230966.66666666666, ans=0.1 2023-11-18 12:40:11,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=231033.33333333334, ans=0.125 2023-11-18 12:40:21,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=231100.0, ans=0.125 2023-11-18 12:40:26,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-18 12:40:32,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=231166.66666666666, ans=0.1 2023-11-18 12:40:35,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=231166.66666666666, ans=0.125 2023-11-18 12:40:38,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=231166.66666666666, ans=0.0 2023-11-18 12:40:41,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.18 vs. limit=22.5 2023-11-18 12:40:42,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.85 vs. 
limit=22.5 2023-11-18 12:40:52,914 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10650, loss[loss=0.1366, simple_loss=0.1611, pruned_loss=0.04814, audio_tagging_loss=0.007887, over 15655.00 frames. ], tot_loss[loss=0.115, simple_loss=0.1267, pruned_loss=0.03981, audio_tagging_loss=0.0118, over 3039120.52 frames. ], batch size: 58, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:41:09,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=231366.66666666666, ans=0.125 2023-11-18 12:41:10,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=231366.66666666666, ans=0.125 2023-11-18 12:41:20,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.385e+01 9.562e+01 1.044e+02 1.196e+02 1.427e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 12:41:32,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.min_positive, batch_count=231500.0, ans=0.025 2023-11-18 12:41:43,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.97 vs. limit=22.5 2023-11-18 12:41:48,148 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10700, loss[loss=0.1223, simple_loss=0.1266, pruned_loss=0.04328, audio_tagging_loss=0.01568, over 14402.00 frames. ], tot_loss[loss=0.1152, simple_loss=0.1269, pruned_loss=0.04004, audio_tagging_loss=0.01174, over 3038279.54 frames. ], batch size: 54, lr: 1.93e-02, grad_scale: 64.0 2023-11-18 12:41:53,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.87 vs. limit=15.0 2023-11-18 12:42:17,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=231766.66666666666, ans=0.125 2023-11-18 12:42:28,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=231833.33333333334, ans=0.125 2023-11-18 12:42:44,284 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10750, loss[loss=0.1017, simple_loss=0.09945, pruned_loss=0.03383, audio_tagging_loss=0.01812, over 14768.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1269, pruned_loss=0.04014, audio_tagging_loss=0.01177, over 3038228.40 frames. ], batch size: 59, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:43:05,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.92 vs. 
limit=22.5 2023-11-18 12:43:11,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=232100.0, ans=10.0 2023-11-18 12:43:12,206 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.130e+01 9.211e+01 1.033e+02 1.162e+02 1.735e+02, threshold=2.066e+02, percent-clipped=0.0 2023-11-18 12:43:13,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=232100.0, ans=0.0 2023-11-18 12:43:28,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=232233.33333333334, ans=0.2 2023-11-18 12:43:40,824 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10800, loss[loss=0.1321, simple_loss=0.1487, pruned_loss=0.04568, audio_tagging_loss=0.01203, over 15427.00 frames. ], tot_loss[loss=0.1155, simple_loss=0.1274, pruned_loss=0.04006, audio_tagging_loss=0.01171, over 3044293.53 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 128.0 2023-11-18 12:44:35,722 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10850, loss[loss=0.1013, simple_loss=0.1192, pruned_loss=0.03025, audio_tagging_loss=0.01148, over 15213.00 frames. ], tot_loss[loss=0.1153, simple_loss=0.1273, pruned_loss=0.03982, audio_tagging_loss=0.01182, over 3045031.24 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:44:43,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=232633.33333333334, ans=0.0 2023-11-18 12:44:45,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.86 vs. limit=15.0 2023-11-18 12:44:48,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=232700.0, ans=0.0 2023-11-18 12:44:58,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=232766.66666666666, ans=0.125 2023-11-18 12:45:04,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.924e+01 9.821e+01 1.081e+02 1.220e+02 1.822e+02, threshold=2.162e+02, percent-clipped=0.0 2023-11-18 12:45:27,313 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:45:31,520 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10900, loss[loss=0.1135, simple_loss=0.1268, pruned_loss=0.03945, audio_tagging_loss=0.01065, over 15252.00 frames. ], tot_loss[loss=0.1146, simple_loss=0.1268, pruned_loss=0.03943, audio_tagging_loss=0.01179, over 3050746.02 frames. 
], batch size: 56, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:45:42,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=233033.33333333334, ans=0.125 2023-11-18 12:45:49,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233033.33333333334, ans=0.1 2023-11-18 12:46:14,601 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.54 vs. limit=10.0 2023-11-18 12:46:20,162 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:46:27,307 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 10950, loss[loss=0.1294, simple_loss=0.1403, pruned_loss=0.04913, audio_tagging_loss=0.01017, over 14789.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1255, pruned_loss=0.03932, audio_tagging_loss=0.01202, over 3042832.59 frames. ], batch size: 57, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:46:42,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=233366.66666666666, ans=0.125 2023-11-18 12:46:52,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=233433.33333333334, ans=0.125 2023-11-18 12:46:56,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.530e+01 1.056e+02 1.170e+02 1.707e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 12:47:14,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=233566.66666666666, ans=0.1 2023-11-18 12:47:23,193 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11000, loss[loss=0.1087, simple_loss=0.121, pruned_loss=0.03973, audio_tagging_loss=0.008421, over 15185.00 frames. ], tot_loss[loss=0.1142, simple_loss=0.1254, pruned_loss=0.03952, audio_tagging_loss=0.01199, over 3036230.77 frames. ], batch size: 56, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:47:32,152 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:48:05,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. limit=6.0 2023-11-18 12:48:16,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=233900.0, ans=0.0 2023-11-18 12:48:17,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.95 vs. limit=15.0 2023-11-18 12:48:19,019 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11050, loss[loss=0.1286, simple_loss=0.1458, pruned_loss=0.04244, audio_tagging_loss=0.01325, over 15742.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1254, pruned_loss=0.03921, audio_tagging_loss=0.01208, over 3038086.11 frames. 
], batch size: 58, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:48:34,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.94 vs. limit=15.0 2023-11-18 12:48:41,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0 2023-11-18 12:48:47,582 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.482e+01 9.699e+01 1.056e+02 1.206e+02 1.867e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 12:48:48,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=234100.0, ans=0.2 2023-11-18 12:49:00,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=234166.66666666666, ans=0.125 2023-11-18 12:49:05,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=234233.33333333334, ans=0.0 2023-11-18 12:49:14,674 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11100, loss[loss=0.1244, simple_loss=0.1398, pruned_loss=0.04358, audio_tagging_loss=0.01092, over 14574.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.126, pruned_loss=0.0396, audio_tagging_loss=0.01217, over 3034076.70 frames. ], batch size: 54, lr: 1.92e-02, grad_scale: 64.0 2023-11-18 12:49:49,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=234500.0, ans=0.125 2023-11-18 12:49:50,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=22.5 2023-11-18 12:50:09,610 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11150, loss[loss=0.1127, simple_loss=0.125, pruned_loss=0.03867, audio_tagging_loss=0.01154, over 15714.00 frames. ], tot_loss[loss=0.1158, simple_loss=0.1269, pruned_loss=0.04017, audio_tagging_loss=0.01217, over 3043025.93 frames. ], batch size: 58, lr: 1.91e-02, grad_scale: 64.0 2023-11-18 12:50:12,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=234633.33333333334, ans=0.125 2023-11-18 12:50:14,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=234633.33333333334, ans=0.015 2023-11-18 12:50:16,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=234633.33333333334, ans=0.0 2023-11-18 12:50:28,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=234700.0, ans=0.125 2023-11-18 12:50:39,028 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.57 vs. 
limit=12.0 2023-11-18 12:50:39,491 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.809e+01 9.879e+01 1.104e+02 1.293e+02 2.710e+02, threshold=2.209e+02, percent-clipped=1.0 2023-11-18 12:50:39,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=234766.66666666666, ans=0.0 2023-11-18 12:50:42,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=234833.33333333334, ans=0.125 2023-11-18 12:50:43,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=234833.33333333334, ans=0.125 2023-11-18 12:50:55,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=234900.0, ans=0.125 2023-11-18 12:50:58,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=234900.0, ans=0.1 2023-11-18 12:51:06,388 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11200, loss[loss=0.1254, simple_loss=0.1254, pruned_loss=0.04847, audio_tagging_loss=0.01422, over 15829.00 frames. ], tot_loss[loss=0.1156, simple_loss=0.1267, pruned_loss=0.04003, audio_tagging_loss=0.01222, over 3050373.93 frames. ], batch size: 60, lr: 1.91e-02, grad_scale: 64.0 2023-11-18 12:51:32,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-18 12:51:49,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=235233.33333333334, ans=0.125 2023-11-18 12:51:55,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=235233.33333333334, ans=0.125 2023-11-18 12:51:55,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=235233.33333333334, ans=0.0 2023-11-18 12:52:01,481 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11250, loss[loss=0.1108, simple_loss=0.1283, pruned_loss=0.03504, audio_tagging_loss=0.01158, over 15029.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1252, pruned_loss=0.03962, audio_tagging_loss=0.01217, over 3038866.99 frames. ], batch size: 56, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:52:08,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=235300.0, ans=0.0 2023-11-18 12:52:31,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.992e+01 9.651e+01 1.080e+02 1.306e+02 2.369e+02, threshold=2.160e+02, percent-clipped=1.0 2023-11-18 12:52:32,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-18 12:52:39,663 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 12:52:56,425 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11300, loss[loss=0.1533, simple_loss=0.1771, pruned_loss=0.05656, audio_tagging_loss=0.008206, over 16113.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1246, pruned_loss=0.03924, audio_tagging_loss=0.01209, over 3036283.22 frames. 
], batch size: 57, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:52:56,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=235633.33333333334, ans=0.125 2023-11-18 12:52:58,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=235633.33333333334, ans=0.0 2023-11-18 12:53:09,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.66 vs. limit=15.0 2023-11-18 12:53:51,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=235966.66666666666, ans=0.1 2023-11-18 12:53:52,130 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11350, loss[loss=0.1618, simple_loss=0.186, pruned_loss=0.05824, audio_tagging_loss=0.01051, over 17625.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1242, pruned_loss=0.03918, audio_tagging_loss=0.01203, over 3041621.97 frames. ], batch size: 62, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:54:05,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=236033.33333333334, ans=0.1 2023-11-18 12:54:06,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236033.33333333334, ans=0.1 2023-11-18 12:54:07,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=236033.33333333334, ans=0.125 2023-11-18 12:54:21,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.976e+01 9.667e+01 1.093e+02 1.238e+02 1.995e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 12:54:30,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236166.66666666666, ans=0.1 2023-11-18 12:54:31,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=236166.66666666666, ans=0.2 2023-11-18 12:54:45,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.45 vs. limit=12.0 2023-11-18 12:54:46,948 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.71 vs. limit=6.0 2023-11-18 12:54:48,558 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11400, loss[loss=0.1267, simple_loss=0.1365, pruned_loss=0.04838, audio_tagging_loss=0.01007, over 15570.00 frames. ], tot_loss[loss=0.1133, simple_loss=0.1243, pruned_loss=0.03924, audio_tagging_loss=0.01193, over 3041636.12 frames. 
], batch size: 56, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:54:48,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=236300.0, ans=0.0 2023-11-18 12:54:59,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=236366.66666666666, ans=0.1 2023-11-18 12:55:13,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=236433.33333333334, ans=0.0 2023-11-18 12:55:30,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=236500.0, ans=0.1 2023-11-18 12:55:43,122 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11450, loss[loss=0.1249, simple_loss=0.1456, pruned_loss=0.04349, audio_tagging_loss=0.008578, over 15330.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1254, pruned_loss=0.03956, audio_tagging_loss=0.01183, over 3042798.08 frames. ], batch size: 55, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:55:43,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=12.0 2023-11-18 12:56:05,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=236766.66666666666, ans=0.1 2023-11-18 12:56:13,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.236e+01 9.179e+01 9.957e+01 1.104e+02 1.348e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 12:56:16,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:19,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=236833.33333333334, ans=0.125 2023-11-18 12:56:22,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=236833.33333333334, ans=0.0 2023-11-18 12:56:26,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=236900.0, ans=0.0 2023-11-18 12:56:28,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=236900.0, ans=0.0 2023-11-18 12:56:28,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=236900.0, ans=0.0 2023-11-18 12:56:38,366 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11500, loss[loss=0.1262, simple_loss=0.1371, pruned_loss=0.04572, audio_tagging_loss=0.01196, over 15340.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1259, pruned_loss=0.03977, audio_tagging_loss=0.01171, over 3042575.07 frames. ], batch size: 59, lr: 1.91e-02, grad_scale: 32.0 2023-11-18 12:56:42,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=236966.66666666666, ans=0.2 2023-11-18 12:56:54,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=26.75 vs. 
limit=22.5 2023-11-18 12:56:56,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=237033.33333333334, ans=0.0 2023-11-18 12:56:59,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=237033.33333333334, ans=0.95 2023-11-18 12:57:02,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=237100.0, ans=0.0 2023-11-18 12:57:10,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237166.66666666666, ans=0.1 2023-11-18 12:57:33,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=237300.0, ans=0.125 2023-11-18 12:57:35,032 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11550, loss[loss=0.1401, simple_loss=0.1473, pruned_loss=0.05133, audio_tagging_loss=0.01517, over 15203.00 frames. ], tot_loss[loss=0.1145, simple_loss=0.1261, pruned_loss=0.03977, audio_tagging_loss=0.01167, over 3042694.16 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:57:35,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=237300.0, ans=0.2 2023-11-18 12:57:56,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=237433.33333333334, ans=0.125 2023-11-18 12:58:04,245 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.411e+01 1.015e+02 1.135e+02 1.692e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 12:58:08,071 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 12:58:17,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237500.0, ans=0.1 2023-11-18 12:58:30,026 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11600, loss[loss=0.1011, simple_loss=0.1153, pruned_loss=0.03213, audio_tagging_loss=0.0113, over 15095.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1257, pruned_loss=0.03984, audio_tagging_loss=0.0117, over 3047286.84 frames. ], batch size: 58, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:58:30,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.73 vs. 
limit=6.0 2023-11-18 12:58:43,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=237700.0, ans=0.125 2023-11-18 12:58:51,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=237766.66666666666, ans=0.0 2023-11-18 12:59:03,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=237833.33333333334, ans=0.0 2023-11-18 12:59:07,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=237833.33333333334, ans=0.1 2023-11-18 12:59:14,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=237900.0, ans=0.125 2023-11-18 12:59:16,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=237900.0, ans=0.1 2023-11-18 12:59:24,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=237966.66666666666, ans=0.0 2023-11-18 12:59:25,496 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11650, loss[loss=0.1264, simple_loss=0.1436, pruned_loss=0.04248, audio_tagging_loss=0.0121, over 15433.00 frames. ], tot_loss[loss=0.1147, simple_loss=0.1263, pruned_loss=0.0399, audio_tagging_loss=0.01166, over 3046779.79 frames. ], batch size: 60, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 12:59:29,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=237966.66666666666, ans=0.0 2023-11-18 12:59:39,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=238033.33333333334, ans=0.0 2023-11-18 12:59:46,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=18.00 vs. limit=22.5 2023-11-18 12:59:47,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=238100.0, ans=0.125 2023-11-18 12:59:47,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=238100.0, ans=0.0 2023-11-18 12:59:55,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.411e+01 1.037e+02 1.164e+02 1.752e+02, threshold=2.075e+02, percent-clipped=0.0 2023-11-18 13:00:01,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=238166.66666666666, ans=0.125 2023-11-18 13:00:06,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=238166.66666666666, ans=0.1 2023-11-18 13:00:11,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=238233.33333333334, ans=0.125 2023-11-18 13:00:18,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=238233.33333333334, ans=0.1 2023-11-18 13:00:20,962 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11700, loss[loss=0.08922, simple_loss=0.09142, pruned_loss=0.02814, audio_tagging_loss=0.01537, over 15571.00 frames. 
], tot_loss[loss=0.1139, simple_loss=0.125, pruned_loss=0.03954, audio_tagging_loss=0.01179, over 3051309.35 frames. ], batch size: 60, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:01:12,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=238566.66666666666, ans=0.125 2023-11-18 13:01:16,331 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11750, loss[loss=0.1104, simple_loss=0.1232, pruned_loss=0.03653, audio_tagging_loss=0.01226, over 15229.00 frames. ], tot_loss[loss=0.114, simple_loss=0.1252, pruned_loss=0.03961, audio_tagging_loss=0.01182, over 3052309.26 frames. ], batch size: 58, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:01:18,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=238633.33333333334, ans=0.125 2023-11-18 13:01:22,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=238633.33333333334, ans=0.125 2023-11-18 13:01:33,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=238700.0, ans=0.0 2023-11-18 13:01:46,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 1.047e+02 1.164e+02 1.458e+02 1.909e+02, threshold=2.328e+02, percent-clipped=0.0 2023-11-18 13:02:11,169 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11800, loss[loss=0.1014, simple_loss=0.1121, pruned_loss=0.0294, audio_tagging_loss=0.01589, over 14840.00 frames. ], tot_loss[loss=0.1141, simple_loss=0.1255, pruned_loss=0.03949, audio_tagging_loss=0.01187, over 3055180.52 frames. ], batch size: 54, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:02:19,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=238966.66666666666, ans=0.0 2023-11-18 13:02:26,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=239033.33333333334, ans=0.125 2023-11-18 13:02:31,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=239033.33333333334, ans=0.0 2023-11-18 13:02:56,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=239233.33333333334, ans=0.0 2023-11-18 13:03:06,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=239300.0, ans=0.1 2023-11-18 13:03:07,106 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11850, loss[loss=0.08107, simple_loss=0.07828, pruned_loss=0.02853, audio_tagging_loss=0.0134, over 15705.00 frames. ], tot_loss[loss=0.1135, simple_loss=0.1246, pruned_loss=0.03909, audio_tagging_loss=0.01208, over 3058005.92 frames. 
], batch size: 62, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:03:08,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=239300.0, ans=0.09899494936611666 2023-11-18 13:03:25,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=239366.66666666666, ans=0.0 2023-11-18 13:03:35,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=239433.33333333334, ans=0.025 2023-11-18 13:03:36,864 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.964e+01 9.657e+01 1.079e+02 1.246e+02 1.721e+02, threshold=2.158e+02, percent-clipped=0.0 2023-11-18 13:04:02,553 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11900, loss[loss=0.07621, simple_loss=0.08467, pruned_loss=0.02005, audio_tagging_loss=0.01383, over 14613.00 frames. ], tot_loss[loss=0.1136, simple_loss=0.1248, pruned_loss=0.03907, audio_tagging_loss=0.01216, over 3055457.44 frames. ], batch size: 56, lr: 1.90e-02, grad_scale: 32.0 2023-11-18 13:04:09,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.73 vs. limit=22.5 2023-11-18 13:04:09,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.16 vs. limit=15.0 2023-11-18 13:04:15,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=239700.0, ans=0.0 2023-11-18 13:04:22,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=239700.0, ans=0.1 2023-11-18 13:04:31,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=239766.66666666666, ans=0.1 2023-11-18 13:04:33,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=239766.66666666666, ans=0.1 2023-11-18 13:04:41,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=239833.33333333334, ans=0.05 2023-11-18 13:04:51,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=239900.0, ans=0.125 2023-11-18 13:04:55,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=239900.0, ans=0.125 2023-11-18 13:04:57,148 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 11950, loss[loss=0.1445, simple_loss=0.1544, pruned_loss=0.05633, audio_tagging_loss=0.01091, over 14941.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.125, pruned_loss=0.03914, audio_tagging_loss=0.01216, over 3050637.20 frames. ], batch size: 56, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:04:57,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=239966.66666666666, ans=0.0 2023-11-18 13:04:59,970 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.38 vs. 
limit=12.0 2023-11-18 13:05:02,210 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-36000.pt 2023-11-18 13:05:27,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=240100.0, ans=0.0 2023-11-18 13:05:29,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 9.537e+01 1.099e+02 1.271e+02 1.974e+02, threshold=2.199e+02, percent-clipped=0.0 2023-11-18 13:05:38,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240166.66666666666, ans=0.1 2023-11-18 13:05:53,176 INFO [train_asr.py:1115] (0/4) Epoch 3, batch 12000, loss[loss=0.1146, simple_loss=0.1291, pruned_loss=0.03695, audio_tagging_loss=0.01304, over 15225.00 frames. ], tot_loss[loss=0.1144, simple_loss=0.1253, pruned_loss=0.03939, audio_tagging_loss=0.01233, over 3047939.53 frames. ], batch size: 58, lr: 1.89e-02, grad_scale: 32.0 2023-11-18 13:05:53,178 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 13:06:26,331 INFO [train_asr.py:1147] (0/4) Epoch 3, validation: loss=0.07855, simple_loss=0.06384, pruned_loss=0.01132, audio_tagging_loss=0.03531, over 4681554.00 frames. 2023-11-18 13:06:26,332 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 13:06:28,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=240300.0, ans=0.05 2023-11-18 13:06:38,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.78 vs. limit=22.5 2023-11-18 13:06:40,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=240366.66666666666, ans=0.125 2023-11-18 13:06:51,249 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-3.pt 2023-11-18 13:07:28,601 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 0, loss[loss=0.1033, simple_loss=0.08319, pruned_loss=0.02803, audio_tagging_loss=0.03367, over 14400.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.08319, pruned_loss=0.02803, audio_tagging_loss=0.03367, over 14400.00 frames. ], batch size: 57, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:07:28,603 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 13:07:45,169 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.5050, 2.0817, 1.8213, 1.2965, 2.1400, 2.0730, 2.1606, 1.4614], device='cuda:0') 2023-11-18 13:08:00,459 INFO [train_asr.py:1147] (0/4) Epoch 4, validation: loss=0.07694, simple_loss=0.06378, pruned_loss=0.01116, audio_tagging_loss=0.03389, over 4681554.00 frames. 
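The per-batch loss components above combine linearly. Checking the Epoch 4, batch 0 record, the total is consistent with loss = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss, i.e. a simple-loss weight of 0.5 and an audio-tagging weight of 1.0; these weights are inferred from the arithmetic below, not stated in this part of the log, so treat them as an assumption. A quick check in Python:

# Sketch: how the logged loss components appear to combine.
# Assumed weights: 0.5 on simple_loss, 1.0 on pruned_loss and on
# audio_tagging_loss (inferred from the logged numbers, not from the code).
simple_loss = 0.08319          # Epoch 4, batch 0 record above
pruned_loss = 0.02803
audio_tagging_loss = 0.03367
total = 0.5 * simple_loss + 1.0 * pruned_loss + 1.0 * audio_tagging_loss
print(f"{total:.4f}")          # 0.1033, matching the logged loss=0.1033

The same weights reproduce the other batch records in this section, e.g. 0.5 * 0.1232 + 0.03653 + 0.01226 = 0.1104 for the batch 11750 record.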
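The learning rate steps down from lr: 1.89e-02 late in epoch 3 to lr: 1.77e-02 at the start of epoch 4. A jump exactly at an epoch boundary is consistent with icefall's Eden schedule, which decays with both the global batch index and the number of completed epochs. A minimal sketch, assuming base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 for this run (these constants are assumptions here, and the global batch index ~36000 is read off the checkpoint-36000.pt save above):

def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    # Eden-style decay in both batch index and (zero-based) completed-epoch count.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, 36000, 2):.4f}")  # ~0.0189, end of epoch 3
print(f"{eden_lr(0.045, 36050, 3):.4f}")  # ~0.0177, start of epoch 4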
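The recurring WARNING lines above ("Exclude cut with ID unbalanced/... Number of frames (after subsampling): 23 ... Number of tokens: 24") document a sanity filter: a cut is dropped when it would have fewer encoder frames than BPE tokens, which would make the transducer loss ill-defined. The logged reduction from 100 frames to 23 is consistent with T' = ((T - 7) // 2 + 1) // 2 for the 4x convolutional subsampling. A sketch of such a filter (the function names are illustrative, not the exact icefall code):

def frames_after_subsampling(num_frames: int) -> int:
    # Assumed 4x Conv2d subsampling arithmetic: 100 frames -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> the cut is excluded

The excluded cuts are the 1-second AudioSet clips carrying the "Dummy text added as a place holder" transcript, which at 24 tokens can never fit into 23 frames.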
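In the optim.py:476 records, the five values after "grad-norm quartiles" read as min/25%/50%/75%/max of recently observed gradient norms, and in each record the threshold equals Clipping_scale times the median (for example 2.0 x 1.046e+02 ~ 2.091e+02); percent-clipped is the share of recent batches whose norm exceeded that threshold. A sketch of that bookkeeping (function and variable names are illustrative):

import numpy as np

def clipping_stats(grad_norms, clipping_scale=2.0):
    # Quartiles of a window of gradient norms; threshold = scale * median.
    q = np.quantile(grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * q[2]
    percent_clipped = 100.0 * np.mean(np.asarray(grad_norms) > threshold)
    return q, threshold, percent_clipped

q, thr, pc = clipping_stats([75.5, 94.1, 104.6, 112.2, 193.4])
print(thr, pc)  # ~209.2 and 0.0, matching threshold=2.091e+02, percent-clipped=0.0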
2023-11-18 13:08:00,460 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 13:08:00,778 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:08:15,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=240520.0, ans=0.0 2023-11-18 13:08:18,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=240520.0, ans=0.125 2023-11-18 13:08:24,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=240586.66666666666, ans=0.0 2023-11-18 13:08:30,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240586.66666666666, ans=0.1 2023-11-18 13:08:35,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=240653.33333333334, ans=0.125 2023-11-18 13:08:55,952 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 50, loss[loss=0.1301, simple_loss=0.1498, pruned_loss=0.03881, audio_tagging_loss=0.01637, over 16052.00 frames. ], tot_loss[loss=0.1235, simple_loss=0.1249, pruned_loss=0.03852, audio_tagging_loss=0.02259, over 684480.01 frames. ], batch size: 59, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:08:58,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=9.26 vs. limit=15.0 2023-11-18 13:09:00,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.826e+01 9.935e+01 1.154e+02 1.332e+02 1.872e+02, threshold=2.308e+02, percent-clipped=0.0 2023-11-18 13:09:00,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=240786.66666666666, ans=0.1 2023-11-18 13:09:07,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=240853.33333333334, ans=0.0 2023-11-18 13:09:14,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=240853.33333333334, ans=0.0 2023-11-18 13:09:16,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=240853.33333333334, ans=0.125 2023-11-18 13:09:20,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=240920.0, ans=0.125 2023-11-18 13:09:28,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.83 vs. limit=15.0 2023-11-18 13:09:36,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=240986.66666666666, ans=6.0 2023-11-18 13:09:45,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=241053.33333333334, ans=10.0 2023-11-18 13:09:52,226 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 100, loss[loss=0.1165, simple_loss=0.1236, pruned_loss=0.03674, audio_tagging_loss=0.01799, over 14806.00 frames. ], tot_loss[loss=0.1228, simple_loss=0.125, pruned_loss=0.03841, audio_tagging_loss=0.02191, over 1201793.12 frames. 
], batch size: 53, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:09:55,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=241120.0, ans=0.125 2023-11-18 13:10:07,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=241186.66666666666, ans=0.0 2023-11-18 13:10:11,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=241186.66666666666, ans=0.125 2023-11-18 13:10:30,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=241320.0, ans=0.125 2023-11-18 13:10:39,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241386.66666666666, ans=0.1 2023-11-18 13:10:48,257 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 150, loss[loss=0.1139, simple_loss=0.1286, pruned_loss=0.03729, audio_tagging_loss=0.01234, over 14288.00 frames. ], tot_loss[loss=0.1203, simple_loss=0.1246, pruned_loss=0.03821, audio_tagging_loss=0.01976, over 1613950.72 frames. ], batch size: 53, lr: 1.77e-02, grad_scale: 32.0 2023-11-18 13:10:49,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=241453.33333333334, ans=0.1 2023-11-18 13:10:51,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.23 vs. limit=6.0 2023-11-18 13:10:52,413 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.521e+01 1.016e+02 1.130e+02 1.451e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 13:10:55,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=241453.33333333334, ans=0.2 2023-11-18 13:11:00,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=241520.0, ans=0.0 2023-11-18 13:11:03,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=241520.0, ans=10.0 2023-11-18 13:11:06,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 13:11:10,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.82 vs. 
limit=6.0 2023-11-18 13:11:19,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=241586.66666666666, ans=0.0 2023-11-18 13:11:34,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:38,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=241720.0, ans=0.1 2023-11-18 13:11:38,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=241720.0, ans=0.0 2023-11-18 13:11:44,092 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 200, loss[loss=0.1061, simple_loss=0.1215, pruned_loss=0.02967, audio_tagging_loss=0.01565, over 13536.00 frames. ], tot_loss[loss=0.1167, simple_loss=0.1231, pruned_loss=0.03764, audio_tagging_loss=0.01751, over 1926314.13 frames. ], batch size: 53, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:11:55,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=241853.33333333334, ans=0.125 2023-11-18 13:11:58,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.53 vs. limit=6.0 2023-11-18 13:12:40,381 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 250, loss[loss=0.08241, simple_loss=0.09094, pruned_loss=0.02537, audio_tagging_loss=0.01157, over 15392.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1233, pruned_loss=0.03783, audio_tagging_loss=0.0159, over 2180740.42 frames. ], batch size: 59, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:12:45,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 9.626e+01 1.050e+02 1.196e+02 1.667e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 13:13:01,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=242253.33333333334, ans=0.125 2023-11-18 13:13:11,663 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0 2023-11-18 13:13:35,868 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 300, loss[loss=0.1167, simple_loss=0.1367, pruned_loss=0.03891, audio_tagging_loss=0.009501, over 15949.00 frames. ], tot_loss[loss=0.1161, simple_loss=0.1253, pruned_loss=0.03881, audio_tagging_loss=0.01464, over 2370418.20 frames. ], batch size: 58, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:13:41,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.58 vs. limit=15.0 2023-11-18 13:13:52,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=242520.0, ans=0.05 2023-11-18 13:13:57,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.24 vs. limit=12.0 2023-11-18 13:14:31,227 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 350, loss[loss=0.1089, simple_loss=0.1293, pruned_loss=0.03367, audio_tagging_loss=0.01062, over 15716.00 frames. ], tot_loss[loss=0.1154, simple_loss=0.1261, pruned_loss=0.03872, audio_tagging_loss=0.01365, over 2519839.00 frames. 
], batch size: 58, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:14:37,029 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.919e+01 9.705e+01 1.099e+02 1.261e+02 1.880e+02, threshold=2.197e+02, percent-clipped=0.0 2023-11-18 13:15:27,800 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 400, loss[loss=0.1009, simple_loss=0.1145, pruned_loss=0.03046, audio_tagging_loss=0.01317, over 16386.00 frames. ], tot_loss[loss=0.1138, simple_loss=0.1245, pruned_loss=0.03832, audio_tagging_loss=0.0132, over 2642285.22 frames. ], batch size: 61, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:15:37,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=243186.66666666666, ans=0.0 2023-11-18 13:15:37,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=243186.66666666666, ans=0.0 2023-11-18 13:15:43,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=243186.66666666666, ans=0.125 2023-11-18 13:16:04,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=243320.0, ans=0.125 2023-11-18 13:16:23,000 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 450, loss[loss=0.1027, simple_loss=0.1123, pruned_loss=0.03161, audio_tagging_loss=0.01487, over 14936.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1234, pruned_loss=0.0377, audio_tagging_loss=0.01296, over 2736460.01 frames. ], batch size: 57, lr: 1.76e-02, grad_scale: 32.0 2023-11-18 13:16:23,228 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:16:28,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.979e+01 9.241e+01 1.029e+02 1.146e+02 1.664e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 13:16:28,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=243453.33333333334, ans=0.125 2023-11-18 13:16:31,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=243453.33333333334, ans=0.1 2023-11-18 13:16:35,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=243520.0, ans=0.2 2023-11-18 13:16:47,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=243586.66666666666, ans=0.125 2023-11-18 13:16:51,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=243586.66666666666, ans=0.125 2023-11-18 13:16:51,254 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:16:56,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=243653.33333333334, ans=0.125 2023-11-18 13:17:01,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=243653.33333333334, ans=0.125 2023-11-18 13:17:18,760 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 500, loss[loss=0.1179, simple_loss=0.1229, pruned_loss=0.0432, audio_tagging_loss=0.01327, over 14958.00 frames. 
], tot_loss[loss=0.1125, simple_loss=0.1241, pruned_loss=0.03787, audio_tagging_loss=0.0126, over 2807189.73 frames. ], batch size: 57, lr: 1.76e-02, grad_scale: 16.0 2023-11-18 13:17:26,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=17.79 vs. limit=22.5 2023-11-18 13:17:26,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=243786.66666666666, ans=0.0 2023-11-18 13:17:30,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=243853.33333333334, ans=0.07 2023-11-18 13:17:30,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0 2023-11-18 13:18:01,167 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.25 vs. limit=15.0 2023-11-18 13:18:07,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=244053.33333333334, ans=0.1 2023-11-18 13:18:15,517 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 550, loss[loss=0.1015, simple_loss=0.1121, pruned_loss=0.03445, audio_tagging_loss=0.01095, over 14802.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.123, pruned_loss=0.03739, audio_tagging_loss=0.0125, over 2865879.58 frames. ], batch size: 55, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:18:22,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=244120.0, ans=0.125 2023-11-18 13:18:23,484 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.387e+01 9.469e+01 1.045e+02 1.178e+02 1.805e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 13:18:39,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=244253.33333333334, ans=0.125 2023-11-18 13:19:04,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-11-18 13:19:07,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=244386.66666666666, ans=0.2 2023-11-18 13:19:11,340 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 600, loss[loss=0.08278, simple_loss=0.08816, pruned_loss=0.0248, audio_tagging_loss=0.01391, over 14786.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1219, pruned_loss=0.03724, audio_tagging_loss=0.01248, over 2901421.18 frames. ], batch size: 56, lr: 1.76e-02, grad_scale: 8.0 2023-11-18 13:19:15,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=244453.33333333334, ans=0.125 2023-11-18 13:19:21,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=244520.0, ans=0.025 2023-11-18 13:19:39,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=244586.66666666666, ans=0.125 2023-11-18 13:20:06,676 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 650, loss[loss=0.09012, simple_loss=0.09833, pruned_loss=0.02642, audio_tagging_loss=0.01454, over 16036.00 frames. 
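The ScheduledFloat lines (name=..., batch_count=..., ans=...) record scalar hyperparameters, dropout probabilities, skip rates and balancer probabilities among them, whose values are scheduled over the global batch count; ans is the value in effect at that point in training. A hedged sketch of the idea, assuming a piecewise-linear schedule between knot points; the knots below are illustrative, not the ones used in this run:

def scheduled_float(batch_count: float,
                    knots: list[tuple[float, float]]) -> float:
    # knots: (batch_count, value) pairs with strictly increasing batch_count;
    # linear interpolation between knots, clamped to the end values outside.
    if batch_count <= knots[0][0]:
        return knots[0][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return knots[-1][1]

# e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches
# would long since have reached its floor at the batch counts logged here:
print(scheduled_float(244853.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1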
], tot_loss[loss=0.1113, simple_loss=0.1224, pruned_loss=0.03768, audio_tagging_loss=0.01241, over 2935191.30 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:20:15,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.783e+01 9.493e+01 1.068e+02 1.179e+02 1.760e+02, threshold=2.137e+02, percent-clipped=0.0 2023-11-18 13:20:21,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=244853.33333333334, ans=0.1 2023-11-18 13:20:27,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=244853.33333333334, ans=0.125 2023-11-18 13:20:37,960 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=1.373e-02 2023-11-18 13:20:47,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2023-11-18 13:20:51,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=245053.33333333334, ans=0.2 2023-11-18 13:20:51,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=245053.33333333334, ans=0.0 2023-11-18 13:21:01,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=245053.33333333334, ans=0.0 2023-11-18 13:21:03,272 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 700, loss[loss=0.1369, simple_loss=0.1587, pruned_loss=0.04845, audio_tagging_loss=0.009091, over 14784.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1224, pruned_loss=0.0377, audio_tagging_loss=0.01234, over 2960222.69 frames. ], batch size: 54, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:21:03,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=245120.0, ans=0.125 2023-11-18 13:21:18,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=245186.66666666666, ans=0.0 2023-11-18 13:21:39,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=245320.0, ans=0.0 2023-11-18 13:21:53,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=245386.66666666666, ans=0.125 2023-11-18 13:21:53,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=245386.66666666666, ans=0.0 2023-11-18 13:21:57,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=245386.66666666666, ans=0.0 2023-11-18 13:21:57,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=245386.66666666666, ans=0.0 2023-11-18 13:21:59,686 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 750, loss[loss=0.151, simple_loss=0.1712, pruned_loss=0.05405, audio_tagging_loss=0.01133, over 14942.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1231, pruned_loss=0.03804, audio_tagging_loss=0.01229, over 2977028.66 frames. 
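On the optim.py Clipping_scale lines: the five values read naturally as the min/25%/median/75%/max of recently observed gradient norms, and in every entry in this section the printed threshold is exactly 2.0 times the middle value, so the clipping threshold appears to be Clipping_scale times the median; percent-clipped would then be the share of recent batches whose gradient norm exceeded that threshold. A sketch under those assumptions (the median rule is inferred from the printed numbers, not confirmed against optim.py):

import numpy as np

def clip_stats(recent_grad_norms: np.ndarray, clipping_scale: float = 2.0):
    # Five-number summary of the recent gradient norms.
    quartiles = np.quantile(recent_grad_norms, [0.0, 0.25, 0.5, 0.75, 1.0])
    threshold = clipping_scale * quartiles[2]              # 2.0 x median
    percent_clipped = 100.0 * float(np.mean(recent_grad_norms > threshold))
    return quartiles, threshold, percent_clipped

For the batch 650 entry above this rule gives threshold = 2.0 * 1.068e+02 = 2.137e+02, matching the logged value.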
], batch size: 57, lr: 1.75e-02, grad_scale: 8.0 2023-11-18 13:21:59,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=245453.33333333334, ans=0.2 2023-11-18 13:22:00,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=245453.33333333334, ans=0.1 2023-11-18 13:22:07,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.990e+01 9.458e+01 1.066e+02 1.214e+02 1.611e+02, threshold=2.132e+02, percent-clipped=0.0 2023-11-18 13:22:15,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=245520.0, ans=0.0 2023-11-18 13:22:17,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=245520.0, ans=0.2 2023-11-18 13:22:21,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.09 vs. limit=15.0 2023-11-18 13:22:40,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=245653.33333333334, ans=0.5 2023-11-18 13:22:41,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=245653.33333333334, ans=0.125 2023-11-18 13:22:53,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=245786.66666666666, ans=0.0 2023-11-18 13:22:54,559 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 800, loss[loss=0.1137, simple_loss=0.123, pruned_loss=0.03948, audio_tagging_loss=0.01271, over 14653.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1234, pruned_loss=0.03807, audio_tagging_loss=0.0123, over 2993321.69 frames. ], batch size: 56, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:10,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.54 vs. limit=22.5 2023-11-18 13:23:19,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=245920.0, ans=0.125 2023-11-18 13:23:28,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=245986.66666666666, ans=0.0 2023-11-18 13:23:50,688 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 850, loss[loss=0.1136, simple_loss=0.1138, pruned_loss=0.04119, audio_tagging_loss=0.01548, over 15190.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.1235, pruned_loss=0.03811, audio_tagging_loss=0.0123, over 3001633.19 frames. ], batch size: 60, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:23:51,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=246120.0, ans=0.0 2023-11-18 13:23:54,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=246120.0, ans=0.125 2023-11-18 13:23:54,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.05 vs. 
limit=15.0 2023-11-18 13:23:59,082 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.134e+01 9.530e+01 1.051e+02 1.203e+02 1.738e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 13:24:02,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.30 vs. limit=10.0 2023-11-18 13:24:23,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:28,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:33,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=246320.0, ans=0.125 2023-11-18 13:24:42,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=246386.66666666666, ans=0.2 2023-11-18 13:24:46,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=246453.33333333334, ans=0.0 2023-11-18 13:24:47,023 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 900, loss[loss=0.1476, simple_loss=0.173, pruned_loss=0.05211, audio_tagging_loss=0.008947, over 15343.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1228, pruned_loss=0.0379, audio_tagging_loss=0.01237, over 3013183.96 frames. ], batch size: 54, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:04,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.38 vs. limit=6.0 2023-11-18 13:25:09,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=246586.66666666666, ans=0.2 2023-11-18 13:25:10,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=246586.66666666666, ans=0.0 2023-11-18 13:25:20,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=246653.33333333334, ans=0.0 2023-11-18 13:25:22,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=246653.33333333334, ans=0.125 2023-11-18 13:25:42,390 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 950, loss[loss=0.1569, simple_loss=0.1747, pruned_loss=0.06052, audio_tagging_loss=0.009041, over 15963.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1229, pruned_loss=0.03776, audio_tagging_loss=0.01218, over 3026109.26 frames. ], batch size: 57, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:25:46,839 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.046e-02 2023-11-18 13:25:49,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.19 vs. limit=12.0 2023-11-18 13:25:49,685 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.600e+01 9.343e+01 1.032e+02 1.151e+02 2.313e+02, threshold=2.063e+02, percent-clipped=1.0 2023-11-18 13:25:54,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=7.50 vs. 
limit=15.0 2023-11-18 13:26:20,955 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.59 vs. limit=15.0 2023-11-18 13:26:23,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=246986.66666666666, ans=0.0 2023-11-18 13:26:37,926 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1000, loss[loss=0.08352, simple_loss=0.08335, pruned_loss=0.03168, audio_tagging_loss=0.01016, over 14005.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.124, pruned_loss=0.03792, audio_tagging_loss=0.01182, over 3031549.69 frames. ], batch size: 55, lr: 1.75e-02, grad_scale: 16.0 2023-11-18 13:26:47,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=247120.0, ans=0.0 2023-11-18 13:26:52,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=247186.66666666666, ans=0.125 2023-11-18 13:27:01,677 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:27:07,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten.whitening_limit, batch_count=247253.33333333334, ans=22.5 2023-11-18 13:27:10,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=8.88 vs. limit=12.0 2023-11-18 13:27:18,648 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.94 vs. limit=15.0 2023-11-18 13:27:33,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=247453.33333333334, ans=0.2 2023-11-18 13:27:33,826 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1050, loss[loss=0.08321, simple_loss=0.0874, pruned_loss=0.02595, audio_tagging_loss=0.01355, over 14977.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1243, pruned_loss=0.03807, audio_tagging_loss=0.01174, over 3035948.13 frames. ], batch size: 59, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:27:41,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.026e+01 9.797e+01 1.106e+02 1.274e+02 2.848e+02, threshold=2.212e+02, percent-clipped=1.0 2023-11-18 13:27:55,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=247586.66666666666, ans=0.0 2023-11-18 13:28:00,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=247586.66666666666, ans=0.0 2023-11-18 13:28:12,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=247653.33333333334, ans=0.0 2023-11-18 13:28:28,416 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1100, loss[loss=0.07765, simple_loss=0.08613, pruned_loss=0.02089, audio_tagging_loss=0.01369, over 15204.00 frames. 
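On the scaling.py Whitening lines (metric=X vs. limit=Y): the metric appears to quantify how far the channel covariance of the activations is from white, taking the value 1.0 when all covariance eigenvalues are equal and growing when a few directions dominate, with a corrective gradient applied only when the metric exceeds the limit. The formula below is one way to realize that idea and is an illustration only, not the exact expression in scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group.
    cov = x.t() @ x / x.shape[0]
    lam = torch.linalg.eigvalsh(cov)          # covariance eigenvalues
    # E[lam^2] / E[lam]^2: equals 1.0 iff all eigenvalues are equal.
    return (lam * lam).mean() / (lam.mean() ** 2 + 1e-20)

def needs_penalty(x: torch.Tensor, limit: float) -> bool:
    # e.g. "metric=8.59 vs. limit=15.0" above: under the limit, no penalty.
    return bool(whitening_metric(x) > limit)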
], tot_loss[loss=0.1118, simple_loss=0.1243, pruned_loss=0.03802, audio_tagging_loss=0.01164, over 3040736.38 frames. ], batch size: 58, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:28:30,568 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:28:36,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=247786.66666666666, ans=0.2 2023-11-18 13:28:38,125 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=247853.33333333334, ans=0.0 2023-11-18 13:28:41,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=247853.33333333334, ans=0.125 2023-11-18 13:28:44,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.04 vs. limit=8.0 2023-11-18 13:28:49,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=247920.0, ans=0.125 2023-11-18 13:28:52,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=247920.0, ans=0.2 2023-11-18 13:29:01,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=247986.66666666666, ans=0.1 2023-11-18 13:29:24,438 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1150, loss[loss=0.09083, simple_loss=0.101, pruned_loss=0.02684, audio_tagging_loss=0.01348, over 15749.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1234, pruned_loss=0.03773, audio_tagging_loss=0.0118, over 3047105.88 frames. ], batch size: 61, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:29:27,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=248120.0, ans=0.0 2023-11-18 13:29:31,759 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.396e+01 1.043e+02 1.149e+02 1.593e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 13:29:39,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=248186.66666666666, ans=0.035 2023-11-18 13:29:41,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-11-18 13:29:48,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.76 vs. 
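On the "Exclude cut with ID ..." warnings: these AudioSet clips carry a placeholder transcript that tokenizes to 24 tokens, but one second of audio leaves only 23 encoder frames after the 4x subsampling, and a transducer needs at least one encoder frame per output token, so the cut cannot be trained on and is dropped. A sketch of such a filter; the subsampled-length formula is inferred from the printed numbers (100 frames before subsampling, 23 after) rather than copied from the recipe:

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Length after a Conv2dSubsampling-style 4x reduction; this formula
    # reproduces the 100 -> 23 mapping printed in the warnings above.
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

assert keep_cut(100, 24) is False   # the placeholder cuts are excluded
assert keep_cut(100, 23) is True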
limit=15.0 2023-11-18 13:29:51,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=248253.33333333334, ans=0.125 2023-11-18 13:29:51,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=248253.33333333334, ans=0.1 2023-11-18 13:29:59,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=248320.0, ans=0.0 2023-11-18 13:30:13,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=248386.66666666666, ans=0.125 2023-11-18 13:30:21,174 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1200, loss[loss=0.08448, simple_loss=0.09455, pruned_loss=0.02637, audio_tagging_loss=0.01083, over 15460.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1237, pruned_loss=0.03777, audio_tagging_loss=0.0117, over 3054186.29 frames. ], batch size: 61, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:30:49,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=248586.66666666666, ans=0.1 2023-11-18 13:30:49,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.83 vs. limit=22.5 2023-11-18 13:30:57,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=248653.33333333334, ans=0.0 2023-11-18 13:31:05,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=248720.0, ans=0.125 2023-11-18 13:31:08,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=248720.0, ans=0.125 2023-11-18 13:31:16,181 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1250, loss[loss=0.08905, simple_loss=0.08571, pruned_loss=0.02758, audio_tagging_loss=0.01862, over 14844.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1222, pruned_loss=0.03723, audio_tagging_loss=0.01165, over 3051135.26 frames. ], batch size: 57, lr: 1.74e-02, grad_scale: 32.0 2023-11-18 13:31:19,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=248786.66666666666, ans=0.0 2023-11-18 13:31:23,568 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.729e+01 9.534e+01 1.061e+02 1.217e+02 1.836e+02, threshold=2.122e+02, percent-clipped=0.0 2023-11-18 13:31:49,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=248986.66666666666, ans=0.125 2023-11-18 13:31:52,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.73 vs. 
limit=15.0 2023-11-18 13:32:02,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=249053.33333333334, ans=0.0 2023-11-18 13:32:05,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=249053.33333333334, ans=0.125 2023-11-18 13:32:05,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=249053.33333333334, ans=0.125 2023-11-18 13:32:11,691 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1300, loss[loss=0.1215, simple_loss=0.1414, pruned_loss=0.04089, audio_tagging_loss=0.009882, over 15238.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1218, pruned_loss=0.0367, audio_tagging_loss=0.01166, over 3051691.08 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:32:53,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=249320.0, ans=0.0 2023-11-18 13:33:00,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-11-18 13:33:08,239 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1350, loss[loss=0.174, simple_loss=0.209, pruned_loss=0.06327, audio_tagging_loss=0.006243, over 15898.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.122, pruned_loss=0.03696, audio_tagging_loss=0.01177, over 3049447.21 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:33:17,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.519e+01 9.738e+01 1.103e+02 1.190e+02 1.796e+02, threshold=2.206e+02, percent-clipped=0.0 2023-11-18 13:33:33,739 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:33:33,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=249586.66666666666, ans=0.0 2023-11-18 13:33:38,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=249586.66666666666, ans=0.125 2023-11-18 13:33:45,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=249653.33333333334, ans=0.0 2023-11-18 13:33:47,437 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:33:52,443 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:34:02,856 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-18 13:34:04,458 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1400, loss[loss=0.1123, simple_loss=0.1218, pruned_loss=0.04102, audio_tagging_loss=0.01036, over 15855.00 frames. ], tot_loss[loss=0.1098, simple_loss=0.1222, pruned_loss=0.03687, audio_tagging_loss=0.0118, over 3047231.57 frames. 
], batch size: 59, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:34:20,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=8.24 vs. limit=10.0 2023-11-18 13:34:24,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=249853.33333333334, ans=0.125 2023-11-18 13:34:28,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=249920.0, ans=0.0 2023-11-18 13:34:39,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.48 vs. limit=12.0 2023-11-18 13:34:52,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.39 vs. limit=15.0 2023-11-18 13:34:53,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=15.64 vs. limit=15.0 2023-11-18 13:35:00,074 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1450, loss[loss=0.1144, simple_loss=0.1207, pruned_loss=0.04101, audio_tagging_loss=0.01308, over 14982.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1241, pruned_loss=0.0375, audio_tagging_loss=0.01179, over 3045060.07 frames. ], batch size: 56, lr: 1.74e-02, grad_scale: 16.0 2023-11-18 13:35:09,004 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.516e+01 1.029e+02 1.105e+02 1.571e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:35:29,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.63 vs. limit=15.0 2023-11-18 13:35:38,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=250320.0, ans=0.125 2023-11-18 13:35:51,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.55 vs. limit=15.0 2023-11-18 13:35:56,372 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1500, loss[loss=0.11, simple_loss=0.1198, pruned_loss=0.03591, audio_tagging_loss=0.01423, over 15006.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.124, pruned_loss=0.03753, audio_tagging_loss=0.0119, over 3044079.49 frames. ], batch size: 59, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:05,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=250453.33333333334, ans=0.2 2023-11-18 13:36:08,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.15 vs. limit=15.0 2023-11-18 13:36:29,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=250653.33333333334, ans=0.0 2023-11-18 13:36:49,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=250720.0, ans=0.125 2023-11-18 13:36:52,284 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1550, loss[loss=0.123, simple_loss=0.1382, pruned_loss=0.04423, audio_tagging_loss=0.009689, over 17515.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1242, pruned_loss=0.03771, audio_tagging_loss=0.01194, over 3052473.51 frames. 
], batch size: 64, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:36:53,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=250786.66666666666, ans=0.125 2023-11-18 13:36:55,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250786.66666666666, ans=0.125 2023-11-18 13:37:01,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.375e+01 1.072e+02 1.254e+02 1.823e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 13:37:18,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=250920.0, ans=0.125 2023-11-18 13:37:21,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=250920.0, ans=10.0 2023-11-18 13:37:47,455 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1600, loss[loss=0.1014, simple_loss=0.106, pruned_loss=0.03446, audio_tagging_loss=0.01393, over 15351.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.123, pruned_loss=0.03715, audio_tagging_loss=0.01209, over 3047687.29 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:37:55,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.70 vs. limit=10.0 2023-11-18 13:38:05,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=251186.66666666666, ans=0.1 2023-11-18 13:38:05,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=251186.66666666666, ans=0.125 2023-11-18 13:38:09,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=251253.33333333334, ans=0.125 2023-11-18 13:38:24,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=251320.0, ans=0.125 2023-11-18 13:38:32,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.28 vs. limit=22.5 2023-11-18 13:38:43,341 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1650, loss[loss=0.1303, simple_loss=0.1426, pruned_loss=0.0462, audio_tagging_loss=0.01282, over 14759.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1235, pruned_loss=0.03762, audio_tagging_loss=0.01214, over 3043572.67 frames. ], batch size: 54, lr: 1.73e-02, grad_scale: 32.0 2023-11-18 13:38:46,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=251453.33333333334, ans=0.125 2023-11-18 13:38:52,825 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.480e+01 9.946e+01 1.090e+02 1.261e+02 1.677e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 13:39:00,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=251520.0, ans=0.2 2023-11-18 13:39:16,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=9.29 vs. 
limit=12.0 2023-11-18 13:39:21,630 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.618e-02 2023-11-18 13:39:28,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=251720.0, ans=0.125 2023-11-18 13:39:39,305 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1700, loss[loss=0.06211, simple_loss=0.06345, pruned_loss=0.01856, audio_tagging_loss=0.01183, over 15750.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1231, pruned_loss=0.03767, audio_tagging_loss=0.01219, over 3047812.93 frames. ], batch size: 61, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:39:47,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=251786.66666666666, ans=0.2 2023-11-18 13:40:16,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.93 vs. limit=22.5 2023-11-18 13:40:35,225 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1750, loss[loss=0.1432, simple_loss=0.1739, pruned_loss=0.04694, audio_tagging_loss=0.0093, over 16546.00 frames. ], tot_loss[loss=0.1119, simple_loss=0.1239, pruned_loss=0.03784, audio_tagging_loss=0.01208, over 3042754.98 frames. ], batch size: 57, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:40:38,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=252120.0, ans=0.125 2023-11-18 13:40:45,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.182e+01 9.260e+01 1.013e+02 1.177e+02 1.598e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 13:40:49,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=252186.66666666666, ans=0.125 2023-11-18 13:40:53,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=252186.66666666666, ans=10.0 2023-11-18 13:40:59,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=252253.33333333334, ans=0.125 2023-11-18 13:41:00,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=252253.33333333334, ans=0.0 2023-11-18 13:41:02,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.30 vs. limit=22.5 2023-11-18 13:41:07,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2023-11-18 13:41:15,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.84 vs. limit=6.0 2023-11-18 13:41:23,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=252386.66666666666, ans=0.2 2023-11-18 13:41:30,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=252453.33333333334, ans=0.125 2023-11-18 13:41:31,151 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1800, loss[loss=0.106, simple_loss=0.108, pruned_loss=0.03808, audio_tagging_loss=0.01388, over 14150.00 frames. 
], tot_loss[loss=0.1116, simple_loss=0.1236, pruned_loss=0.03776, audio_tagging_loss=0.01203, over 3043570.02 frames. ], batch size: 56, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:41:59,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=252586.66666666666, ans=0.1 2023-11-18 13:42:17,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=252720.0, ans=0.0 2023-11-18 13:42:23,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0 2023-11-18 13:42:27,628 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1850, loss[loss=0.0983, simple_loss=0.1045, pruned_loss=0.03394, audio_tagging_loss=0.0121, over 14697.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1222, pruned_loss=0.03727, audio_tagging_loss=0.01203, over 3040438.10 frames. ], batch size: 55, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:42:32,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=252786.66666666666, ans=0.125 2023-11-18 13:42:37,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.497e+01 9.907e+01 1.064e+02 1.171e+02 1.741e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 13:42:39,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-11-18 13:42:40,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=252853.33333333334, ans=0.0 2023-11-18 13:42:41,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2023-11-18 13:42:48,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=252920.0, ans=0.125 2023-11-18 13:42:52,810 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.39 vs. limit=15.0 2023-11-18 13:43:12,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=253053.33333333334, ans=0.125 2023-11-18 13:43:19,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=253053.33333333334, ans=0.2 2023-11-18 13:43:22,178 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1900, loss[loss=0.1159, simple_loss=0.1308, pruned_loss=0.04012, audio_tagging_loss=0.01039, over 16762.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1225, pruned_loss=0.03726, audio_tagging_loss=0.01181, over 3049038.83 frames. ], batch size: 62, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:43:28,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.38 vs. 
limit=15.0 2023-11-18 13:43:39,515 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:43:50,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=253253.33333333334, ans=0.125 2023-11-18 13:43:53,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=253253.33333333334, ans=0.025 2023-11-18 13:44:01,236 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.79 vs. limit=15.0 2023-11-18 13:44:18,710 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 1950, loss[loss=0.07939, simple_loss=0.08625, pruned_loss=0.02423, audio_tagging_loss=0.01204, over 17855.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1227, pruned_loss=0.03723, audio_tagging_loss=0.01166, over 3049732.22 frames. ], batch size: 69, lr: 1.73e-02, grad_scale: 16.0 2023-11-18 13:44:19,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=253453.33333333334, ans=0.2 2023-11-18 13:44:27,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=253453.33333333334, ans=0.0 2023-11-18 13:44:29,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 9.223e+01 1.021e+02 1.142e+02 1.490e+02, threshold=2.042e+02, percent-clipped=0.0 2023-11-18 13:44:30,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-11-18 13:44:38,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=253520.0, ans=10.0 2023-11-18 13:44:48,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=253586.66666666666, ans=0.1 2023-11-18 13:45:07,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.54 vs. limit=15.0 2023-11-18 13:45:15,422 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2000, loss[loss=0.09841, simple_loss=0.1075, pruned_loss=0.03332, audio_tagging_loss=0.01133, over 15724.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.1219, pruned_loss=0.03688, audio_tagging_loss=0.01167, over 3045968.17 frames. ], batch size: 58, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:45:44,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=253920.0, ans=0.125 2023-11-18 13:45:44,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=253920.0, ans=0.2 2023-11-18 13:46:10,804 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2050, loss[loss=0.1046, simple_loss=0.1137, pruned_loss=0.03561, audio_tagging_loss=0.0121, over 13985.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1228, pruned_loss=0.03742, audio_tagging_loss=0.01173, over 3048826.50 frames. 
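On how the running tot_loss[...] relates to the per-batch loss[...]: the "over N frames" counts on tot_loss are fractional (3048826.50 frames in the entry above), which points to a frame-weighted accumulator with exponential decay rather than a plain sum over an integer number of frames. A sketch under that assumption; the decay constant below is a guess, not a value taken from the training script:

class RunningLoss:
    # Frame-weighted, exponentially decayed average of the batch losses.
    def __init__(self, decay: float = 1.0 - 1.0 / 200.0):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss: float, batch_frames: float) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss * batch_frames
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames   # the printed tot_loss figure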
], batch size: 53, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:46:11,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=254120.0, ans=0.0 2023-11-18 13:46:15,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=254120.0, ans=0.0 2023-11-18 13:46:21,821 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.442e+01 9.338e+01 1.033e+02 1.135e+02 2.200e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:46:48,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=254320.0, ans=0.0 2023-11-18 13:46:49,935 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:46:51,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=254320.0, ans=0.2 2023-11-18 13:47:06,221 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2100, loss[loss=0.105, simple_loss=0.1198, pruned_loss=0.03329, audio_tagging_loss=0.01182, over 14832.00 frames. ], tot_loss[loss=0.1097, simple_loss=0.1216, pruned_loss=0.03705, audio_tagging_loss=0.01179, over 3038094.36 frames. ], batch size: 56, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:47:19,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=254520.0, ans=0.0 2023-11-18 13:47:21,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254520.0, ans=0.1 2023-11-18 13:47:28,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=254586.66666666666, ans=0.125 2023-11-18 13:47:31,767 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.01 vs. limit=10.0 2023-11-18 13:47:34,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.71 vs. limit=15.0 2023-11-18 13:47:35,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=254586.66666666666, ans=0.125 2023-11-18 13:47:37,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=254586.66666666666, ans=0.1 2023-11-18 13:47:41,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:48,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=254653.33333333334, ans=0.125 2023-11-18 13:47:49,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=254653.33333333334, ans=0.0 2023-11-18 13:47:50,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.12 vs. 
limit=15.0 2023-11-18 13:47:56,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=254720.0, ans=0.125 2023-11-18 13:48:00,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=254720.0, ans=0.0 2023-11-18 13:48:03,509 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2150, loss[loss=0.1254, simple_loss=0.1458, pruned_loss=0.04109, audio_tagging_loss=0.01142, over 15178.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1234, pruned_loss=0.03766, audio_tagging_loss=0.01168, over 3043827.02 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:48:14,125 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.668e+01 9.557e+01 1.080e+02 1.239e+02 1.582e+02, threshold=2.161e+02, percent-clipped=1.0 2023-11-18 13:48:21,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.max_abs, batch_count=254853.33333333334, ans=10.0 2023-11-18 13:48:30,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=254920.0, ans=0.0 2023-11-18 13:48:36,067 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:48:38,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=254986.66666666666, ans=0.0 2023-11-18 13:48:58,283 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2200, loss[loss=0.1099, simple_loss=0.1245, pruned_loss=0.03343, audio_tagging_loss=0.01422, over 15109.00 frames. ], tot_loss[loss=0.1114, simple_loss=0.1237, pruned_loss=0.0378, audio_tagging_loss=0.01182, over 3046437.85 frames. ], batch size: 56, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:00,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.61 vs. limit=22.5 2023-11-18 13:49:18,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=255186.66666666666, ans=0.125 2023-11-18 13:49:53,796 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2250, loss[loss=0.1301, simple_loss=0.1415, pruned_loss=0.04803, audio_tagging_loss=0.01131, over 13998.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1246, pruned_loss=0.03788, audio_tagging_loss=0.01177, over 3042047.34 frames. ], batch size: 54, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:49:55,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=255453.33333333334, ans=0.125 2023-11-18 13:49:55,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.08 vs. limit=15.0 2023-11-18 13:49:57,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.18 vs. 
limit=22.5 2023-11-18 13:50:05,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.545e+01 9.450e+01 1.063e+02 1.205e+02 1.681e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 13:50:41,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.61 vs. limit=10.0 2023-11-18 13:50:44,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.49 vs. limit=15.0 2023-11-18 13:50:50,933 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2300, loss[loss=0.1181, simple_loss=0.1224, pruned_loss=0.04275, audio_tagging_loss=0.01412, over 15170.00 frames. ], tot_loss[loss=0.1121, simple_loss=0.1252, pruned_loss=0.03771, audio_tagging_loss=0.01177, over 3049364.10 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:50:51,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=255786.66666666666, ans=0.125 2023-11-18 13:50:54,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=255786.66666666666, ans=0.0 2023-11-18 13:51:14,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=255920.0, ans=0.04949747468305833 2023-11-18 13:51:39,954 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 13:51:42,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=256053.33333333334, ans=0.0 2023-11-18 13:51:43,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=256053.33333333334, ans=0.2 2023-11-18 13:51:46,286 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2350, loss[loss=0.09201, simple_loss=0.1021, pruned_loss=0.02637, audio_tagging_loss=0.01457, over 14628.00 frames. ], tot_loss[loss=0.1122, simple_loss=0.125, pruned_loss=0.03789, audio_tagging_loss=0.01181, over 3045534.79 frames. ], batch size: 57, lr: 1.72e-02, grad_scale: 16.0 2023-11-18 13:51:57,481 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.637e+01 9.372e+01 1.028e+02 1.162e+02 1.776e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 13:52:03,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.91 vs. limit=15.0 2023-11-18 13:52:11,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.74 vs. limit=22.5 2023-11-18 13:52:13,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.26 vs. 
limit=10.0 2023-11-18 13:52:22,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=256320.0, ans=0.1 2023-11-18 13:52:27,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=256320.0, ans=0.125 2023-11-18 13:52:27,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.39 vs. limit=6.0 2023-11-18 13:52:32,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=256386.66666666666, ans=0.0 2023-11-18 13:52:42,251 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2400, loss[loss=0.1134, simple_loss=0.1199, pruned_loss=0.03885, audio_tagging_loss=0.01459, over 16617.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1255, pruned_loss=0.038, audio_tagging_loss=0.01185, over 3054938.70 frames. ], batch size: 62, lr: 1.72e-02, grad_scale: 32.0 2023-11-18 13:52:44,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=256453.33333333334, ans=0.0 2023-11-18 13:52:47,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=256453.33333333334, ans=0.2 2023-11-18 13:52:52,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=256520.0, ans=0.0 2023-11-18 13:52:52,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.13 vs. limit=15.0 2023-11-18 13:53:05,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=256586.66666666666, ans=0.0 2023-11-18 13:53:06,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=256586.66666666666, ans=0.125 2023-11-18 13:53:09,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=256586.66666666666, ans=0.125 2023-11-18 13:53:12,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=256586.66666666666, ans=0.125 2023-11-18 13:53:20,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=256653.33333333334, ans=0.2 2023-11-18 13:53:21,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=256653.33333333334, ans=0.2 2023-11-18 13:53:38,554 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2450, loss[loss=0.1085, simple_loss=0.1158, pruned_loss=0.03517, audio_tagging_loss=0.01542, over 15242.00 frames. ], tot_loss[loss=0.1109, simple_loss=0.1237, pruned_loss=0.0372, audio_tagging_loss=0.0119, over 3056255.56 frames. 
], batch size: 59, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:53:43,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=256786.66666666666, ans=0.0 2023-11-18 13:53:49,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.486e+01 9.544e+01 1.043e+02 1.156e+02 1.781e+02, threshold=2.086e+02, percent-clipped=0.0 2023-11-18 13:54:03,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=256920.0, ans=0.0 2023-11-18 13:54:11,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=256986.66666666666, ans=0.09899494936611666 2023-11-18 13:54:12,999 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=8.0 2023-11-18 13:54:33,905 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2500, loss[loss=0.1487, simple_loss=0.1732, pruned_loss=0.05207, audio_tagging_loss=0.0101, over 15785.00 frames. ], tot_loss[loss=0.1113, simple_loss=0.1239, pruned_loss=0.03733, audio_tagging_loss=0.01205, over 3056550.99 frames. ], batch size: 55, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:54:38,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_ff3.min_abs, batch_count=257120.0, ans=0.2 2023-11-18 13:54:44,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=257186.66666666666, ans=0.0 2023-11-18 13:54:46,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=257186.66666666666, ans=0.125 2023-11-18 13:54:52,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257186.66666666666, ans=0.1 2023-11-18 13:55:18,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=257386.66666666666, ans=0.5 2023-11-18 13:55:29,868 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2550, loss[loss=0.1068, simple_loss=0.1277, pruned_loss=0.0322, audio_tagging_loss=0.01072, over 15400.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1238, pruned_loss=0.0374, audio_tagging_loss=0.0119, over 3051695.59 frames. 
], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:55:32,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=257453.33333333334, ans=0.0 2023-11-18 13:55:33,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=257453.33333333334, ans=0.0 2023-11-18 13:55:34,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=257453.33333333334, ans=0.1 2023-11-18 13:55:35,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257453.33333333334, ans=0.1 2023-11-18 13:55:40,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.163e+01 9.922e+01 1.114e+02 1.302e+02 1.822e+02, threshold=2.229e+02, percent-clipped=0.0 2023-11-18 13:56:22,121 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 13:56:22,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=257720.0, ans=0.2 2023-11-18 13:56:25,577 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2600, loss[loss=0.1255, simple_loss=0.1483, pruned_loss=0.0397, audio_tagging_loss=0.01168, over 15525.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1226, pruned_loss=0.0371, audio_tagging_loss=0.01174, over 3056577.96 frames. ], batch size: 57, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:56:32,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=257786.66666666666, ans=0.1 2023-11-18 13:56:34,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=257786.66666666666, ans=0.1 2023-11-18 13:56:51,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=257920.0, ans=0.125 2023-11-18 13:56:53,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=257920.0, ans=0.125 2023-11-18 13:56:59,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=257986.66666666666, ans=0.125 2023-11-18 13:57:07,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=257986.66666666666, ans=0.035 2023-11-18 13:57:21,396 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2650, loss[loss=0.1223, simple_loss=0.1312, pruned_loss=0.04621, audio_tagging_loss=0.01044, over 16037.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1207, pruned_loss=0.03656, audio_tagging_loss=0.01172, over 3050562.38 frames. 
], batch size: 59, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:57:30,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258120.0, ans=0.1 2023-11-18 13:57:32,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.000e+01 9.528e+01 1.033e+02 1.143e+02 1.471e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 13:57:32,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=258186.66666666666, ans=0.125 2023-11-18 13:57:39,928 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.41 vs. limit=15.0 2023-11-18 13:58:16,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258453.33333333334, ans=0.1 2023-11-18 13:58:16,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=258453.33333333334, ans=0.125 2023-11-18 13:58:17,146 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2700, loss[loss=0.1387, simple_loss=0.1568, pruned_loss=0.04971, audio_tagging_loss=0.01063, over 15494.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1216, pruned_loss=0.03684, audio_tagging_loss=0.01173, over 3054877.03 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:58:35,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258520.0, ans=0.1 2023-11-18 13:58:44,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=258586.66666666666, ans=0.1 2023-11-18 13:58:47,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=258586.66666666666, ans=0.2 2023-11-18 13:59:04,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=258720.0, ans=0.0 2023-11-18 13:59:05,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.15 vs. limit=15.0 2023-11-18 13:59:13,214 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2750, loss[loss=0.09041, simple_loss=0.1046, pruned_loss=0.02864, audio_tagging_loss=0.009498, over 14871.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1207, pruned_loss=0.03655, audio_tagging_loss=0.01175, over 3050122.38 frames. ], batch size: 56, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 13:59:17,065 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.84 vs. limit=22.5 2023-11-18 13:59:19,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.96 vs. 
limit=10.0 2023-11-18 13:59:23,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=258786.66666666666, ans=0.125 2023-11-18 13:59:24,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.704e+01 9.301e+01 1.031e+02 1.106e+02 1.514e+02, threshold=2.061e+02, percent-clipped=0.0 2023-11-18 13:59:47,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=12.0 2023-11-18 14:00:01,473 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:00:08,828 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2800, loss[loss=0.1325, simple_loss=0.1485, pruned_loss=0.04713, audio_tagging_loss=0.01111, over 16460.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1213, pruned_loss=0.0368, audio_tagging_loss=0.01164, over 3044260.67 frames. ], batch size: 61, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:00:17,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=259120.0, ans=0.125 2023-11-18 14:00:22,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=259186.66666666666, ans=0.125 2023-11-18 14:00:26,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=259186.66666666666, ans=0.125 2023-11-18 14:00:31,096 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:00:42,868 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:00:54,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=259386.66666666666, ans=0.125 2023-11-18 14:00:57,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=259386.66666666666, ans=0.125 2023-11-18 14:01:04,440 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2850, loss[loss=0.115, simple_loss=0.1296, pruned_loss=0.03818, audio_tagging_loss=0.012, over 15750.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.122, pruned_loss=0.03695, audio_tagging_loss=0.01159, over 3037214.57 frames. ], batch size: 58, lr: 1.71e-02, grad_scale: 32.0 2023-11-18 14:01:15,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 9.766e+01 1.049e+02 1.164e+02 1.614e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 14:01:41,361 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:01:42,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. 
limit=6.0 2023-11-18 14:01:44,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=259653.33333333334, ans=0.0 2023-11-18 14:01:53,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=259720.0, ans=0.0 2023-11-18 14:02:00,178 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2900, loss[loss=0.1557, simple_loss=0.1807, pruned_loss=0.05663, audio_tagging_loss=0.008685, over 15044.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1223, pruned_loss=0.03712, audio_tagging_loss=0.01167, over 3035583.33 frames. ], batch size: 54, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:02:00,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.52 vs. limit=15.0 2023-11-18 14:02:01,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=259786.66666666666, ans=0.2 2023-11-18 14:02:01,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=259786.66666666666, ans=0.1 2023-11-18 14:02:09,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=259786.66666666666, ans=0.125 2023-11-18 14:02:11,024 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:02:23,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=259920.0, ans=0.1 2023-11-18 14:02:24,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=259920.0, ans=0.125 2023-11-18 14:02:27,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=259920.0, ans=0.125 2023-11-18 14:02:56,629 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 2950, loss[loss=0.09343, simple_loss=0.1014, pruned_loss=0.03149, audio_tagging_loss=0.01123, over 15325.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1225, pruned_loss=0.03699, audio_tagging_loss=0.01169, over 3036454.78 frames. 
], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:04,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=260120.0, ans=0.0 2023-11-18 14:03:06,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=260186.66666666666, ans=15.0 2023-11-18 14:03:07,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.370e+01 1.013e+02 1.101e+02 1.808e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 14:03:23,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=260253.33333333334, ans=0.1 2023-11-18 14:03:25,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=260253.33333333334, ans=0.1 2023-11-18 14:03:27,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=260253.33333333334, ans=0.125 2023-11-18 14:03:29,188 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:03:31,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260320.0, ans=0.125 2023-11-18 14:03:37,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=260320.0, ans=0.125 2023-11-18 14:03:49,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=260386.66666666666, ans=0.125 2023-11-18 14:03:51,840 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3000, loss[loss=0.1152, simple_loss=0.1287, pruned_loss=0.04051, audio_tagging_loss=0.01036, over 14483.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1227, pruned_loss=0.03704, audio_tagging_loss=0.01181, over 3038563.28 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:03:51,842 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 14:04:18,854 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.9602, 5.0178, 5.0061, 5.0480], device='cuda:0') 2023-11-18 14:04:25,239 INFO [train_asr.py:1147] (0/4) Epoch 4, validation: loss=0.07718, simple_loss=0.06278, pruned_loss=0.01045, audio_tagging_loss=0.03534, over 4681554.00 frames. 2023-11-18 14:04:25,240 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 14:04:35,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=260520.0, ans=0.125 2023-11-18 14:04:41,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=260520.0, ans=0.125 2023-11-18 14:05:17,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.10 vs. limit=22.5 2023-11-18 14:05:20,215 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3050, loss[loss=0.1159, simple_loss=0.1366, pruned_loss=0.03778, audio_tagging_loss=0.009798, over 14917.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1236, pruned_loss=0.03715, audio_tagging_loss=0.01175, over 3043816.06 frames. 
], batch size: 53, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:05:20,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=260786.66666666666, ans=0.0 2023-11-18 14:05:30,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.697e+01 9.501e+01 1.094e+02 1.227e+02 1.890e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:05:32,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.45 vs. limit=15.0 2023-11-18 14:05:35,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=260853.33333333334, ans=0.125 2023-11-18 14:05:53,406 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:06:07,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=261053.33333333334, ans=0.125 2023-11-18 14:06:10,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.35 vs. limit=6.0 2023-11-18 14:06:13,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.57 vs. limit=15.0 2023-11-18 14:06:15,725 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3100, loss[loss=0.0708, simple_loss=0.07178, pruned_loss=0.01952, audio_tagging_loss=0.01538, over 14784.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1241, pruned_loss=0.03729, audio_tagging_loss=0.01185, over 3046650.71 frames. ], batch size: 59, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:06:29,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.27 vs. limit=12.0 2023-11-18 14:06:44,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2023-11-18 14:07:00,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=261386.66666666666, ans=0.1 2023-11-18 14:07:07,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-18 14:07:10,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=261386.66666666666, ans=0.125 2023-11-18 14:07:12,447 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3150, loss[loss=0.1306, simple_loss=0.1524, pruned_loss=0.04489, audio_tagging_loss=0.009524, over 16293.00 frames. ], tot_loss[loss=0.1118, simple_loss=0.125, pruned_loss=0.03747, audio_tagging_loss=0.01189, over 3043316.73 frames. 
], batch size: 58, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:07:17,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=261453.33333333334, ans=0.1 2023-11-18 14:07:23,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=261520.0, ans=0.125 2023-11-18 14:07:24,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.617e+01 1.054e+02 1.142e+02 1.769e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:07:27,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2023-11-18 14:07:36,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=261586.66666666666, ans=0.125 2023-11-18 14:07:38,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=261586.66666666666, ans=0.125 2023-11-18 14:08:06,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=261720.0, ans=0.0 2023-11-18 14:08:09,115 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3200, loss[loss=0.08514, simple_loss=0.0951, pruned_loss=0.02393, audio_tagging_loss=0.01366, over 14536.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1242, pruned_loss=0.03726, audio_tagging_loss=0.01215, over 3041775.35 frames. ], batch size: 55, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:08:15,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=261786.66666666666, ans=0.125 2023-11-18 14:08:36,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=261920.0, ans=0.125 2023-11-18 14:08:51,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=261986.66666666666, ans=0.125 2023-11-18 14:08:59,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=262053.33333333334, ans=10.0 2023-11-18 14:09:04,139 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3250, loss[loss=0.0965, simple_loss=0.1079, pruned_loss=0.03107, audio_tagging_loss=0.01145, over 15964.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1239, pruned_loss=0.037, audio_tagging_loss=0.01226, over 3037966.56 frames. ], batch size: 60, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:09:15,321 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 9.218e+01 1.067e+02 1.190e+02 1.746e+02, threshold=2.133e+02, percent-clipped=0.0 2023-11-18 14:09:23,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=262186.6666666667, ans=0.0 2023-11-18 14:09:49,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=262386.6666666667, ans=0.0 2023-11-18 14:09:59,321 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3300, loss[loss=0.1143, simple_loss=0.1255, pruned_loss=0.04028, audio_tagging_loss=0.01127, over 16617.00 frames. ], tot_loss[loss=0.1115, simple_loss=0.1242, pruned_loss=0.0371, audio_tagging_loss=0.01231, over 3044772.98 frames. 
], batch size: 63, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:10:03,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=262453.3333333333, ans=0.0 2023-11-18 14:10:15,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=262520.0, ans=0.0 2023-11-18 14:10:21,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=262586.6666666667, ans=0.0 2023-11-18 14:10:54,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=262720.0, ans=0.125 2023-11-18 14:10:56,641 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3350, loss[loss=0.1072, simple_loss=0.1112, pruned_loss=0.03439, audio_tagging_loss=0.01726, over 15039.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1226, pruned_loss=0.03655, audio_tagging_loss=0.01229, over 3047577.46 frames. ], batch size: 56, lr: 1.70e-02, grad_scale: 32.0 2023-11-18 14:11:07,052 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.720e+01 9.463e+01 1.035e+02 1.183e+02 1.659e+02, threshold=2.070e+02, percent-clipped=0.0 2023-11-18 14:11:46,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=263053.3333333333, ans=0.125 2023-11-18 14:11:51,603 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3400, loss[loss=0.1187, simple_loss=0.1391, pruned_loss=0.03779, audio_tagging_loss=0.01136, over 16171.00 frames. ], tot_loss[loss=0.1112, simple_loss=0.1244, pruned_loss=0.03704, audio_tagging_loss=0.01194, over 3045501.57 frames. ], batch size: 56, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:11:54,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.23 vs. limit=15.0 2023-11-18 14:12:01,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2023-11-18 14:12:11,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.60 vs. limit=22.5 2023-11-18 14:12:14,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263253.3333333333, ans=0.125 2023-11-18 14:12:18,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.45 vs. limit=15.0 2023-11-18 14:12:26,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=263320.0, ans=0.1 2023-11-18 14:12:27,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. 
limit=15.0 2023-11-18 14:12:31,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=263320.0, ans=0.125 2023-11-18 14:12:37,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=263386.6666666667, ans=0.0 2023-11-18 14:12:38,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=263386.6666666667, ans=0.2 2023-11-18 14:12:42,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.24 vs. limit=10.0 2023-11-18 14:12:47,599 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3450, loss[loss=0.1154, simple_loss=0.1297, pruned_loss=0.04233, audio_tagging_loss=0.00817, over 14717.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1254, pruned_loss=0.03758, audio_tagging_loss=0.01175, over 3048374.39 frames. ], batch size: 56, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:12:52,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=263453.3333333333, ans=0.125 2023-11-18 14:12:59,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.121e+01 9.398e+01 1.018e+02 1.161e+02 1.639e+02, threshold=2.037e+02, percent-clipped=0.0 2023-11-18 14:13:07,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=263520.0, ans=0.125 2023-11-18 14:13:26,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=263653.3333333333, ans=0.09899494936611666 2023-11-18 14:13:44,350 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3500, loss[loss=0.07852, simple_loss=0.08909, pruned_loss=0.02016, audio_tagging_loss=0.01381, over 15399.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1233, pruned_loss=0.03669, audio_tagging_loss=0.01169, over 3053561.61 frames. ], batch size: 59, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:14:13,287 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 14:14:15,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=263920.0, ans=0.2 2023-11-18 14:14:23,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=263986.6666666667, ans=0.0 2023-11-18 14:14:27,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=263986.6666666667, ans=0.07 2023-11-18 14:14:29,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=264053.3333333333, ans=0.07 2023-11-18 14:14:35,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=264053.3333333333, ans=0.0 2023-11-18 14:14:40,054 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3550, loss[loss=0.1233, simple_loss=0.1315, pruned_loss=0.04404, audio_tagging_loss=0.01353, over 13928.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1224, pruned_loss=0.03649, audio_tagging_loss=0.01166, over 3046610.51 frames. ], batch size: 55, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:14:51,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 9.400e+01 1.087e+02 1.239e+02 1.521e+02, threshold=2.174e+02, percent-clipped=0.0 2023-11-18 14:14:58,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=264186.6666666667, ans=0.035 2023-11-18 14:15:28,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=264386.6666666667, ans=0.125 2023-11-18 14:15:29,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.59 vs. limit=12.0 2023-11-18 14:15:32,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=264386.6666666667, ans=6.0 2023-11-18 14:15:35,556 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3600, loss[loss=0.111, simple_loss=0.1154, pruned_loss=0.04206, audio_tagging_loss=0.01128, over 15110.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1212, pruned_loss=0.03597, audio_tagging_loss=0.0118, over 3045557.86 frames. ], batch size: 56, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:15:42,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=264453.3333333333, ans=0.0 2023-11-18 14:15:58,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=264586.6666666667, ans=0.0 2023-11-18 14:16:09,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=264653.3333333333, ans=0.0 2023-11-18 14:16:13,510 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:16:21,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.89 vs. 
limit=15.0 2023-11-18 14:16:23,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=264720.0, ans=0.125 2023-11-18 14:16:27,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=264720.0, ans=0.0 2023-11-18 14:16:32,047 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3650, loss[loss=0.1137, simple_loss=0.1226, pruned_loss=0.04112, audio_tagging_loss=0.01125, over 15443.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1224, pruned_loss=0.03616, audio_tagging_loss=0.01181, over 3052370.82 frames. ], batch size: 57, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:16:34,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=264786.6666666667, ans=0.125 2023-11-18 14:16:37,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=264786.6666666667, ans=0.125 2023-11-18 14:16:41,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=264786.6666666667, ans=0.1 2023-11-18 14:16:43,165 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.557e+01 1.072e+02 1.214e+02 1.788e+02, threshold=2.145e+02, percent-clipped=0.0 2023-11-18 14:16:50,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=264853.3333333333, ans=0.1 2023-11-18 14:16:50,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=264853.3333333333, ans=0.0 2023-11-18 14:16:57,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=264920.0, ans=0.125 2023-11-18 14:17:17,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=265053.3333333333, ans=0.125 2023-11-18 14:17:26,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.69 vs. limit=22.5 2023-11-18 14:17:27,647 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3700, loss[loss=0.08972, simple_loss=0.09892, pruned_loss=0.02725, audio_tagging_loss=0.01301, over 15191.00 frames. ], tot_loss[loss=0.1095, simple_loss=0.123, pruned_loss=0.03642, audio_tagging_loss=0.01157, over 3054083.14 frames. ], batch size: 59, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:17:32,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265120.0, ans=0.1 2023-11-18 14:17:33,629 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.34 vs. 
limit=15.0 2023-11-18 14:17:37,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=265186.6666666667, ans=0.2 2023-11-18 14:17:51,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265253.3333333333, ans=0.1 2023-11-18 14:17:54,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=265253.3333333333, ans=0.04949747468305833 2023-11-18 14:17:56,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.31 vs. limit=22.5 2023-11-18 14:18:02,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=265320.0, ans=0.1 2023-11-18 14:18:23,569 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3750, loss[loss=0.1014, simple_loss=0.1133, pruned_loss=0.03378, audio_tagging_loss=0.01097, over 15456.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1226, pruned_loss=0.03652, audio_tagging_loss=0.01164, over 3054682.70 frames. ], batch size: 58, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:18:27,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=265453.3333333333, ans=0.2 2023-11-18 14:18:34,675 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.605e+01 1.034e+02 1.153e+02 1.284e+02 1.931e+02, threshold=2.306e+02, percent-clipped=0.0 2023-11-18 14:18:35,431 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.20 vs. limit=15.0 2023-11-18 14:18:56,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=265653.3333333333, ans=0.125 2023-11-18 14:19:01,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.19 vs. limit=5.0 2023-11-18 14:19:02,317 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:19:19,913 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3800, loss[loss=0.1195, simple_loss=0.1252, pruned_loss=0.03716, audio_tagging_loss=0.01972, over 13573.00 frames. ], tot_loss[loss=0.1096, simple_loss=0.1222, pruned_loss=0.03656, audio_tagging_loss=0.0119, over 3049969.00 frames. 
], batch size: 54, lr: 1.69e-02, grad_scale: 32.0 2023-11-18 14:19:25,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=265786.6666666667, ans=0.1 2023-11-18 14:19:57,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=265986.6666666667, ans=0.2 2023-11-18 14:20:01,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=265986.6666666667, ans=0.125 2023-11-18 14:20:01,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=265986.6666666667, ans=0.0 2023-11-18 14:20:15,033 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3850, loss[loss=0.1238, simple_loss=0.139, pruned_loss=0.04106, audio_tagging_loss=0.01328, over 15857.00 frames. ], tot_loss[loss=0.1103, simple_loss=0.1228, pruned_loss=0.03694, audio_tagging_loss=0.01195, over 3053493.70 frames. ], batch size: 63, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:20:26,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.817e+01 9.423e+01 1.054e+02 1.147e+02 1.619e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:20:36,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-11-18 14:20:43,455 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.78 vs. limit=15.0 2023-11-18 14:21:06,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=266386.6666666667, ans=0.0 2023-11-18 14:21:10,654 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3900, loss[loss=0.06575, simple_loss=0.06889, pruned_loss=0.01883, audio_tagging_loss=0.01248, over 14944.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1236, pruned_loss=0.03691, audio_tagging_loss=0.01191, over 3057178.19 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:21:14,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.02 vs. limit=15.0 2023-11-18 14:21:18,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=266453.3333333333, ans=0.125 2023-11-18 14:21:18,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=266453.3333333333, ans=0.125 2023-11-18 14:21:26,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=266520.0, ans=0.125 2023-11-18 14:21:30,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=266520.0, ans=0.2 2023-11-18 14:21:31,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=266520.0, ans=0.1 2023-11-18 14:21:45,550 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-40000.pt 2023-11-18 14:21:48,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. 
limit=15.0 2023-11-18 14:21:49,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=266653.3333333333, ans=0.07 2023-11-18 14:22:00,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=266720.0, ans=0.2 2023-11-18 14:22:10,119 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 3950, loss[loss=0.1682, simple_loss=0.1835, pruned_loss=0.06767, audio_tagging_loss=0.008782, over 14660.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1244, pruned_loss=0.03743, audio_tagging_loss=0.012, over 3054167.14 frames. ], batch size: 53, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:22:17,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=266786.6666666667, ans=0.125 2023-11-18 14:22:20,673 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.654e+01 9.384e+01 1.022e+02 1.131e+02 1.477e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 14:22:29,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=266853.3333333333, ans=0.1 2023-11-18 14:22:52,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=266986.6666666667, ans=0.1 2023-11-18 14:22:54,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=267053.3333333333, ans=0.125 2023-11-18 14:23:01,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=267053.3333333333, ans=0.09899494936611666 2023-11-18 14:23:05,089 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4000, loss[loss=0.114, simple_loss=0.133, pruned_loss=0.03742, audio_tagging_loss=0.01004, over 15202.00 frames. ], tot_loss[loss=0.1123, simple_loss=0.1253, pruned_loss=0.03771, audio_tagging_loss=0.01194, over 3053797.93 frames. ], batch size: 55, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:23:14,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=267186.6666666667, ans=0.1 2023-11-18 14:23:16,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=267186.6666666667, ans=0.125 2023-11-18 14:23:24,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=267186.6666666667, ans=0.0 2023-11-18 14:23:50,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=267386.6666666667, ans=0.2 2023-11-18 14:23:57,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=267386.6666666667, ans=0.125 2023-11-18 14:23:58,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2023-11-18 14:23:59,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=267386.6666666667, ans=0.125 2023-11-18 14:24:01,222 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4050, loss[loss=0.09696, simple_loss=0.109, pruned_loss=0.03119, audio_tagging_loss=0.01127, over 15485.00 frames. 
], tot_loss[loss=0.1121, simple_loss=0.125, pruned_loss=0.03764, audio_tagging_loss=0.01195, over 3055327.25 frames. ], batch size: 57, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:24:03,473 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:24:12,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 9.604e+01 1.092e+02 1.269e+02 1.663e+02, threshold=2.185e+02, percent-clipped=0.0 2023-11-18 14:24:22,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.45 vs. limit=22.5 2023-11-18 14:24:27,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-18 14:24:44,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-18 14:24:55,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.52 vs. limit=6.0 2023-11-18 14:24:56,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=267786.6666666667, ans=0.0 2023-11-18 14:24:56,888 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.76 vs. limit=15.0 2023-11-18 14:24:57,452 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4100, loss[loss=0.1022, simple_loss=0.1039, pruned_loss=0.03664, audio_tagging_loss=0.01363, over 14671.00 frames. ], tot_loss[loss=0.1126, simple_loss=0.1259, pruned_loss=0.03784, audio_tagging_loss=0.01183, over 3054143.86 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:25:05,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=267786.6666666667, ans=0.1 2023-11-18 14:25:25,922 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.96 vs. 
limit=6.0 2023-11-18 14:25:28,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=267920.0, ans=0.125 2023-11-18 14:25:32,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=267986.6666666667, ans=10.0 2023-11-18 14:25:47,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=268053.3333333333, ans=0.125 2023-11-18 14:25:49,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=268053.3333333333, ans=0.125 2023-11-18 14:25:49,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.96 vs. limit=15.0 2023-11-18 14:25:53,590 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4150, loss[loss=0.09272, simple_loss=0.1043, pruned_loss=0.03112, audio_tagging_loss=0.009438, over 16317.00 frames. ], tot_loss[loss=0.1124, simple_loss=0.1256, pruned_loss=0.03786, audio_tagging_loss=0.01177, over 3045236.63 frames. ], batch size: 60, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:02,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=268120.0, ans=0.0 2023-11-18 14:26:04,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.914e+01 9.487e+01 1.055e+02 1.166e+02 1.501e+02, threshold=2.109e+02, percent-clipped=0.0 2023-11-18 14:26:06,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=268186.6666666667, ans=0.125 2023-11-18 14:26:07,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=268186.6666666667, ans=0.125 2023-11-18 14:26:30,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=268320.0, ans=0.2 2023-11-18 14:26:33,424 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:26:34,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=268320.0, ans=0.0 2023-11-18 14:26:41,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.18 vs. limit=15.0 2023-11-18 14:26:44,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=268386.6666666667, ans=0.125 2023-11-18 14:26:48,315 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4200, loss[loss=0.09129, simple_loss=0.1013, pruned_loss=0.02767, audio_tagging_loss=0.01297, over 14692.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1254, pruned_loss=0.03745, audio_tagging_loss=0.01156, over 3043296.41 frames. 
], batch size: 53, lr: 1.68e-02, grad_scale: 64.0 2023-11-18 14:26:58,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=268453.3333333333, ans=0.125 2023-11-18 14:27:23,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=268653.3333333333, ans=0.125 2023-11-18 14:27:27,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268653.3333333333, ans=0.125 2023-11-18 14:27:30,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=268653.3333333333, ans=0.2 2023-11-18 14:27:40,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.50 vs. limit=22.5 2023-11-18 14:27:44,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=268786.6666666667, ans=0.125 2023-11-18 14:27:44,857 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4250, loss[loss=0.09027, simple_loss=0.101, pruned_loss=0.03093, audio_tagging_loss=0.008863, over 15604.00 frames. ], tot_loss[loss=0.111, simple_loss=0.1245, pruned_loss=0.03722, audio_tagging_loss=0.01151, over 3046593.52 frames. ], batch size: 59, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:27:46,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=268786.6666666667, ans=0.04949747468305833 2023-11-18 14:27:57,591 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 9.513e+01 1.037e+02 1.128e+02 1.811e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 14:28:22,649 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:28:27,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.88 vs. limit=8.0 2023-11-18 14:28:41,083 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4300, loss[loss=0.08538, simple_loss=0.09444, pruned_loss=0.02431, audio_tagging_loss=0.01385, over 15745.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1253, pruned_loss=0.03754, audio_tagging_loss=0.0115, over 3046083.85 frames. ], batch size: 58, lr: 1.68e-02, grad_scale: 32.0 2023-11-18 14:28:49,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=269120.0, ans=0.125 2023-11-18 14:28:51,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=269186.6666666667, ans=0.125 2023-11-18 14:28:59,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269186.6666666667, ans=0.1 2023-11-18 14:29:14,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=269320.0, ans=0.1 2023-11-18 14:29:36,920 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4350, loss[loss=0.1435, simple_loss=0.1712, pruned_loss=0.05074, audio_tagging_loss=0.007132, over 14638.00 frames. ], tot_loss[loss=0.1111, simple_loss=0.1244, pruned_loss=0.03731, audio_tagging_loss=0.01158, over 3048019.87 frames. 
], batch size: 54, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:29:37,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=269453.3333333333, ans=0.04949747468305833 2023-11-18 14:29:38,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.35 vs. limit=15.0 2023-11-18 14:29:42,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=269453.3333333333, ans=0.125 2023-11-18 14:29:47,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=269520.0, ans=0.125 2023-11-18 14:29:48,987 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.972e+01 1.042e+02 1.123e+02 1.311e+02 1.927e+02, threshold=2.246e+02, percent-clipped=0.0 2023-11-18 14:29:50,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2023-11-18 14:30:04,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=269586.6666666667, ans=0.2 2023-11-18 14:30:07,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.19 vs. limit=22.5 2023-11-18 14:30:12,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=269653.3333333333, ans=0.125 2023-11-18 14:30:31,879 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4400, loss[loss=0.1244, simple_loss=0.1428, pruned_loss=0.04009, audio_tagging_loss=0.01287, over 14236.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1243, pruned_loss=0.03703, audio_tagging_loss=0.01152, over 3041448.76 frames. ], batch size: 53, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:30:56,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=269920.0, ans=0.0 2023-11-18 14:30:58,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=269920.0, ans=0.0 2023-11-18 14:31:00,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=269920.0, ans=0.1 2023-11-18 14:31:12,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=269986.6666666667, ans=0.125 2023-11-18 14:31:14,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=269986.6666666667, ans=0.125 2023-11-18 14:31:28,578 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4450, loss[loss=0.1268, simple_loss=0.1536, pruned_loss=0.0408, audio_tagging_loss=0.009156, over 15857.00 frames. ], tot_loss[loss=0.1116, simple_loss=0.1254, pruned_loss=0.03754, audio_tagging_loss=0.01139, over 3044471.87 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:31:34,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.98 vs. 
limit=15.0 2023-11-18 14:31:40,207 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.353e+01 9.826e+01 1.064e+02 1.191e+02 1.732e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 14:31:54,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=270253.3333333333, ans=0.0 2023-11-18 14:31:59,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=270253.3333333333, ans=0.0 2023-11-18 14:32:23,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=270453.3333333333, ans=0.125 2023-11-18 14:32:23,823 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4500, loss[loss=0.08571, simple_loss=0.08698, pruned_loss=0.02694, audio_tagging_loss=0.01528, over 15213.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1252, pruned_loss=0.03767, audio_tagging_loss=0.01143, over 3046001.71 frames. ], batch size: 59, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:32:30,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=270453.3333333333, ans=0.0 2023-11-18 14:32:31,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=270453.3333333333, ans=0.0 2023-11-18 14:32:38,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-18 14:32:43,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=270520.0, ans=0.125 2023-11-18 14:32:51,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=270586.6666666667, ans=0.1 2023-11-18 14:33:05,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=270653.3333333333, ans=0.125 2023-11-18 14:33:17,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=270720.0, ans=0.125 2023-11-18 14:33:20,081 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4550, loss[loss=0.0903, simple_loss=0.09607, pruned_loss=0.02761, audio_tagging_loss=0.01466, over 15249.00 frames. ], tot_loss[loss=0.1107, simple_loss=0.1244, pruned_loss=0.03706, audio_tagging_loss=0.01151, over 3044633.07 frames. ], batch size: 60, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:33:33,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.057e+01 9.598e+01 1.091e+02 1.194e+02 2.832e+02, threshold=2.183e+02, percent-clipped=1.0 2023-11-18 14:33:35,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=270853.3333333333, ans=0.0 2023-11-18 14:33:40,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=270853.3333333333, ans=0.125 2023-11-18 14:34:00,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=270986.6666666667, ans=0.125 2023-11-18 14:34:02,582 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:34:02,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=270986.6666666667, ans=0.1 2023-11-18 14:34:03,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=271053.3333333333, ans=0.125 2023-11-18 14:34:10,048 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.51 vs. limit=10.0 2023-11-18 14:34:17,074 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4600, loss[loss=0.08371, simple_loss=0.09127, pruned_loss=0.02693, audio_tagging_loss=0.01115, over 13905.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.123, pruned_loss=0.03683, audio_tagging_loss=0.01159, over 3043923.19 frames. ], batch size: 53, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:34:19,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=271120.0, ans=0.125 2023-11-18 14:34:32,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=271186.6666666667, ans=0.125 2023-11-18 14:34:37,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=271253.3333333333, ans=0.125 2023-11-18 14:34:55,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=271320.0, ans=0.1 2023-11-18 14:34:57,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.15 vs. limit=15.0 2023-11-18 14:34:59,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.42 vs. limit=15.0 2023-11-18 14:35:03,853 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:35:06,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=271386.6666666667, ans=0.125 2023-11-18 14:35:12,027 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4650, loss[loss=0.1138, simple_loss=0.1344, pruned_loss=0.03654, audio_tagging_loss=0.01009, over 15968.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1233, pruned_loss=0.0367, audio_tagging_loss=0.01166, over 3043198.08 frames. ], batch size: 59, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:35:24,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.025e+01 1.020e+02 1.140e+02 1.306e+02 2.124e+02, threshold=2.280e+02, percent-clipped=0.0 2023-11-18 14:35:33,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.60 vs. limit=15.0 2023-11-18 14:36:04,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.90 vs. 
limit=15.0 2023-11-18 14:36:07,761 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4700, loss[loss=0.1254, simple_loss=0.1272, pruned_loss=0.04736, audio_tagging_loss=0.01446, over 14943.00 frames. ], tot_loss[loss=0.1102, simple_loss=0.1232, pruned_loss=0.03684, audio_tagging_loss=0.01178, over 3051397.21 frames. ], batch size: 57, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:36:08,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=271786.6666666667, ans=0.05 2023-11-18 14:36:24,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=271853.3333333333, ans=0.0 2023-11-18 14:36:36,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=271920.0, ans=0.2 2023-11-18 14:36:54,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=272053.3333333333, ans=0.0 2023-11-18 14:37:04,166 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4750, loss[loss=0.1146, simple_loss=0.1322, pruned_loss=0.03954, audio_tagging_loss=0.00893, over 15451.00 frames. ], tot_loss[loss=0.11, simple_loss=0.1231, pruned_loss=0.03666, audio_tagging_loss=0.01182, over 3055009.10 frames. ], batch size: 55, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:37:05,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=272120.0, ans=0.125 2023-11-18 14:37:13,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=272120.0, ans=0.1 2023-11-18 14:37:16,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.682e+01 9.592e+01 1.080e+02 1.195e+02 1.652e+02, threshold=2.159e+02, percent-clipped=0.0 2023-11-18 14:37:18,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=272186.6666666667, ans=0.125 2023-11-18 14:37:18,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=272186.6666666667, ans=0.2 2023-11-18 14:37:20,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.59 vs. limit=15.0 2023-11-18 14:37:20,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=272186.6666666667, ans=0.0 2023-11-18 14:37:24,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=272253.3333333333, ans=0.2 2023-11-18 14:37:26,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=272253.3333333333, ans=0.125 2023-11-18 14:37:32,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=272253.3333333333, ans=0.125 2023-11-18 14:37:38,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.13 vs. limit=15.0 2023-11-18 14:37:59,707 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4800, loss[loss=0.1045, simple_loss=0.1193, pruned_loss=0.03486, audio_tagging_loss=0.009983, over 15942.00 frames. 
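The optim.py:476 lines print five grad-norm quantiles followed by the clipping threshold currently in force, and throughout this section the threshold tracks Clipping_scale times the middle quantile (just above, 2.0 x 1.080e+02 ~= 2.159e+02). A hedged reconstruction of that mechanism, not the actual optim.py code, with an arbitrary window size:

    from collections import deque

    import torch

    class MedianGradClipper:
        """Clip to clipping_scale x running-median gradient norm."""

        def __init__(self, clipping_scale: float = 2.0, window: int = 1000):
            self.clipping_scale = clipping_scale
            self.history = deque(maxlen=window)

        def clip_(self, params) -> float:
            grads = [p.grad for p in params if p.grad is not None]
            norm = torch.linalg.vector_norm(
                torch.stack([torch.linalg.vector_norm(g) for g in grads])
            ).item()
            self.history.append(norm)
            threshold = self.clipping_scale * float(
                torch.median(torch.tensor(list(self.history)))
            )
            if norm > threshold:   # such steps feed the percent-clipped stat
                for g in grads:
                    g.mul_(threshold / norm)
            return threshold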
], tot_loss[loss=0.1095, simple_loss=0.1226, pruned_loss=0.03623, audio_tagging_loss=0.01198, over 3049304.98 frames. ], batch size: 59, lr: 1.67e-02, grad_scale: 32.0 2023-11-18 14:38:00,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.28 vs. limit=15.0 2023-11-18 14:38:00,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=272453.3333333333, ans=0.125 2023-11-18 14:38:07,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=272453.3333333333, ans=0.1 2023-11-18 14:38:13,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=272520.0, ans=0.07 2023-11-18 14:38:14,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=272520.0, ans=0.125 2023-11-18 14:38:31,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=272586.6666666667, ans=0.0 2023-11-18 14:38:31,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=272586.6666666667, ans=0.125 2023-11-18 14:38:38,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=12.10 vs. limit=15.0 2023-11-18 14:38:40,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=272653.3333333333, ans=0.2 2023-11-18 14:38:40,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=272653.3333333333, ans=0.0 2023-11-18 14:38:55,230 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4850, loss[loss=0.08333, simple_loss=0.1042, pruned_loss=0.02219, audio_tagging_loss=0.00904, over 14559.00 frames. ], tot_loss[loss=0.1105, simple_loss=0.1238, pruned_loss=0.03661, audio_tagging_loss=0.01201, over 3053160.16 frames. 
], batch size: 53, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:38:57,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=272786.6666666667, ans=0.04949747468305833 2023-11-18 14:39:02,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=272786.6666666667, ans=0.125 2023-11-18 14:39:07,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.577e+01 9.439e+01 1.075e+02 1.233e+02 2.240e+02, threshold=2.150e+02, percent-clipped=1.0 2023-11-18 14:39:18,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=272920.0, ans=0.0 2023-11-18 14:39:19,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=272920.0, ans=0.0 2023-11-18 14:39:28,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=272986.6666666667, ans=0.0 2023-11-18 14:39:31,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=272986.6666666667, ans=0.125 2023-11-18 14:39:35,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=272986.6666666667, ans=0.0 2023-11-18 14:39:45,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=273053.3333333333, ans=0.125 2023-11-18 14:39:51,332 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4900, loss[loss=0.1097, simple_loss=0.1223, pruned_loss=0.03815, audio_tagging_loss=0.01042, over 16293.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1239, pruned_loss=0.03678, audio_tagging_loss=0.0119, over 3052170.27 frames. ], batch size: 63, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:20,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=273253.3333333333, ans=0.125 2023-11-18 14:40:33,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=273320.0, ans=0.0 2023-11-18 14:40:34,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.99 vs. limit=15.0 2023-11-18 14:40:46,586 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 4950, loss[loss=0.1471, simple_loss=0.1673, pruned_loss=0.05673, audio_tagging_loss=0.006759, over 15291.00 frames. ], tot_loss[loss=0.1101, simple_loss=0.1233, pruned_loss=0.03671, audio_tagging_loss=0.01171, over 3051456.34 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:40:58,627 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.761e+01 9.503e+01 1.074e+02 1.226e+02 1.825e+02, threshold=2.148e+02, percent-clipped=0.0 2023-11-18 14:41:02,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=273520.0, ans=0.0 2023-11-18 14:41:21,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.41 vs. 
limit=15.0 2023-11-18 14:41:26,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=273653.3333333333, ans=0.125 2023-11-18 14:41:28,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=273653.3333333333, ans=0.125 2023-11-18 14:41:42,366 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5000, loss[loss=0.06888, simple_loss=0.06382, pruned_loss=0.01861, audio_tagging_loss=0.01835, over 14353.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1224, pruned_loss=0.03638, audio_tagging_loss=0.01158, over 3047598.51 frames. ], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:42:07,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=273920.0, ans=0.125 2023-11-18 14:42:08,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=273920.0, ans=0.09899494936611666 2023-11-18 14:42:13,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=273920.0, ans=0.1 2023-11-18 14:42:38,357 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5050, loss[loss=0.1074, simple_loss=0.121, pruned_loss=0.03877, audio_tagging_loss=0.008095, over 15865.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1228, pruned_loss=0.0365, audio_tagging_loss=0.01152, over 3047375.72 frames. ], batch size: 61, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:42:44,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.18 vs. limit=15.0 2023-11-18 14:42:49,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=274186.6666666667, ans=0.125 2023-11-18 14:42:51,025 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 9.577e+01 1.097e+02 1.238e+02 1.791e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 14:43:04,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=274253.3333333333, ans=0.125 2023-11-18 14:43:12,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=274320.0, ans=0.125 2023-11-18 14:43:20,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=274320.0, ans=0.125 2023-11-18 14:43:20,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=274320.0, ans=0.125 2023-11-18 14:43:25,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=274386.6666666667, ans=0.0 2023-11-18 14:43:27,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=274386.6666666667, ans=0.125 2023-11-18 14:43:32,743 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5100, loss[loss=0.1046, simple_loss=0.1084, pruned_loss=0.03734, audio_tagging_loss=0.01306, over 15130.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1216, pruned_loss=0.03612, audio_tagging_loss=0.01159, over 3044817.35 frames. 
], batch size: 56, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:43:51,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=274520.0, ans=0.2 2023-11-18 14:43:58,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=274586.6666666667, ans=0.125 2023-11-18 14:44:27,772 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5150, loss[loss=0.08202, simple_loss=0.08925, pruned_loss=0.02115, audio_tagging_loss=0.01624, over 13519.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1211, pruned_loss=0.03585, audio_tagging_loss=0.01169, over 3037746.34 frames. ], batch size: 56, lr: 1.66e-02, grad_scale: 16.0 2023-11-18 14:44:41,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.141e+01 9.655e+01 1.078e+02 1.221e+02 1.622e+02, threshold=2.156e+02, percent-clipped=0.0 2023-11-18 14:45:01,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.21 vs. limit=15.0 2023-11-18 14:45:05,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.93 vs. limit=10.0 2023-11-18 14:45:23,355 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5200, loss[loss=0.09736, simple_loss=0.1096, pruned_loss=0.03345, audio_tagging_loss=0.009107, over 15073.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.122, pruned_loss=0.03622, audio_tagging_loss=0.01158, over 3042062.47 frames. ], batch size: 57, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:45:42,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=275186.6666666667, ans=0.0 2023-11-18 14:45:45,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=275253.3333333333, ans=0.0 2023-11-18 14:45:47,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=275253.3333333333, ans=0.125 2023-11-18 14:45:58,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=275320.0, ans=0.05 2023-11-18 14:46:06,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=275386.6666666667, ans=0.125 2023-11-18 14:46:11,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=275386.6666666667, ans=0.125 2023-11-18 14:46:18,241 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5250, loss[loss=0.09316, simple_loss=0.1042, pruned_loss=0.02859, audio_tagging_loss=0.01244, over 16159.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1218, pruned_loss=0.03647, audio_tagging_loss=0.01165, over 3039655.39 frames. 
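The loss[...] fields decompose consistently across this section: the headline loss equals half the simple loss plus the pruned loss plus the audio-tagging loss. The 0.5 weight is inferred from the printed numbers themselves, not read from any configuration; checking batch 5100 from the entry just above:

    # Inferred: loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    simple_loss, pruned_loss, audio_tagging_loss = 0.1084, 0.03734, 0.01306
    loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss
    print(round(loss, 4))   # 0.1046, matching loss[loss=0.1046, ...] above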
], batch size: 61, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:46:30,876 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.969e+01 9.429e+01 1.029e+02 1.136e+02 1.567e+02, threshold=2.057e+02, percent-clipped=0.0 2023-11-18 14:46:41,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=275586.6666666667, ans=0.1 2023-11-18 14:46:51,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=275653.3333333333, ans=0.125 2023-11-18 14:47:12,130 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5300, loss[loss=0.1319, simple_loss=0.1436, pruned_loss=0.04815, audio_tagging_loss=0.012, over 14880.00 frames. ], tot_loss[loss=0.1099, simple_loss=0.1229, pruned_loss=0.03692, audio_tagging_loss=0.01156, over 3042972.96 frames. ], batch size: 58, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:47:36,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=275920.0, ans=0.125 2023-11-18 14:47:40,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=275920.0, ans=0.0 2023-11-18 14:47:42,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=275920.0, ans=0.1 2023-11-18 14:47:57,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=276053.3333333333, ans=0.2 2023-11-18 14:48:07,978 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5350, loss[loss=0.1096, simple_loss=0.1209, pruned_loss=0.03913, audio_tagging_loss=0.01006, over 15263.00 frames. ], tot_loss[loss=0.109, simple_loss=0.1216, pruned_loss=0.03649, audio_tagging_loss=0.01167, over 3042388.33 frames. ], batch size: 55, lr: 1.66e-02, grad_scale: 32.0 2023-11-18 14:48:13,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.40 vs. limit=15.0 2023-11-18 14:48:21,233 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.530e+01 9.718e+01 1.034e+02 1.191e+02 1.805e+02, threshold=2.068e+02, percent-clipped=0.0 2023-11-18 14:48:36,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=276253.3333333333, ans=0.0 2023-11-18 14:48:43,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=276320.0, ans=0.1 2023-11-18 14:48:45,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=276320.0, ans=0.0 2023-11-18 14:49:00,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.10 vs. limit=15.0 2023-11-18 14:49:01,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=276386.6666666667, ans=0.0 2023-11-18 14:49:03,117 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5400, loss[loss=0.1253, simple_loss=0.1532, pruned_loss=0.03934, audio_tagging_loss=0.009358, over 14901.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1222, pruned_loss=0.03659, audio_tagging_loss=0.01165, over 3041295.23 frames. 
], batch size: 55, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:49:13,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=276520.0, ans=0.125 2023-11-18 14:49:23,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=276586.6666666667, ans=0.125 2023-11-18 14:49:28,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=276586.6666666667, ans=0.1 2023-11-18 14:49:38,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=276653.3333333333, ans=0.125 2023-11-18 14:49:40,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=276653.3333333333, ans=0.0 2023-11-18 14:49:41,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=276653.3333333333, ans=0.125 2023-11-18 14:49:55,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.72 vs. limit=15.0 2023-11-18 14:49:57,658 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5450, loss[loss=0.1325, simple_loss=0.1547, pruned_loss=0.04686, audio_tagging_loss=0.00825, over 15513.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1238, pruned_loss=0.03712, audio_tagging_loss=0.01164, over 3045614.71 frames. ], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:50:05,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.99 vs. limit=15.0 2023-11-18 14:50:10,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 9.674e+01 1.094e+02 1.267e+02 1.723e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 14:50:34,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=276986.6666666667, ans=0.0 2023-11-18 14:50:52,384 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5500, loss[loss=0.09633, simple_loss=0.1026, pruned_loss=0.03191, audio_tagging_loss=0.01312, over 13715.00 frames. ], tot_loss[loss=0.112, simple_loss=0.1256, pruned_loss=0.03767, audio_tagging_loss=0.01154, over 3044742.55 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:50:57,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=277120.0, ans=0.125 2023-11-18 14:51:47,655 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5550, loss[loss=0.08221, simple_loss=0.08898, pruned_loss=0.02376, audio_tagging_loss=0.01396, over 13846.00 frames. ], tot_loss[loss=0.1117, simple_loss=0.1251, pruned_loss=0.0374, audio_tagging_loss=0.01171, over 3046580.47 frames. ], batch size: 54, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:51:58,924 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. 
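The Whitening messages (scaling.py:1022) fire when a module's activation statistics drift from white, reporting a metric against a limit (10.72 vs. 15.0 and 9.99 vs. 15.0 above). One standard whiteness measure with the same qualitative behavior, offered purely as an illustration since the exact scaling.py formula may differ, compares the spread of the channel-covariance eigenvalues to their mean:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        """x: (num_frames, num_channels); 1.0 when perfectly white, larger
        as the channel covariance becomes anisotropic."""
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float((eigs ** 2).mean() / eigs.mean() ** 2)

    x = torch.randn(2000, 256)
    print(whitening_metric(x))                                   # ~1.0, white
    print(whitening_metric(x * torch.linspace(0.1, 10.0, 256)))  # >> 1.0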
limit=6.0 2023-11-18 14:52:00,288 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.658e+01 9.567e+01 1.041e+02 1.171e+02 1.468e+02, threshold=2.082e+02, percent-clipped=0.0 2023-11-18 14:52:11,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=277586.6666666667, ans=0.1 2023-11-18 14:52:24,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=277653.3333333333, ans=0.0 2023-11-18 14:52:25,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=277653.3333333333, ans=0.0 2023-11-18 14:52:41,985 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5600, loss[loss=0.1178, simple_loss=0.1439, pruned_loss=0.03747, audio_tagging_loss=0.00836, over 15663.00 frames. ], tot_loss[loss=0.1106, simple_loss=0.1243, pruned_loss=0.03671, audio_tagging_loss=0.01172, over 3040767.53 frames. ], batch size: 53, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:52:42,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=277786.6666666667, ans=0.1 2023-11-18 14:52:43,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.42 vs. limit=15.0 2023-11-18 14:52:43,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.52 vs. limit=12.0 2023-11-18 14:52:45,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=277786.6666666667, ans=0.125 2023-11-18 14:53:05,020 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:53:06,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.77 vs. limit=15.0 2023-11-18 14:53:19,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=277986.6666666667, ans=0.0 2023-11-18 14:53:21,532 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 14:53:21,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=277986.6666666667, ans=0.125 2023-11-18 14:53:23,835 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 14:53:31,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.14 vs. limit=15.0 2023-11-18 14:53:36,766 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5650, loss[loss=0.1088, simple_loss=0.1317, pruned_loss=0.03356, audio_tagging_loss=0.009369, over 15790.00 frames. 
], tot_loss[loss=0.1098, simple_loss=0.1231, pruned_loss=0.0363, audio_tagging_loss=0.01198, over 3042555.02 frames. ], batch size: 59, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:53:37,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=278120.0, ans=0.0 2023-11-18 14:53:40,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=278120.0, ans=0.125 2023-11-18 14:53:44,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=278120.0, ans=0.125 2023-11-18 14:53:48,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=278186.6666666667, ans=0.04949747468305833 2023-11-18 14:53:50,486 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.691e+01 9.354e+01 1.022e+02 1.173e+02 1.530e+02, threshold=2.043e+02, percent-clipped=0.0 2023-11-18 14:54:23,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=278386.6666666667, ans=0.125 2023-11-18 14:54:29,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=278386.6666666667, ans=0.125 2023-11-18 14:54:32,134 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5700, loss[loss=0.1223, simple_loss=0.1447, pruned_loss=0.03977, audio_tagging_loss=0.01011, over 15352.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1226, pruned_loss=0.03629, audio_tagging_loss=0.01182, over 3045094.28 frames. ], batch size: 56, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:54:33,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=278453.3333333333, ans=0.125 2023-11-18 14:54:37,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=278453.3333333333, ans=0.0 2023-11-18 14:54:53,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=278586.6666666667, ans=0.0 2023-11-18 14:54:58,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2023-11-18 14:55:17,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=278720.0, ans=0.02 2023-11-18 14:55:22,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=278720.0, ans=0.1 2023-11-18 14:55:27,002 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5750, loss[loss=0.08968, simple_loss=0.1048, pruned_loss=0.02392, audio_tagging_loss=0.01334, over 14804.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1212, pruned_loss=0.03582, audio_tagging_loss=0.01179, over 3048514.78 frames. 
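The grad_scale field on each training line is the fp16 loss-scaling factor: it halves when an overflow is detected (it fell from 64.0 at the start of this stretch to 16.0 around batches 5050-5150) and grows back after a long enough run of overflow-free steps, as it has here at 32.0. A generic torch.cuda.amp sketch of that bookkeeping; model, optimizer, and batch are placeholders:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=64.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(dtype=torch.float16):
            loss = model(batch)
        scaler.scale(loss).backward()   # backprop through the scaled loss
        scaler.step(optimizer)          # skipped if inf/nan grads were found
        scaler.update()                 # halves the scale after an overflow
        return scaler.get_scale()       # the value logged as grad_scale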
], batch size: 58, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:55:29,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=278786.6666666667, ans=0.2 2023-11-18 14:55:34,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=278786.6666666667, ans=0.125 2023-11-18 14:55:35,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=278786.6666666667, ans=0.0 2023-11-18 14:55:40,247 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.975e+01 9.668e+01 1.031e+02 1.141e+02 1.503e+02, threshold=2.062e+02, percent-clipped=0.0 2023-11-18 14:55:49,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-18 14:56:02,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=278986.6666666667, ans=0.125 2023-11-18 14:56:06,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=278986.6666666667, ans=0.125 2023-11-18 14:56:22,416 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5800, loss[loss=0.1198, simple_loss=0.1284, pruned_loss=0.04303, audio_tagging_loss=0.01253, over 14427.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1202, pruned_loss=0.0357, audio_tagging_loss=0.01173, over 3044407.05 frames. ], batch size: 53, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:56:33,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-11-18 14:57:18,277 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5850, loss[loss=0.1031, simple_loss=0.1255, pruned_loss=0.02932, audio_tagging_loss=0.011, over 16010.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1224, pruned_loss=0.03654, audio_tagging_loss=0.01146, over 3048963.97 frames. ], batch size: 60, lr: 1.65e-02, grad_scale: 32.0 2023-11-18 14:57:31,471 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.015e+01 9.648e+01 1.054e+02 1.215e+02 1.872e+02, threshold=2.108e+02, percent-clipped=0.0 2023-11-18 14:58:13,667 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5900, loss[loss=0.1303, simple_loss=0.1466, pruned_loss=0.04744, audio_tagging_loss=0.009612, over 15859.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1219, pruned_loss=0.03645, audio_tagging_loss=0.01145, over 3048575.05 frames. 
], batch size: 59, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:58:36,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=279920.0, ans=0.0 2023-11-18 14:58:43,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=279920.0, ans=0.125 2023-11-18 14:58:55,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=279986.6666666667, ans=0.1 2023-11-18 14:58:55,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=279986.6666666667, ans=0.025 2023-11-18 14:59:02,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=7.71 vs. limit=15.0 2023-11-18 14:59:08,890 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 5950, loss[loss=0.1202, simple_loss=0.1283, pruned_loss=0.04146, audio_tagging_loss=0.0146, over 14469.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1221, pruned_loss=0.03631, audio_tagging_loss=0.01146, over 3052560.19 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 14:59:23,196 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 1.041e+02 1.163e+02 1.306e+02 1.742e+02, threshold=2.325e+02, percent-clipped=0.0 2023-11-18 14:59:25,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=280186.6666666667, ans=0.05 2023-11-18 15:00:00,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=280386.6666666667, ans=0.1 2023-11-18 15:00:05,303 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6000, loss[loss=0.08559, simple_loss=0.09325, pruned_loss=0.02804, audio_tagging_loss=0.01093, over 14937.00 frames. ], tot_loss[loss=0.1085, simple_loss=0.1218, pruned_loss=0.03622, audio_tagging_loss=0.0114, over 3049397.62 frames. ], batch size: 55, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:00:05,305 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 15:00:38,407 INFO [train_asr.py:1147] (0/4) Epoch 4, validation: loss=0.07584, simple_loss=0.06235, pruned_loss=0.0102, audio_tagging_loss=0.03446, over 4681554.00 frames. 2023-11-18 15:00:38,408 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 15:01:06,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=280586.6666666667, ans=0.125 2023-11-18 15:01:11,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=280653.3333333333, ans=0.04949747468305833 2023-11-18 15:01:18,961 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 15:01:33,798 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6050, loss[loss=0.115, simple_loss=0.1239, pruned_loss=0.04373, audio_tagging_loss=0.00931, over 14366.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1217, pruned_loss=0.03614, audio_tagging_loss=0.01136, over 3044538.89 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:01:34,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=280786.6666666667, ans=0.125 2023-11-18 15:01:47,523 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.202e+01 9.320e+01 1.035e+02 1.195e+02 1.658e+02, threshold=2.071e+02, percent-clipped=0.0 2023-11-18 15:02:04,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=280920.0, ans=0.125 2023-11-18 15:02:18,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=281053.3333333333, ans=0.0 2023-11-18 15:02:27,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.39 vs. limit=15.0 2023-11-18 15:02:29,914 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6100, loss[loss=0.09307, simple_loss=0.1082, pruned_loss=0.02982, audio_tagging_loss=0.009146, over 14878.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1215, pruned_loss=0.03607, audio_tagging_loss=0.01137, over 3039695.93 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:02:32,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.92 vs. limit=22.5 2023-11-18 15:02:52,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=281253.3333333333, ans=0.07 2023-11-18 15:02:53,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.87 vs. limit=15.0 2023-11-18 15:02:53,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer_ff3.min_abs, batch_count=281253.3333333333, ans=0.2 2023-11-18 15:03:13,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=281386.6666666667, ans=0.125 2023-11-18 15:03:20,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281386.6666666667, ans=0.1 2023-11-18 15:03:24,781 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6150, loss[loss=0.1226, simple_loss=0.1397, pruned_loss=0.04291, audio_tagging_loss=0.009858, over 15194.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1228, pruned_loss=0.03652, audio_tagging_loss=0.01142, over 3041194.04 frames. 
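Training pauses periodically for a validation pass like the one logged a little earlier ("Computing validation loss", then "Epoch 4, validation: loss=0.07584 ... over 4681554.00 frames", then the peak-memory line). A schematic of that loop; the model's (summed loss, frame count) return interface is an assumption:

    import torch

    def run_validation(model, dev_loader, device) -> float:
        model.eval()
        tot_loss, tot_frames = 0.0, 0
        with torch.no_grad():
            for batch in dev_loader:
                loss, num_frames = model(batch)   # assumed interface
                tot_loss += loss.item()
                tot_frames += num_frames
        model.train()
        mem_mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
        print(f"Maximum memory allocated so far is {mem_mb}MB")
        return tot_loss / tot_frames   # per-frame average, "over N frames"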
], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:03:27,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281453.3333333333, ans=0.1 2023-11-18 15:03:38,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.635e+01 9.712e+01 1.096e+02 1.258e+02 1.781e+02, threshold=2.192e+02, percent-clipped=0.0 2023-11-18 15:03:40,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=281520.0, ans=0.125 2023-11-18 15:04:06,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=281653.3333333333, ans=0.0 2023-11-18 15:04:07,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=281653.3333333333, ans=0.125 2023-11-18 15:04:20,408 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6200, loss[loss=0.1241, simple_loss=0.1379, pruned_loss=0.04532, audio_tagging_loss=0.009831, over 16022.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1224, pruned_loss=0.03642, audio_tagging_loss=0.01146, over 3037502.65 frames. ], batch size: 57, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:04:23,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=281786.6666666667, ans=0.07 2023-11-18 15:04:24,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=281786.6666666667, ans=15.0 2023-11-18 15:04:26,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=281786.6666666667, ans=0.1 2023-11-18 15:04:26,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=281786.6666666667, ans=0.1 2023-11-18 15:04:27,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.42 vs. limit=6.0 2023-11-18 15:04:29,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=281786.6666666667, ans=0.1 2023-11-18 15:04:30,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.14 vs. limit=12.0 2023-11-18 15:04:48,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=281920.0, ans=0.125 2023-11-18 15:04:58,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=281986.6666666667, ans=0.05 2023-11-18 15:05:08,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=282053.3333333333, ans=0.125 2023-11-18 15:05:17,031 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6250, loss[loss=0.1079, simple_loss=0.1147, pruned_loss=0.03685, audio_tagging_loss=0.01372, over 15929.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.122, pruned_loss=0.03608, audio_tagging_loss=0.01167, over 3038133.17 frames. 
], batch size: 60, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:05:29,636 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.742e+01 9.446e+01 1.080e+02 1.226e+02 1.932e+02, threshold=2.161e+02, percent-clipped=0.0 2023-11-18 15:05:52,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=282320.0, ans=0.125 2023-11-18 15:06:11,965 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6300, loss[loss=0.1204, simple_loss=0.132, pruned_loss=0.04319, audio_tagging_loss=0.01122, over 15053.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1223, pruned_loss=0.03627, audio_tagging_loss=0.01176, over 3041230.16 frames. ], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:06:27,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=282520.0, ans=0.125 2023-11-18 15:06:53,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=282653.3333333333, ans=10.0 2023-11-18 15:07:07,486 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6350, loss[loss=0.1246, simple_loss=0.1499, pruned_loss=0.04271, audio_tagging_loss=0.006903, over 16860.00 frames. ], tot_loss[loss=0.1093, simple_loss=0.1221, pruned_loss=0.03643, audio_tagging_loss=0.01179, over 3039291.68 frames. ], batch size: 62, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:07:21,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.648e+01 1.090e+02 1.229e+02 1.753e+02, threshold=2.179e+02, percent-clipped=0.0 2023-11-18 15:07:30,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=282920.0, ans=0.1 2023-11-18 15:07:33,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=282920.0, ans=0.0 2023-11-18 15:07:35,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=282920.0, ans=0.0 2023-11-18 15:07:39,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.16 vs. limit=15.0 2023-11-18 15:07:56,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=283053.3333333333, ans=0.125 2023-11-18 15:07:59,711 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:07:59,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=283053.3333333333, ans=0.2 2023-11-18 15:08:03,795 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6400, loss[loss=0.113, simple_loss=0.121, pruned_loss=0.03723, audio_tagging_loss=0.01527, over 14901.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1208, pruned_loss=0.03589, audio_tagging_loss=0.01197, over 3039036.90 frames. 
], batch size: 56, lr: 1.64e-02, grad_scale: 32.0 2023-11-18 15:08:05,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=283120.0, ans=0.125 2023-11-18 15:08:13,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=283186.6666666667, ans=0.2 2023-11-18 15:08:30,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=283253.3333333333, ans=0.0 2023-11-18 15:08:43,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=283320.0, ans=0.125 2023-11-18 15:08:45,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=283320.0, ans=0.0 2023-11-18 15:08:47,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=283386.6666666667, ans=0.0 2023-11-18 15:08:58,502 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6450, loss[loss=0.1223, simple_loss=0.1431, pruned_loss=0.04293, audio_tagging_loss=0.00781, over 14987.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1211, pruned_loss=0.03592, audio_tagging_loss=0.01193, over 3037300.44 frames. ], batch size: 54, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:09:02,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=283453.3333333333, ans=0.125 2023-11-18 15:09:07,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-11-18 15:09:11,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.683e+01 9.197e+01 1.014e+02 1.179e+02 1.440e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 15:09:20,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=283586.6666666667, ans=0.1 2023-11-18 15:09:22,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=283586.6666666667, ans=0.1 2023-11-18 15:09:53,327 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6500, loss[loss=0.1092, simple_loss=0.1151, pruned_loss=0.04076, audio_tagging_loss=0.01086, over 14434.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1215, pruned_loss=0.03609, audio_tagging_loss=0.01185, over 3046024.57 frames. ], batch size: 53, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:10:10,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=283853.3333333333, ans=0.2 2023-11-18 15:10:20,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=283920.0, ans=0.0 2023-11-18 15:10:23,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.49 vs. 
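As a quick sanity check on the per-batch bookkeeping, frames divided by batch size gives the average cut length; assuming the conventional 10 ms frame shift (not stated anywhere in the log), batch 6450 above works out to just under three seconds of audio per cut:

    frames, cuts = 14987.00, 54   # "over 14987.00 frames. ], batch size: 54"
    print(frames / cuts * 0.01)   # ~2.78 s of audio per cut, before subsampling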
limit=15.0 2023-11-18 15:10:28,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=283986.6666666667, ans=0.125 2023-11-18 15:10:29,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=283986.6666666667, ans=0.125 2023-11-18 15:10:39,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=284053.3333333333, ans=22.5 2023-11-18 15:10:43,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.79 vs. limit=10.0 2023-11-18 15:10:49,939 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6550, loss[loss=0.1289, simple_loss=0.1424, pruned_loss=0.04707, audio_tagging_loss=0.01061, over 15092.00 frames. ], tot_loss[loss=0.108, simple_loss=0.1211, pruned_loss=0.03578, audio_tagging_loss=0.01173, over 3050253.91 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:11:03,062 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.721e+01 9.628e+01 1.072e+02 1.195e+02 1.710e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:11:10,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=284253.3333333333, ans=0.125 2023-11-18 15:11:23,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=284320.0, ans=0.0 2023-11-18 15:11:45,571 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6600, loss[loss=0.09758, simple_loss=0.1059, pruned_loss=0.03261, audio_tagging_loss=0.01202, over 13906.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1204, pruned_loss=0.03569, audio_tagging_loss=0.01175, over 3046825.13 frames. ], batch size: 52, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:12:22,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=284653.3333333333, ans=0.1 2023-11-18 15:12:26,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=24.41 vs. limit=22.5 2023-11-18 15:12:32,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=284720.0, ans=0.125 2023-11-18 15:12:38,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=284720.0, ans=0.0 2023-11-18 15:12:40,474 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6650, loss[loss=0.09814, simple_loss=0.108, pruned_loss=0.03173, audio_tagging_loss=0.0124, over 14786.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1206, pruned_loss=0.03564, audio_tagging_loss=0.01167, over 3042547.17 frames. 
], batch size: 56, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:12:40,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=284786.6666666667, ans=0.125 2023-11-18 15:12:45,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=284786.6666666667, ans=0.125 2023-11-18 15:12:54,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.970e+01 9.511e+01 1.065e+02 1.198e+02 1.619e+02, threshold=2.129e+02, percent-clipped=0.0 2023-11-18 15:12:58,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=284853.3333333333, ans=0.125 2023-11-18 15:12:59,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=284853.3333333333, ans=0.2 2023-11-18 15:13:00,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=284853.3333333333, ans=0.125 2023-11-18 15:13:07,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=284920.0, ans=0.125 2023-11-18 15:13:08,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=284920.0, ans=0.125 2023-11-18 15:13:17,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=284986.6666666667, ans=0.0 2023-11-18 15:13:36,278 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6700, loss[loss=0.1118, simple_loss=0.1266, pruned_loss=0.03512, audio_tagging_loss=0.01336, over 14928.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1214, pruned_loss=0.03564, audio_tagging_loss=0.01151, over 3046741.65 frames. ], batch size: 56, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:13:39,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.64 vs. limit=15.0 2023-11-18 15:13:44,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=285120.0, ans=0.125 2023-11-18 15:13:53,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=285186.6666666667, ans=0.0 2023-11-18 15:13:53,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2023-11-18 15:13:59,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=285253.3333333333, ans=0.05 2023-11-18 15:14:04,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=285253.3333333333, ans=0.0 2023-11-18 15:14:09,554 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:14:32,997 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6750, loss[loss=0.1244, simple_loss=0.1419, pruned_loss=0.04409, audio_tagging_loss=0.009342, over 14787.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.122, pruned_loss=0.03585, audio_tagging_loss=0.0115, over 3038526.28 frames. 
], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:14:37,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=285453.3333333333, ans=0.2 2023-11-18 15:14:40,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=285453.3333333333, ans=0.125 2023-11-18 15:14:45,690 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.590e+01 9.541e+01 1.044e+02 1.172e+02 1.686e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:14:46,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=285520.0, ans=0.2 2023-11-18 15:14:57,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.16 vs. limit=12.0 2023-11-18 15:15:00,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.84 vs. limit=15.0 2023-11-18 15:15:02,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=285586.6666666667, ans=0.0 2023-11-18 15:15:13,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=285653.3333333333, ans=0.125 2023-11-18 15:15:28,138 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6800, loss[loss=0.102, simple_loss=0.1098, pruned_loss=0.03628, audio_tagging_loss=0.01077, over 15555.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1226, pruned_loss=0.036, audio_tagging_loss=0.0115, over 3042412.68 frames. ], batch size: 61, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:15:38,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285853.3333333333, ans=0.1 2023-11-18 15:15:52,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=285920.0, ans=0.125 2023-11-18 15:15:55,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=285920.0, ans=0.125 2023-11-18 15:15:55,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=285920.0, ans=0.0 2023-11-18 15:15:57,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=285920.0, ans=0.125 2023-11-18 15:16:03,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=285986.6666666667, ans=0.1 2023-11-18 15:16:04,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=285986.6666666667, ans=0.0 2023-11-18 15:16:22,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=286120.0, ans=0.2 2023-11-18 15:16:23,776 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6850, loss[loss=0.08513, simple_loss=0.09257, pruned_loss=0.0253, audio_tagging_loss=0.01355, over 16072.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1209, pruned_loss=0.03547, audio_tagging_loss=0.0115, over 3035141.15 frames. 
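Note: the "Whitening: name=..., metric=M vs. limit=L" lines are periodic diagnostics comparing a whiteness statistic of a module's activations against a (possibly scheduled) limit; entries appear both above and below their limits, so the logging is sampled rather than triggered only on violation. Below is a minimal sketch of one such statistic, assuming the metric is the ratio mean(eig^2) / mean(eig)^2 of the channel covariance, which is 1.0 for a perfectly white (isotropic) covariance and grows as energy concentrates in few directions; the exact formula in scaling.py may differ.

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]        # channel covariance
        eigs = torch.linalg.eigvalsh(cov)   # its spectrum
        return (eigs.pow(2).mean() / eigs.mean().pow(2)).item()

    x = torch.randn(1000, 384) * torch.linspace(0.1, 3.0, 384)  # non-white
    print(whitening_metric(x))  # > 1.0; compared against limits like 12.0 or 15.0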
], batch size: 62, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:16:28,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=286120.0, ans=0.0 2023-11-18 15:16:37,993 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.225e+01 9.571e+01 1.055e+02 1.193e+02 1.601e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 15:16:53,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=286253.3333333333, ans=0.0 2023-11-18 15:16:56,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=286320.0, ans=0.2 2023-11-18 15:17:20,142 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6900, loss[loss=0.1019, simple_loss=0.1147, pruned_loss=0.03173, audio_tagging_loss=0.01279, over 14690.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.12, pruned_loss=0.035, audio_tagging_loss=0.01153, over 3038410.97 frames. ], batch size: 55, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:17:25,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=286453.3333333333, ans=0.1 2023-11-18 15:17:38,375 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:18:04,995 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:18:12,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=286720.0, ans=0.1 2023-11-18 15:18:15,637 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 6950, loss[loss=0.1296, simple_loss=0.1447, pruned_loss=0.04789, audio_tagging_loss=0.009395, over 15338.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1204, pruned_loss=0.03516, audio_tagging_loss=0.01161, over 3043445.71 frames. ], batch size: 57, lr: 1.63e-02, grad_scale: 32.0 2023-11-18 15:18:28,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.933e+01 9.398e+01 1.033e+02 1.158e+02 1.660e+02, threshold=2.066e+02, percent-clipped=0.0 2023-11-18 15:18:29,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=286853.3333333333, ans=0.0 2023-11-18 15:18:49,911 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.80 vs. limit=15.0 2023-11-18 15:19:04,684 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.34 vs. limit=12.0 2023-11-18 15:19:10,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=287120.0, ans=0.125 2023-11-18 15:19:11,044 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7000, loss[loss=0.1457, simple_loss=0.1516, pruned_loss=0.05714, audio_tagging_loss=0.01273, over 16164.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1194, pruned_loss=0.03483, audio_tagging_loss=0.01169, over 3043321.17 frames. 
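Note: the WARNING above is arithmetically self-consistent. With the convolutional front-end's boundary loss at subsampling factor 4, 100 input frames become ((100 - 7) // 2 + 1) // 2 = 23 output frames, which is fewer than the 24 BPE tokens, so no transducer alignment exists and the cut is dropped. The filter below is a sketch of that rule as inferred from the message, not a quote of the actual check in train_asr.py.

    def frames_after_subsampling(num_frames: int) -> int:
        # Matches the warning: 100 frames in -> 23 frames out.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed rule: at least one subsampled frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> cut is excluded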
], batch size: 57, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:19:11,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=287120.0, ans=0.07 2023-11-18 15:19:14,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=287120.0, ans=0.125 2023-11-18 15:19:36,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=287253.3333333333, ans=0.0 2023-11-18 15:19:40,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.63 vs. limit=15.0 2023-11-18 15:20:03,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=287386.6666666667, ans=0.125 2023-11-18 15:20:04,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.20 vs. limit=12.0 2023-11-18 15:20:07,124 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7050, loss[loss=0.131, simple_loss=0.1424, pruned_loss=0.04804, audio_tagging_loss=0.01175, over 13870.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1202, pruned_loss=0.03535, audio_tagging_loss=0.01181, over 3039812.97 frames. ], batch size: 53, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:20:20,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.175e+01 9.557e+01 1.044e+02 1.189e+02 1.971e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 15:21:02,537 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7100, loss[loss=0.1422, simple_loss=0.1732, pruned_loss=0.04949, audio_tagging_loss=0.006094, over 16537.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.12, pruned_loss=0.03523, audio_tagging_loss=0.01194, over 3040636.30 frames. 
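Note: in each "Clipping_scale=2.0, grad-norm quartiles a b c d e, threshold=t" line, the five values read as min/25%/50%/75%/max of recently observed gradient norms, and t tracks clipping_scale times the median (here 2.0 * 1.044e+02 ~= 2.089e+02). The sketch below assumes a simple history buffer; the real bookkeeping in optim.py may differ.

    import torch

    def clipping_threshold(recent_grad_norms: torch.Tensor,
                           clipping_scale: float = 2.0) -> float:
        # Clip gradients whose norm exceeds clipping_scale x median norm.
        return clipping_scale * recent_grad_norms.median().item()

    norms = torch.tensor([81.75, 95.57, 104.4, 118.9, 197.1])
    print(clipping_threshold(norms))  # ~208.8, close to the logged 2.089e+02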
], batch size: 59, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:21:14,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=287853.3333333333, ans=0.125 2023-11-18 15:21:20,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=287853.3333333333, ans=0.1 2023-11-18 15:21:21,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=287853.3333333333, ans=0.125 2023-11-18 15:21:27,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=287920.0, ans=0.0 2023-11-18 15:21:35,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=287986.6666666667, ans=0.0 2023-11-18 15:21:42,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=287986.6666666667, ans=0.2 2023-11-18 15:21:44,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=287986.6666666667, ans=0.125 2023-11-18 15:21:54,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=288053.3333333333, ans=0.0 2023-11-18 15:21:55,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=288053.3333333333, ans=0.125 2023-11-18 15:21:56,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.63 vs. limit=15.0 2023-11-18 15:21:56,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288120.0, ans=0.1 2023-11-18 15:21:58,421 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7150, loss[loss=0.07813, simple_loss=0.0788, pruned_loss=0.02617, audio_tagging_loss=0.01256, over 16320.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1194, pruned_loss=0.03511, audio_tagging_loss=0.01198, over 3048672.24 frames. ], batch size: 63, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:22:12,084 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.925e+01 9.651e+01 1.094e+02 1.204e+02 1.585e+02, threshold=2.188e+02, percent-clipped=0.0 2023-11-18 15:22:26,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=20.16 vs. limit=22.5 2023-11-18 15:22:27,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=288253.3333333333, ans=0.0 2023-11-18 15:22:35,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=288320.0, ans=0.0 2023-11-18 15:22:54,521 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7200, loss[loss=0.1133, simple_loss=0.1154, pruned_loss=0.04448, audio_tagging_loss=0.01109, over 14682.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1189, pruned_loss=0.035, audio_tagging_loss=0.01198, over 3047527.56 frames. 
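Note: the frequent "ScheduledFloat: name=..., batch_count=..., ans=..." lines dump schedule-controlled hyperparameters (dropout probabilities, balancer probs, skip rates) as a function of the global batch count. A plausible minimal model is a piecewise-linear schedule over batch_count, sketched below under that assumption; the actual ScheduledFloat in scaling.py may behave differently, and the breakpoints here are hypothetical.

    from bisect import bisect_right

    class ScheduledFloatSketch:
        """A float interpolated between (batch_count, value) breakpoints."""

        def __init__(self, *points: tuple[float, float]) -> None:
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            i = bisect_right(self.xs, batch_count)
            if i == 0:
                return self.ys[0]
            if i == len(self.xs):
                return self.ys[-1]
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # Hypothetical schedule: starts at 0.5, decays to 0.0 by batch 16000,
    # which would log "ans=0.0" at a batch_count like 287920.0 above.
    skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
    print(skip_rate.value(287920.0))  # 0.0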
], batch size: 56, lr: 1.62e-02, grad_scale: 64.0 2023-11-18 15:23:16,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=288586.6666666667, ans=0.125 2023-11-18 15:23:16,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2023-11-18 15:23:18,612 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:23:20,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.71 vs. limit=8.0 2023-11-18 15:23:31,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=288653.3333333333, ans=0.0 2023-11-18 15:23:35,307 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.22 vs. limit=15.0 2023-11-18 15:23:41,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=288720.0, ans=0.0 2023-11-18 15:23:44,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.47 vs. limit=15.0 2023-11-18 15:23:48,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=288786.6666666667, ans=0.125 2023-11-18 15:23:49,858 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7250, loss[loss=0.09858, simple_loss=0.09704, pruned_loss=0.03883, audio_tagging_loss=0.01123, over 14481.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1201, pruned_loss=0.03547, audio_tagging_loss=0.01205, over 3049804.41 frames. ], batch size: 56, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:24:03,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.776e+01 1.072e+02 1.209e+02 1.575e+02, threshold=2.144e+02, percent-clipped=0.0 2023-11-18 15:24:13,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=288920.0, ans=0.1 2023-11-18 15:24:22,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=288986.6666666667, ans=0.1 2023-11-18 15:24:44,975 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7300, loss[loss=0.09746, simple_loss=0.1135, pruned_loss=0.03028, audio_tagging_loss=0.01041, over 15390.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1195, pruned_loss=0.03511, audio_tagging_loss=0.012, over 3047648.60 frames. ], batch size: 59, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:24:46,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.98 vs. limit=10.0 2023-11-18 15:25:25,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=289320.0, ans=0.0 2023-11-18 15:25:38,921 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:25:40,812 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7350, loss[loss=0.08762, simple_loss=0.1005, pruned_loss=0.02618, audio_tagging_loss=0.0112, over 16553.00 frames. 
], tot_loss[loss=0.1067, simple_loss=0.1192, pruned_loss=0.03533, audio_tagging_loss=0.01183, over 3041587.64 frames. ], batch size: 62, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:25:50,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=289520.0, ans=10.0 2023-11-18 15:25:54,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.880e+01 9.633e+01 1.075e+02 1.263e+02 1.928e+02, threshold=2.150e+02, percent-clipped=0.0 2023-11-18 15:26:25,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=289720.0, ans=0.125 2023-11-18 15:26:26,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=289720.0, ans=0.125 2023-11-18 15:26:32,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.92 vs. limit=22.5 2023-11-18 15:26:35,457 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7400, loss[loss=0.1185, simple_loss=0.1345, pruned_loss=0.03795, audio_tagging_loss=0.01334, over 15632.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1186, pruned_loss=0.03502, audio_tagging_loss=0.01178, over 3033656.89 frames. ], batch size: 58, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:26:35,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=289786.6666666667, ans=0.125 2023-11-18 15:27:04,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=289920.0, ans=0.125 2023-11-18 15:27:12,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=289986.6666666667, ans=0.2 2023-11-18 15:27:27,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=290053.3333333333, ans=0.125 2023-11-18 15:27:30,963 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7450, loss[loss=0.1275, simple_loss=0.1471, pruned_loss=0.04537, audio_tagging_loss=0.008613, over 16044.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1194, pruned_loss=0.03518, audio_tagging_loss=0.01163, over 3038267.62 frames. 
], batch size: 61, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:27:37,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=290120.0, ans=0.2 2023-11-18 15:27:46,292 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.678e+01 9.437e+01 1.026e+02 1.201e+02 2.000e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 15:27:47,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=290186.6666666667, ans=0.125 2023-11-18 15:27:58,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=290253.3333333333, ans=0.0 2023-11-18 15:28:20,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=290386.6666666667, ans=0.125 2023-11-18 15:28:22,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=290386.6666666667, ans=15.0 2023-11-18 15:28:27,309 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7500, loss[loss=0.1395, simple_loss=0.1514, pruned_loss=0.05335, audio_tagging_loss=0.01045, over 15070.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1201, pruned_loss=0.03539, audio_tagging_loss=0.0116, over 3045792.21 frames. ], batch size: 54, lr: 1.62e-02, grad_scale: 32.0 2023-11-18 15:28:48,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=290586.6666666667, ans=0.125 2023-11-18 15:28:51,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290586.6666666667, ans=0.1 2023-11-18 15:28:53,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=290586.6666666667, ans=0.0 2023-11-18 15:29:08,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=290653.3333333333, ans=0.0 2023-11-18 15:29:22,436 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7550, loss[loss=0.1391, simple_loss=0.1622, pruned_loss=0.04827, audio_tagging_loss=0.00975, over 15570.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1198, pruned_loss=0.0355, audio_tagging_loss=0.01168, over 3047109.73 frames. 
], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:29:33,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=290853.3333333333, ans=0.125 2023-11-18 15:29:36,108 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.781e+01 9.490e+01 1.043e+02 1.208e+02 1.931e+02, threshold=2.087e+02, percent-clipped=0.0 2023-11-18 15:29:38,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=290853.3333333333, ans=0.125 2023-11-18 15:29:47,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:49,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:54,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=290920.0, ans=0.125 2023-11-18 15:29:55,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=290986.6666666667, ans=0.035 2023-11-18 15:30:00,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=290986.6666666667, ans=0.0 2023-11-18 15:30:04,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=290986.6666666667, ans=0.1 2023-11-18 15:30:10,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:15,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=291053.3333333333, ans=0.125 2023-11-18 15:30:15,806 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.16 vs. limit=15.0 2023-11-18 15:30:17,207 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7600, loss[loss=0.0981, simple_loss=0.1096, pruned_loss=0.03162, audio_tagging_loss=0.0117, over 16103.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1201, pruned_loss=0.03564, audio_tagging_loss=0.01162, over 3051201.61 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:30:27,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=291120.0, ans=0.0 2023-11-18 15:30:27,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291186.6666666667, ans=0.1 2023-11-18 15:30:28,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.99 vs. limit=6.0 2023-11-18 15:30:44,176 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.33 vs. limit=15.0 2023-11-18 15:31:02,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=291386.6666666667, ans=0.04949747468305833 2023-11-18 15:31:08,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.77 vs. 
limit=15.0 2023-11-18 15:31:11,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=291453.3333333333, ans=0.125 2023-11-18 15:31:11,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=291453.3333333333, ans=0.0 2023-11-18 15:31:13,066 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7650, loss[loss=0.1077, simple_loss=0.1356, pruned_loss=0.03239, audio_tagging_loss=0.007553, over 15807.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1198, pruned_loss=0.0353, audio_tagging_loss=0.01158, over 3048774.46 frames. ], batch size: 58, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:31:27,117 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.408e+01 1.037e+02 1.133e+02 1.442e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 15:31:40,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.31 vs. limit=12.0 2023-11-18 15:31:57,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=291720.0, ans=0.0 2023-11-18 15:32:05,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=291720.0, ans=0.125 2023-11-18 15:32:07,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=291786.6666666667, ans=0.0 2023-11-18 15:32:08,460 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7700, loss[loss=0.1456, simple_loss=0.1642, pruned_loss=0.05428, audio_tagging_loss=0.009258, over 15798.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.12, pruned_loss=0.03522, audio_tagging_loss=0.01167, over 3051994.95 frames. 
], batch size: 56, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:32:10,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=291786.6666666667, ans=0.1 2023-11-18 15:32:20,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=291853.3333333333, ans=0.125 2023-11-18 15:32:25,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=291853.3333333333, ans=0.0 2023-11-18 15:32:26,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=291853.3333333333, ans=0.125 2023-11-18 15:32:31,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=291920.0, ans=0.04949747468305833 2023-11-18 15:32:47,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=291986.6666666667, ans=0.1 2023-11-18 15:32:47,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=291986.6666666667, ans=0.125 2023-11-18 15:32:51,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=291986.6666666667, ans=0.125 2023-11-18 15:32:56,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=292053.3333333333, ans=0.0 2023-11-18 15:33:03,729 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7750, loss[loss=0.09541, simple_loss=0.1188, pruned_loss=0.02394, audio_tagging_loss=0.01207, over 16354.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.12, pruned_loss=0.03508, audio_tagging_loss=0.01174, over 3051681.03 frames. ], batch size: 59, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:33:11,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=292120.0, ans=0.1 2023-11-18 15:33:18,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.257e+01 9.507e+01 1.083e+02 1.273e+02 2.415e+02, threshold=2.165e+02, percent-clipped=1.0 2023-11-18 15:33:21,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=292186.6666666667, ans=0.035 2023-11-18 15:33:21,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=292186.6666666667, ans=0.2 2023-11-18 15:33:34,489 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:33:52,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292386.6666666667, ans=0.125 2023-11-18 15:33:59,637 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7800, loss[loss=0.08532, simple_loss=0.09969, pruned_loss=0.02423, audio_tagging_loss=0.01124, over 16000.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1197, pruned_loss=0.03491, audio_tagging_loss=0.01176, over 3055859.02 frames. 
], batch size: 59, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:34:14,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=292520.0, ans=0.125 2023-11-18 15:34:18,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=292520.0, ans=0.125 2023-11-18 15:34:27,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=292586.6666666667, ans=0.125 2023-11-18 15:34:29,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=292586.6666666667, ans=0.125 2023-11-18 15:34:49,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=292720.0, ans=0.2 2023-11-18 15:34:51,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-11-18 15:34:55,498 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7850, loss[loss=0.1218, simple_loss=0.133, pruned_loss=0.04494, audio_tagging_loss=0.01036, over 16098.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1205, pruned_loss=0.03518, audio_tagging_loss=0.01184, over 3059450.49 frames. ], batch size: 62, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:35:07,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=292853.3333333333, ans=0.125 2023-11-18 15:35:09,098 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.474e+01 9.851e+01 1.052e+02 1.175e+02 1.725e+02, threshold=2.105e+02, percent-clipped=0.0 2023-11-18 15:35:24,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=292920.0, ans=0.125 2023-11-18 15:35:29,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=292986.6666666667, ans=0.125 2023-11-18 15:35:33,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=292986.6666666667, ans=0.0 2023-11-18 15:35:35,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=292986.6666666667, ans=0.2 2023-11-18 15:35:50,172 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7900, loss[loss=0.1132, simple_loss=0.1374, pruned_loss=0.03545, audio_tagging_loss=0.009009, over 14388.00 frames. ], tot_loss[loss=0.1072, simple_loss=0.1204, pruned_loss=0.03502, audio_tagging_loss=0.01199, over 3056905.86 frames. ], batch size: 53, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:36:19,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=293253.3333333333, ans=0.0 2023-11-18 15:36:25,001 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-44000.pt 2023-11-18 15:36:45,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.03 vs. limit=15.0 2023-11-18 15:36:47,576 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 7950, loss[loss=0.1372, simple_loss=0.1527, pruned_loss=0.04973, audio_tagging_loss=0.01115, over 14856.00 frames. 
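Note: the "Saving checkpoint to .../checkpoint-44000.pt" entry above shows checkpoints named by the cumulative training-batch index rather than by epoch. The sketch below expresses that naming rule; the 4000-batch interval is an assumption inferred from the -44000 suffix, and the short exp_dir is a stand-in for the real experiment directory.

    from pathlib import Path

    def maybe_checkpoint_path(exp_dir: Path, batch_idx_train: int,
                              save_every_n: int = 4000) -> Path | None:
        # Save every `save_every_n` global batches; name by batch index.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            return exp_dir / f"checkpoint-{batch_idx_train}.pt"
        return None

    print(maybe_checkpoint_path(Path("multi_KD/exp"), 44000))
    # multi_KD/exp/checkpoint-44000.pt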
], tot_loss[loss=0.1061, simple_loss=0.1186, pruned_loss=0.0346, audio_tagging_loss=0.01217, over 3050325.81 frames. ], batch size: 54, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:36:50,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=293453.3333333333, ans=0.125 2023-11-18 15:36:56,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=293453.3333333333, ans=0.0 2023-11-18 15:37:02,817 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.429e+01 9.694e+01 1.093e+02 1.229e+02 1.791e+02, threshold=2.186e+02, percent-clipped=0.0 2023-11-18 15:37:02,879 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:37:25,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=293653.3333333333, ans=0.09899494936611666 2023-11-18 15:37:27,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=293653.3333333333, ans=0.125 2023-11-18 15:37:28,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=293653.3333333333, ans=15.0 2023-11-18 15:37:43,992 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8000, loss[loss=0.08963, simple_loss=0.102, pruned_loss=0.02628, audio_tagging_loss=0.01234, over 15209.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.119, pruned_loss=0.03475, audio_tagging_loss=0.01219, over 3050374.08 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 32.0 2023-11-18 15:37:51,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.11 vs. limit=22.5 2023-11-18 15:37:55,146 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.47 vs. limit=15.0 2023-11-18 15:38:02,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=293853.3333333333, ans=0.2 2023-11-18 15:38:17,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=293986.6666666667, ans=0.1 2023-11-18 15:38:25,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-18 15:38:38,564 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8050, loss[loss=0.1137, simple_loss=0.1258, pruned_loss=0.03915, audio_tagging_loss=0.0117, over 14209.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1198, pruned_loss=0.0353, audio_tagging_loss=0.01225, over 3040845.19 frames. ], batch size: 55, lr: 1.61e-02, grad_scale: 16.0 2023-11-18 15:38:40,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.38 vs. 
limit=15.0 2023-11-18 15:38:47,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=294120.0, ans=0.2 2023-11-18 15:38:53,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=294186.6666666667, ans=0.0 2023-11-18 15:38:53,847 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.548e+01 1.018e+02 1.096e+02 1.204e+02 1.820e+02, threshold=2.193e+02, percent-clipped=0.0 2023-11-18 15:39:03,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=294253.3333333333, ans=0.1 2023-11-18 15:39:06,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=294253.3333333333, ans=0.0 2023-11-18 15:39:06,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=294253.3333333333, ans=0.0 2023-11-18 15:39:16,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=294320.0, ans=0.0 2023-11-18 15:39:25,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=294386.6666666667, ans=0.125 2023-11-18 15:39:25,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=294386.6666666667, ans=0.0 2023-11-18 15:39:33,371 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8100, loss[loss=0.1353, simple_loss=0.1475, pruned_loss=0.04455, audio_tagging_loss=0.01705, over 16066.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.12, pruned_loss=0.03542, audio_tagging_loss=0.01212, over 3041763.87 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:39:34,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=294453.3333333333, ans=0.05 2023-11-18 15:39:43,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=294453.3333333333, ans=0.1 2023-11-18 15:39:52,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=294520.0, ans=0.0 2023-11-18 15:40:11,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-18 15:40:29,790 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8150, loss[loss=0.1372, simple_loss=0.1652, pruned_loss=0.04747, audio_tagging_loss=0.007075, over 16705.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1208, pruned_loss=0.03564, audio_tagging_loss=0.01183, over 3043251.55 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:40:35,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.85 vs. limit=15.0 2023-11-18 15:40:39,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.49 vs. 
limit=15.0 2023-11-18 15:40:44,587 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.218e+01 9.329e+01 1.045e+02 1.150e+02 1.655e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 15:40:49,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=294853.3333333333, ans=0.125 2023-11-18 15:40:54,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=294920.0, ans=0.0 2023-11-18 15:41:01,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.71 vs. limit=22.5 2023-11-18 15:41:24,203 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8200, loss[loss=0.09648, simple_loss=0.1084, pruned_loss=0.03265, audio_tagging_loss=0.009622, over 15867.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1204, pruned_loss=0.0355, audio_tagging_loss=0.01176, over 3036517.84 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:41:26,334 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 15:41:31,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295120.0, ans=0.1 2023-11-18 15:41:34,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2023-11-18 15:41:40,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=295186.6666666667, ans=0.125 2023-11-18 15:41:46,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=295253.3333333333, ans=0.0 2023-11-18 15:41:48,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.89 vs. limit=15.0 2023-11-18 15:41:56,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=295253.3333333333, ans=0.125 2023-11-18 15:41:59,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=295320.0, ans=10.0 2023-11-18 15:42:19,560 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8250, loss[loss=0.1206, simple_loss=0.1394, pruned_loss=0.04145, audio_tagging_loss=0.0094, over 15082.00 frames. ], tot_loss[loss=0.1079, simple_loss=0.1211, pruned_loss=0.03571, audio_tagging_loss=0.01161, over 3040038.69 frames. ], batch size: 55, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:42:34,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.584e+01 9.274e+01 1.030e+02 1.127e+02 2.119e+02, threshold=2.060e+02, percent-clipped=1.0 2023-11-18 15:42:40,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.35 vs. 
limit=22.5 2023-11-18 15:42:45,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=295586.6666666667, ans=0.0 2023-11-18 15:42:48,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=295586.6666666667, ans=0.0 2023-11-18 15:42:49,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295586.6666666667, ans=0.1 2023-11-18 15:43:15,126 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8300, loss[loss=0.1018, simple_loss=0.1198, pruned_loss=0.03215, audio_tagging_loss=0.00978, over 15702.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1213, pruned_loss=0.03591, audio_tagging_loss=0.01164, over 3047279.46 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:43:15,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=295786.6666666667, ans=0.0 2023-11-18 15:43:25,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=295853.3333333333, ans=0.0 2023-11-18 15:43:26,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295853.3333333333, ans=0.1 2023-11-18 15:43:31,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.96 vs. limit=15.0 2023-11-18 15:43:34,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=295853.3333333333, ans=0.125 2023-11-18 15:43:36,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=295920.0, ans=0.1 2023-11-18 15:43:47,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=295986.6666666667, ans=0.1 2023-11-18 15:43:50,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.28 vs. limit=15.0 2023-11-18 15:44:11,154 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8350, loss[loss=0.06702, simple_loss=0.07056, pruned_loss=0.0175, audio_tagging_loss=0.01424, over 15055.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1208, pruned_loss=0.03581, audio_tagging_loss=0.01164, over 3050480.13 frames. 
], batch size: 59, lr: 1.60e-02, grad_scale: 16.0 2023-11-18 15:44:12,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=296120.0, ans=0.125 2023-11-18 15:44:26,493 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.554e+01 1.077e+02 1.196e+02 1.483e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:44:43,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=296320.0, ans=0.0 2023-11-18 15:44:54,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=296386.6666666667, ans=0.125 2023-11-18 15:44:55,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=296386.6666666667, ans=0.2 2023-11-18 15:44:57,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=296386.6666666667, ans=0.125 2023-11-18 15:45:06,081 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8400, loss[loss=0.101, simple_loss=0.1077, pruned_loss=0.03507, audio_tagging_loss=0.0121, over 15255.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1201, pruned_loss=0.0356, audio_tagging_loss=0.01167, over 3045157.34 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:45:22,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-18 15:45:28,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=296586.6666666667, ans=0.125 2023-11-18 15:45:48,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=296653.3333333333, ans=0.0 2023-11-18 15:45:54,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=296720.0, ans=0.0 2023-11-18 15:46:02,605 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8450, loss[loss=0.1481, simple_loss=0.1703, pruned_loss=0.05642, audio_tagging_loss=0.006542, over 16869.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1208, pruned_loss=0.03609, audio_tagging_loss=0.0117, over 3043606.43 frames. ], batch size: 60, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:46:16,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.59 vs. limit=15.0 2023-11-18 15:46:17,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.027e+01 9.338e+01 1.042e+02 1.138e+02 1.608e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 15:46:24,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=296920.0, ans=0.125 2023-11-18 15:46:30,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=296920.0, ans=0.0 2023-11-18 15:46:36,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=296986.6666666667, ans=0.1 2023-11-18 15:46:57,412 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8500, loss[loss=0.09483, simple_loss=0.1031, pruned_loss=0.0295, audio_tagging_loss=0.01378, over 15342.00 frames. 
], tot_loss[loss=0.1082, simple_loss=0.1213, pruned_loss=0.03593, audio_tagging_loss=0.01167, over 3042480.97 frames. ], batch size: 57, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:47:01,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=297120.0, ans=0.125 2023-11-18 15:47:06,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=297120.0, ans=0.2 2023-11-18 15:47:16,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=297186.6666666667, ans=0.0 2023-11-18 15:47:32,990 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:47:38,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=297320.0, ans=0.07 2023-11-18 15:47:53,008 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8550, loss[loss=0.08526, simple_loss=0.1008, pruned_loss=0.0244, audio_tagging_loss=0.01044, over 14792.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1223, pruned_loss=0.03608, audio_tagging_loss=0.01169, over 3043239.91 frames. ], batch size: 56, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:48:02,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=297453.3333333333, ans=0.125 2023-11-18 15:48:09,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.282e+01 9.983e+01 1.095e+02 1.210e+02 1.627e+02, threshold=2.189e+02, percent-clipped=0.0 2023-11-18 15:48:14,940 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:48:17,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=297586.6666666667, ans=0.0 2023-11-18 15:48:29,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=297653.3333333333, ans=0.125 2023-11-18 15:48:40,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=297720.0, ans=0.125 2023-11-18 15:48:49,342 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8600, loss[loss=0.09058, simple_loss=0.09868, pruned_loss=0.02995, audio_tagging_loss=0.01129, over 15773.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1205, pruned_loss=0.0356, audio_tagging_loss=0.01182, over 3043563.04 frames. ], batch size: 58, lr: 1.60e-02, grad_scale: 32.0 2023-11-18 15:49:29,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=297986.6666666667, ans=0.2 2023-11-18 15:49:38,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=298053.3333333333, ans=0.1 2023-11-18 15:49:43,367 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8650, loss[loss=0.07372, simple_loss=0.07443, pruned_loss=0.02181, audio_tagging_loss=0.01469, over 15836.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1194, pruned_loss=0.03487, audio_tagging_loss=0.01197, over 3042601.19 frames. 
], batch size: 62, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:49:58,604 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.945e+01 9.623e+01 1.078e+02 1.210e+02 1.696e+02, threshold=2.155e+02, percent-clipped=0.0 2023-11-18 15:50:11,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=298253.3333333333, ans=0.0 2023-11-18 15:50:22,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=298320.0, ans=0.0 2023-11-18 15:50:35,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=298386.6666666667, ans=0.0 2023-11-18 15:50:38,378 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8700, loss[loss=0.1021, simple_loss=0.1122, pruned_loss=0.03325, audio_tagging_loss=0.01274, over 14395.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1205, pruned_loss=0.03539, audio_tagging_loss=0.01198, over 3042892.67 frames. ], batch size: 54, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:50:41,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=298453.3333333333, ans=0.125 2023-11-18 15:50:42,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=298453.3333333333, ans=0.0 2023-11-18 15:51:04,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=298586.6666666667, ans=0.125 2023-11-18 15:51:08,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=298586.6666666667, ans=0.07 2023-11-18 15:51:18,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=298653.3333333333, ans=0.2 2023-11-18 15:51:22,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=298720.0, ans=0.2 2023-11-18 15:51:33,494 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8750, loss[loss=0.1186, simple_loss=0.1374, pruned_loss=0.03971, audio_tagging_loss=0.01021, over 15634.00 frames. ], tot_loss[loss=0.1083, simple_loss=0.1214, pruned_loss=0.0356, audio_tagging_loss=0.01197, over 3043784.59 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:51:48,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.159e+01 9.840e+01 1.091e+02 1.232e+02 1.815e+02, threshold=2.181e+02, percent-clipped=0.0 2023-11-18 15:52:28,372 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8800, loss[loss=0.1455, simple_loss=0.1741, pruned_loss=0.04856, audio_tagging_loss=0.009909, over 15699.00 frames. ], tot_loss[loss=0.1091, simple_loss=0.1226, pruned_loss=0.03587, audio_tagging_loss=0.01191, over 3043527.95 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:52:33,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.83 vs. 
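
The optim.py:476 lines report five grad-norm statistics plus a clipping threshold, and the numbers fit the reading that the five values are quartiles (min, 25%, median, 75%, max) of recently observed gradient norms with threshold approximately Clipping_scale times the median (2.0 * 1.078e+02 is roughly the 2.155e+02 above), percent-clipped counting how often the norm exceeded it. A simplified stand-in, assuming that interpretation:

import torch

def clipping_stats(grad_norms: torch.Tensor, clipping_scale: float = 2.0):
    # grad_norms: 1-D float tensor of recent per-batch gradient norms.
    quartiles = torch.quantile(grad_norms, torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * quartiles[2]            # scale times the median
    percent_clipped = (grad_norms > threshold).float().mean() * 100.0
    return quartiles, threshold, percent_clipped
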
limit=15.0 2023-11-18 15:52:35,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=299120.0, ans=0.0 2023-11-18 15:52:37,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=299120.0, ans=0.0 2023-11-18 15:52:40,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=299186.6666666667, ans=0.125 2023-11-18 15:52:42,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=299186.6666666667, ans=0.1 2023-11-18 15:52:51,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=299253.3333333333, ans=0.0 2023-11-18 15:52:54,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=299253.3333333333, ans=0.09899494936611666 2023-11-18 15:52:55,233 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.95 vs. limit=15.0 2023-11-18 15:52:56,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=299253.3333333333, ans=0.125 2023-11-18 15:52:56,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299253.3333333333, ans=0.1 2023-11-18 15:53:01,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=299320.0, ans=0.07 2023-11-18 15:53:08,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=299320.0, ans=0.1 2023-11-18 15:53:22,534 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8850, loss[loss=0.09991, simple_loss=0.1176, pruned_loss=0.02979, audio_tagging_loss=0.01134, over 14587.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.121, pruned_loss=0.03521, audio_tagging_loss=0.01189, over 3044254.83 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:53:27,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=299453.3333333333, ans=0.0 2023-11-18 15:53:31,209 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:53:35,237 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
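
The scaling.py:213 entries each print the current value (ans) of a named ScheduledFloat hyperparameter (dropout p, skip rates, balancer probabilities, bypass scale_min, and so on) as a function of batch_count. The behavior matches a piecewise-linear schedule over batch count, held constant past the last breakpoint; a minimal sketch with illustrative breakpoints, since the recipe's actual schedules are not visible in this log:

def scheduled_float(batch_count, schedule):
    # schedule: list of (batch, value) breakpoints, sorted by batch.
    b0, v0 = schedule[0]
    if batch_count <= b0:
        return v0
    for b1, v1 in schedule[1:]:
        if batch_count <= b1:
            # Linear interpolation between neighbouring breakpoints.
            return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)
        b0, v0 = b1, v1
    return v0  # clamped after the last breakpoint

# e.g. a skip rate that decays 0.5 -> 0.05 over the first 20k batches, then stays flat:
print(scheduled_float(299253.3333333333, [(0.0, 0.5), (20000.0, 0.05)]))  # -> 0.05
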
Number of tokens: 24 2023-11-18 15:53:38,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.457e+01 9.407e+01 1.047e+02 1.181e+02 1.757e+02, threshold=2.094e+02, percent-clipped=0.0 2023-11-18 15:53:42,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=299520.0, ans=0.125 2023-11-18 15:54:17,928 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8900, loss[loss=0.09378, simple_loss=0.1066, pruned_loss=0.0267, audio_tagging_loss=0.01379, over 14403.00 frames. ], tot_loss[loss=0.1086, simple_loss=0.1226, pruned_loss=0.03561, audio_tagging_loss=0.01167, over 3043144.92 frames. ], batch size: 56, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:54:28,668 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.30 vs. limit=10.0 2023-11-18 15:54:33,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=299853.3333333333, ans=0.0 2023-11-18 15:54:56,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=299986.6666666667, ans=0.125 2023-11-18 15:55:04,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=300053.3333333333, ans=0.5 2023-11-18 15:55:11,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=300120.0, ans=0.0 2023-11-18 15:55:12,595 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 8950, loss[loss=0.09496, simple_loss=0.1079, pruned_loss=0.03029, audio_tagging_loss=0.01072, over 14285.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1218, pruned_loss=0.03544, audio_tagging_loss=0.01144, over 3040445.97 frames. ], batch size: 54, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 15:55:12,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=300120.0, ans=0.0 2023-11-18 15:55:27,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.233e+01 1.016e+02 1.150e+02 1.659e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 15:55:48,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=300320.0, ans=0.0 2023-11-18 15:55:54,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.18 vs. limit=12.0 2023-11-18 15:55:58,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=300386.6666666667, ans=0.125 2023-11-18 15:56:03,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=300386.6666666667, ans=0.2 2023-11-18 15:56:03,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=300386.6666666667, ans=0.125 2023-11-18 15:56:06,850 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9000, loss[loss=0.1308, simple_loss=0.1567, pruned_loss=0.04153, audio_tagging_loss=0.01096, over 15546.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1216, pruned_loss=0.03539, audio_tagging_loss=0.0113, over 3041346.24 frames. 
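
The Exclude-cut WARNING above drops 1-second AudioSet clips that carry the 24-token dummy transcript: after the encoder's roughly 4x subsampling, 100 input frames shrink to 23, fewer than the 24 tokens, so the transducer loss is undefined for that pair and the cut is removed from training. A sketch of such a filter using the common ((T - 7) // 2 + 1) // 2 subsampling formula, which reproduces the 100 -> 23 reduction in the warning (the script's exact formula may differ):

def encoder_frames(num_frames: int) -> int:
    # ~4x Conv2d-style subsampling; maps 100 input frames to 23 encoder frames.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least as many encoder frames as output tokens.
    return encoder_frames(num_frames) >= num_tokens

print(encoder_frames(100), keep_cut(100, 24))  # 23 False -> excluded, as in the warning
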
], batch size: 56, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:56:06,852 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 15:56:40,139 INFO [train_asr.py:1147] (0/4) Epoch 4, validation: loss=0.07668, simple_loss=0.06181, pruned_loss=0.009869, audio_tagging_loss=0.03591, over 4681554.00 frames. 2023-11-18 15:56:40,140 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 15:56:41,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=300453.3333333333, ans=0.125 2023-11-18 15:56:59,494 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2023-11-18 15:57:03,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=300586.6666666667, ans=0.125 2023-11-18 15:57:15,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=300653.3333333333, ans=0.125 2023-11-18 15:57:22,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:57:31,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=300720.0, ans=0.125 2023-11-18 15:57:34,507 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9050, loss[loss=0.1161, simple_loss=0.1418, pruned_loss=0.03493, audio_tagging_loss=0.0102, over 15945.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1221, pruned_loss=0.03572, audio_tagging_loss=0.01136, over 3037664.69 frames. ], batch size: 57, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:57:50,191 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.867e+01 9.255e+01 1.039e+02 1.147e+02 2.056e+02, threshold=2.078e+02, percent-clipped=1.0 2023-11-18 15:57:58,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=300920.0, ans=0.1 2023-11-18 15:58:04,532 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.40 vs. limit=15.0 2023-11-18 15:58:09,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=300986.6666666667, ans=0.1 2023-11-18 15:58:28,438 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9100, loss[loss=0.1097, simple_loss=0.1292, pruned_loss=0.03642, audio_tagging_loss=0.008693, over 15287.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1208, pruned_loss=0.03524, audio_tagging_loss=0.01127, over 3039719.80 frames. ], batch size: 61, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:58:36,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=301120.0, ans=0.0 2023-11-18 15:58:37,630 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 15:58:38,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.16 vs. 
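
Between batch 8950 and the validation pass at batch 9000, grad_scale drops from 32.0 back to 16.0; together with the earlier climb from 16.0 to 32.0, this is the signature of dynamic loss scaling in fp16 training, where the scale doubles after a run of overflow-free steps and halves when an inf/nan gradient is detected. A hedged sketch using PyTorch's stock GradScaler (the script may manage the scale itself):

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0, growth_interval=2000)

def fp16_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()   # backward on the scaled loss
    scaler.step(optimizer)          # unscales grads; skips the step on overflow
    scaler.update()                 # halves the scale on overflow, doubles it periodically
    return loss.detach(), scaler.get_scale()  # comparable to the logged grad_scale
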
limit=12.0 2023-11-18 15:58:44,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=301186.6666666667, ans=0.1 2023-11-18 15:58:47,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=301186.6666666667, ans=0.125 2023-11-18 15:58:50,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=301253.3333333333, ans=0.125 2023-11-18 15:58:51,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.37 vs. limit=15.0 2023-11-18 15:59:00,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=301253.3333333333, ans=0.125 2023-11-18 15:59:06,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=301320.0, ans=0.0 2023-11-18 15:59:08,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=301320.0, ans=0.1 2023-11-18 15:59:15,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=301386.6666666667, ans=0.0 2023-11-18 15:59:21,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.79 vs. limit=15.0 2023-11-18 15:59:22,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=15.02 vs. limit=15.0 2023-11-18 15:59:23,958 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9150, loss[loss=0.1138, simple_loss=0.1286, pruned_loss=0.03615, audio_tagging_loss=0.01337, over 15633.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1206, pruned_loss=0.03507, audio_tagging_loss=0.01134, over 3035872.02 frames. ], batch size: 60, lr: 1.59e-02, grad_scale: 16.0 2023-11-18 15:59:30,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.23 vs. limit=15.0 2023-11-18 15:59:32,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.42 vs. 
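
The scaling.py:1022 Whitening entries compare a per-module statistic of the activations against a limit; note metric=15.02 vs. limit=15.0 just above, where the constraint barely activates, and elsewhere in the log the limit itself is a scheduled value (whitening_limit, ans=22.5). A plausible reconstruction of such a metric is the ratio E[lambda^2] / (E[lambda])^2 over the eigenvalues of the feature covariance, which equals 1.0 for perfectly white (isotropic) features and grows as variance concentrates in a few directions. Treat the exact formula as an assumption:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels), assumed zero-mean for simplicity.
    cov = (x.T @ x) / x.shape[0]                    # feature covariance, (C, C)
    mean_eig = torch.diagonal(cov).mean()           # E[lambda]   = trace(cov) / C
    mean_eig_sq = (cov * cov).sum() / cov.shape[0]  # E[lambda^2] = trace(cov @ cov) / C
    return mean_eig_sq / mean_eig ** 2              # 1.0 when cov is a multiple of I

print(whitening_metric(torch.randn(10000, 384)))    # near-white data -> metric close to 1
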
limit=10.0 2023-11-18 15:59:39,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=301520.0, ans=0.125 2023-11-18 15:59:41,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.042e+01 1.024e+02 1.134e+02 1.471e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 15:59:50,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=301586.6666666667, ans=0.1 2023-11-18 15:59:55,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=301586.6666666667, ans=0.0 2023-11-18 15:59:58,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=301653.3333333333, ans=0.125 2023-11-18 16:00:16,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=301720.0, ans=22.5 2023-11-18 16:00:17,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=301720.0, ans=0.125 2023-11-18 16:00:20,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=301786.6666666667, ans=0.025 2023-11-18 16:00:21,046 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9200, loss[loss=0.0981, simple_loss=0.1147, pruned_loss=0.02744, audio_tagging_loss=0.01329, over 14289.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1198, pruned_loss=0.03474, audio_tagging_loss=0.01145, over 3035932.88 frames. ], batch size: 55, lr: 1.59e-02, grad_scale: 32.0 2023-11-18 16:00:23,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=301786.6666666667, ans=0.07 2023-11-18 16:00:40,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=301853.3333333333, ans=0.125 2023-11-18 16:00:45,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=301920.0, ans=0.125 2023-11-18 16:01:16,225 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9250, loss[loss=0.1145, simple_loss=0.1431, pruned_loss=0.03314, audio_tagging_loss=0.009825, over 15514.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1193, pruned_loss=0.03457, audio_tagging_loss=0.01147, over 3044743.42 frames. 
], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:01:18,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=302120.0, ans=0.2 2023-11-18 16:01:30,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=302186.6666666667, ans=0.125 2023-11-18 16:01:33,126 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.552e+01 9.477e+01 1.067e+02 1.208e+02 1.657e+02, threshold=2.134e+02, percent-clipped=0.0 2023-11-18 16:01:56,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=302320.0, ans=0.125 2023-11-18 16:02:00,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=302386.6666666667, ans=0.125 2023-11-18 16:02:02,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=302386.6666666667, ans=0.05 2023-11-18 16:02:11,900 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9300, loss[loss=0.08336, simple_loss=0.09278, pruned_loss=0.02689, audio_tagging_loss=0.01008, over 16192.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1194, pruned_loss=0.03479, audio_tagging_loss=0.01151, over 3049560.36 frames. ], batch size: 61, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:02:32,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=302520.0, ans=0.0 2023-11-18 16:02:32,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=302520.0, ans=0.125 2023-11-18 16:02:41,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=302586.6666666667, ans=0.125 2023-11-18 16:02:51,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=302653.3333333333, ans=0.1 2023-11-18 16:02:55,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-18 16:03:02,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=302720.0, ans=0.0 2023-11-18 16:03:09,204 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9350, loss[loss=0.09317, simple_loss=0.106, pruned_loss=0.02734, audio_tagging_loss=0.01284, over 14714.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1203, pruned_loss=0.03522, audio_tagging_loss=0.01155, over 3049418.07 frames. 
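
The learning rate in these entries creeps down smoothly (1.60e-02 to 1.59e-02 to 1.58e-02 over a few hundred batches of epoch 4), consistent with a power-law schedule in both batch and epoch such as icefall's Eden scheduler. A sketch of that documented form; the lr_batches and lr_epochs constants below are illustrative defaults, not read from this run:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # lr = base_lr * ((batch^2 + lr_batches^2) / lr_batches^2)^-0.25
    #               * ((epoch^2 + lr_epochs^2) / lr_epochs^2)^-0.25
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor
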
], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:03:12,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=302786.6666666667, ans=0.1 2023-11-18 16:03:18,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=302853.3333333333, ans=0.125 2023-11-18 16:03:24,974 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 9.020e+01 1.030e+02 1.167e+02 1.548e+02, threshold=2.059e+02, percent-clipped=0.0 2023-11-18 16:03:37,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=302920.0, ans=0.0 2023-11-18 16:03:44,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=302986.6666666667, ans=0.1 2023-11-18 16:03:51,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=302986.6666666667, ans=0.125 2023-11-18 16:03:56,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=303053.3333333333, ans=0.0 2023-11-18 16:03:57,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=303053.3333333333, ans=0.0 2023-11-18 16:04:03,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=303120.0, ans=0.2 2023-11-18 16:04:04,193 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9400, loss[loss=0.1146, simple_loss=0.1193, pruned_loss=0.04215, audio_tagging_loss=0.01274, over 13929.00 frames. ], tot_loss[loss=0.1064, simple_loss=0.1196, pruned_loss=0.03508, audio_tagging_loss=0.01159, over 3034415.64 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:05:00,034 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9450, loss[loss=0.1038, simple_loss=0.1156, pruned_loss=0.03596, audio_tagging_loss=0.01001, over 15342.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1196, pruned_loss=0.03521, audio_tagging_loss=0.01168, over 3034595.48 frames. ], batch size: 57, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:05:00,062 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
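
Batch sizes in these loss lines wander between roughly 53 and 63 cuts rather than staying fixed, which points to duration-based batching: the sampler packs cuts until a total-duration budget is reached, so the cut count per batch tracks utterance lengths. A sketch with lhotse's SimpleCutSampler; the manifest path and max_duration are illustrative:

from lhotse import CutSet
from lhotse.dataset.sampling import SimpleCutSampler

cuts = CutSet.from_file("data/fbank/cuts_train.jsonl.gz")   # hypothetical manifest path
sampler = SimpleCutSampler(cuts, max_duration=1000.0, shuffle=True)
for batch in sampler:
    print(len(batch))   # number of cuts varies per batch, like "batch size: 58" above
    break
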
Number of tokens: 24 2023-11-18 16:05:14,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=303520.0, ans=0.0 2023-11-18 16:05:17,121 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.847e+01 9.577e+01 1.061e+02 1.222e+02 1.461e+02, threshold=2.121e+02, percent-clipped=0.0 2023-11-18 16:05:28,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=303586.6666666667, ans=0.0 2023-11-18 16:05:33,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=303653.3333333333, ans=0.0 2023-11-18 16:05:34,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=303653.3333333333, ans=0.125 2023-11-18 16:05:37,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=303653.3333333333, ans=0.0 2023-11-18 16:05:39,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.74 vs. limit=22.5 2023-11-18 16:05:44,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn2.whiten.whitening_limit, batch_count=303720.0, ans=22.5 2023-11-18 16:05:56,432 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9500, loss[loss=0.09513, simple_loss=0.1147, pruned_loss=0.02744, audio_tagging_loss=0.01034, over 15206.00 frames. ], tot_loss[loss=0.1073, simple_loss=0.1204, pruned_loss=0.0353, audio_tagging_loss=0.01174, over 3038301.50 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:06:04,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=303786.6666666667, ans=0.0 2023-11-18 16:06:08,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=303853.3333333333, ans=0.125 2023-11-18 16:06:26,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=303920.0, ans=0.5 2023-11-18 16:06:52,134 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9550, loss[loss=0.05565, simple_loss=0.05407, pruned_loss=0.01418, audio_tagging_loss=0.01444, over 14251.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1202, pruned_loss=0.03504, audio_tagging_loss=0.0118, over 3034983.68 frames. ], batch size: 54, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:06:53,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=304120.0, ans=0.0 2023-11-18 16:07:08,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.631e+01 9.602e+01 1.044e+02 1.160e+02 1.697e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:07:48,098 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9600, loss[loss=0.1065, simple_loss=0.1266, pruned_loss=0.03366, audio_tagging_loss=0.009522, over 16646.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1209, pruned_loss=0.03517, audio_tagging_loss=0.01175, over 3043629.27 frames. 
], batch size: 63, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:07:50,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=304453.3333333333, ans=0.0 2023-11-18 16:08:07,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=304520.0, ans=0.0 2023-11-18 16:08:11,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.29 vs. limit=15.0 2023-11-18 16:08:31,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304720.0, ans=0.125 2023-11-18 16:08:44,188 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9650, loss[loss=0.1102, simple_loss=0.1308, pruned_loss=0.03447, audio_tagging_loss=0.0103, over 15796.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1206, pruned_loss=0.03514, audio_tagging_loss=0.01166, over 3042200.87 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:08:46,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=304786.6666666667, ans=0.125 2023-11-18 16:09:00,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.907e+01 9.312e+01 1.013e+02 1.091e+02 1.612e+02, threshold=2.027e+02, percent-clipped=0.0 2023-11-18 16:09:00,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=304853.3333333333, ans=0.125 2023-11-18 16:09:12,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=304920.0, ans=0.1 2023-11-18 16:09:18,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=304986.6666666667, ans=0.125 2023-11-18 16:09:20,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=304986.6666666667, ans=0.2 2023-11-18 16:09:28,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=305053.3333333333, ans=0.5 2023-11-18 16:09:32,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=305053.3333333333, ans=0.125 2023-11-18 16:09:32,417 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.81 vs. limit=15.0 2023-11-18 16:09:32,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.48 vs. limit=15.0 2023-11-18 16:09:39,310 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9700, loss[loss=0.1043, simple_loss=0.1256, pruned_loss=0.0331, audio_tagging_loss=0.008441, over 15151.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1213, pruned_loss=0.03552, audio_tagging_loss=0.0115, over 3039409.44 frames. ], batch size: 56, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:10:12,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.32 vs. 
limit=15.0 2023-11-18 16:10:19,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305320.0, ans=0.125 2023-11-18 16:10:20,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=305320.0, ans=0.1 2023-11-18 16:10:29,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=305386.6666666667, ans=0.125 2023-11-18 16:10:29,727 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.63 vs. limit=15.0 2023-11-18 16:10:35,412 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9750, loss[loss=0.1263, simple_loss=0.149, pruned_loss=0.0417, audio_tagging_loss=0.01017, over 16226.00 frames. ], tot_loss[loss=0.1066, simple_loss=0.1202, pruned_loss=0.035, audio_tagging_loss=0.01147, over 3042129.67 frames. ], batch size: 58, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:10:35,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=305453.3333333333, ans=0.125 2023-11-18 16:10:47,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=305520.0, ans=0.0 2023-11-18 16:10:53,089 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 9.337e+01 1.028e+02 1.130e+02 1.491e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 16:11:11,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=305653.3333333333, ans=0.125 2023-11-18 16:11:17,723 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.62 vs. limit=15.0 2023-11-18 16:11:23,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.46 vs. limit=15.0 2023-11-18 16:11:30,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=305720.0, ans=10.0 2023-11-18 16:11:32,500 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9800, loss[loss=0.1143, simple_loss=0.1134, pruned_loss=0.0428, audio_tagging_loss=0.01477, over 16017.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1213, pruned_loss=0.03548, audio_tagging_loss=0.01151, over 3046043.18 frames. ], batch size: 59, lr: 1.58e-02, grad_scale: 32.0 2023-11-18 16:11:58,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-18 16:12:04,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=305986.6666666667, ans=0.125 2023-11-18 16:12:23,797 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 16:12:28,126 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9850, loss[loss=0.08356, simple_loss=0.09123, pruned_loss=0.02997, audio_tagging_loss=0.007983, over 15168.00 frames. ], tot_loss[loss=0.1074, simple_loss=0.1212, pruned_loss=0.03548, audio_tagging_loss=0.01133, over 3043210.06 frames. ], batch size: 60, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:12:42,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=306186.6666666667, ans=0.05 2023-11-18 16:12:44,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=306186.6666666667, ans=0.125 2023-11-18 16:12:44,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=306186.6666666667, ans=0.05 2023-11-18 16:12:45,060 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.089e+01 9.433e+01 1.029e+02 1.148e+02 1.487e+02, threshold=2.058e+02, percent-clipped=0.0 2023-11-18 16:12:50,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.23 vs. limit=15.0 2023-11-18 16:12:52,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=306253.3333333333, ans=10.0 2023-11-18 16:12:56,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.97 vs. limit=10.0 2023-11-18 16:12:59,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=306253.3333333333, ans=0.125 2023-11-18 16:13:06,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=306320.0, ans=0.125 2023-11-18 16:13:19,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=6.12 vs. limit=6.0 2023-11-18 16:13:23,949 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9900, loss[loss=0.1109, simple_loss=0.1346, pruned_loss=0.03209, audio_tagging_loss=0.01149, over 14404.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1217, pruned_loss=0.03567, audio_tagging_loss=0.01132, over 3040329.60 frames. ], batch size: 53, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:13:24,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=306453.3333333333, ans=0.2 2023-11-18 16:13:39,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.11 vs. limit=15.0 2023-11-18 16:13:43,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=306520.0, ans=0.125 2023-11-18 16:13:47,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.48 vs. limit=10.0 2023-11-18 16:14:20,552 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 9950, loss[loss=0.1209, simple_loss=0.141, pruned_loss=0.04241, audio_tagging_loss=0.007961, over 15401.00 frames. 
], tot_loss[loss=0.1075, simple_loss=0.1213, pruned_loss=0.03554, audio_tagging_loss=0.01129, over 3042350.05 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:14:31,434 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.757e-01 2023-11-18 16:14:32,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=306853.3333333333, ans=0.5 2023-11-18 16:14:34,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=306853.3333333333, ans=0.125 2023-11-18 16:14:36,439 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.909e+01 9.576e+01 1.088e+02 1.219e+02 1.506e+02, threshold=2.175e+02, percent-clipped=0.0 2023-11-18 16:15:02,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=306986.6666666667, ans=0.125 2023-11-18 16:15:07,845 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. limit=15.0 2023-11-18 16:15:15,745 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10000, loss[loss=0.1268, simple_loss=0.1457, pruned_loss=0.04126, audio_tagging_loss=0.01266, over 14625.00 frames. ], tot_loss[loss=0.1075, simple_loss=0.1212, pruned_loss=0.03555, audio_tagging_loss=0.01135, over 3041949.30 frames. ], batch size: 53, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:15:21,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=307120.0, ans=0.125 2023-11-18 16:15:24,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=307120.0, ans=0.125 2023-11-18 16:16:11,316 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10050, loss[loss=0.0858, simple_loss=0.09924, pruned_loss=0.02383, audio_tagging_loss=0.01236, over 14507.00 frames. ], tot_loss[loss=0.107, simple_loss=0.1206, pruned_loss=0.03528, audio_tagging_loss=0.01141, over 3037297.59 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:16:23,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=307520.0, ans=0.125 2023-11-18 16:16:29,455 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.409e+01 9.427e+01 1.040e+02 1.141e+02 1.376e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 16:16:32,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-18 16:17:04,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=307720.0, ans=0.1 2023-11-18 16:17:04,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=307720.0, ans=0.0 2023-11-18 16:17:08,298 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10100, loss[loss=0.1115, simple_loss=0.1266, pruned_loss=0.03722, audio_tagging_loss=0.01096, over 15697.00 frames. ], tot_loss[loss=0.1069, simple_loss=0.1206, pruned_loss=0.03519, audio_tagging_loss=0.01144, over 3047585.70 frames. 
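
The scaling.py:1118 WithLoss entries attach an auxiliary loss to self-attention weights; it reads 0.000e+00 almost everywhere in this log and turns nonzero (loss-sum=2.757e-01 just above) only when the monitored constraint is violated. The exact penalty is not recoverable from the log; below is a purely illustrative stand-in that penalizes attention rows whose entropy collapses below a floor:

import torch

def attn_aux_loss(attn: torch.Tensor, min_entropy: float = 1.0) -> torch.Tensor:
    # attn: (..., num_queries, num_keys); rows are softmax distributions.
    entropy = -(attn * (attn + 1e-20).log()).sum(dim=-1)
    return torch.relu(min_entropy - entropy).sum()  # 0.0 while every row stays diffuse

attn = torch.softmax(torch.randn(2, 4, 16, 16), dim=-1)
print(attn_aux_loss(attn))  # typically 0.0 for random, diffuse attention
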
], batch size: 59, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:17:10,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=16.36 vs. limit=15.0 2023-11-18 16:17:12,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=307786.6666666667, ans=0.125 2023-11-18 16:17:19,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=307853.3333333333, ans=0.2 2023-11-18 16:17:30,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.42 vs. limit=15.0 2023-11-18 16:17:36,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=307920.0, ans=0.125 2023-11-18 16:17:53,790 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.90 vs. limit=10.0 2023-11-18 16:17:55,290 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:17:59,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_na.min_abs, batch_count=308053.3333333333, ans=0.02 2023-11-18 16:18:03,729 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10150, loss[loss=0.106, simple_loss=0.1195, pruned_loss=0.03361, audio_tagging_loss=0.01259, over 15512.00 frames. ], tot_loss[loss=0.1078, simple_loss=0.1215, pruned_loss=0.03548, audio_tagging_loss=0.01158, over 3047641.76 frames. ], batch size: 56, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:18:19,761 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.614e+01 1.045e+02 1.146e+02 1.690e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:18:30,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.87 vs. limit=15.0 2023-11-18 16:18:31,565 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:18:55,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=308386.6666666667, ans=0.1 2023-11-18 16:18:59,161 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10200, loss[loss=0.1232, simple_loss=0.1289, pruned_loss=0.04385, audio_tagging_loss=0.01495, over 16612.00 frames. ], tot_loss[loss=0.1088, simple_loss=0.1226, pruned_loss=0.03591, audio_tagging_loss=0.01158, over 3049214.70 frames. 
], batch size: 63, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:19:21,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=308586.6666666667, ans=0.2 2023-11-18 16:19:22,791 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:19:45,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=308720.0, ans=0.125 2023-11-18 16:19:55,125 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10250, loss[loss=0.1049, simple_loss=0.1095, pruned_loss=0.03511, audio_tagging_loss=0.01505, over 16054.00 frames. ], tot_loss[loss=0.1092, simple_loss=0.1234, pruned_loss=0.03599, audio_tagging_loss=0.01154, over 3058120.76 frames. ], batch size: 61, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:20:12,687 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 9.476e+01 1.039e+02 1.199e+02 1.617e+02, threshold=2.078e+02, percent-clipped=0.0 2023-11-18 16:20:21,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.18 vs. limit=15.0 2023-11-18 16:20:25,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=308920.0, ans=0.125 2023-11-18 16:20:30,880 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.91 vs. limit=15.0 2023-11-18 16:20:34,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=308986.6666666667, ans=0.125 2023-11-18 16:20:42,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.88 vs. limit=10.0 2023-11-18 16:20:45,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=309053.3333333333, ans=10.0 2023-11-18 16:20:50,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.30 vs. limit=15.0 2023-11-18 16:20:51,940 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10300, loss[loss=0.07345, simple_loss=0.08033, pruned_loss=0.02081, audio_tagging_loss=0.01247, over 15521.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1223, pruned_loss=0.03561, audio_tagging_loss=0.01162, over 3053717.02 frames. 
], batch size: 59, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:20:55,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=309120.0, ans=0.125 2023-11-18 16:20:57,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=309120.0, ans=0.0 2023-11-18 16:21:18,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=309253.3333333333, ans=0.125 2023-11-18 16:21:35,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=309320.0, ans=0.1 2023-11-18 16:21:38,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=309386.6666666667, ans=0.04949747468305833 2023-11-18 16:21:40,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=309386.6666666667, ans=0.2 2023-11-18 16:21:43,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=309386.6666666667, ans=0.125 2023-11-18 16:21:47,527 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10350, loss[loss=0.09273, simple_loss=0.1008, pruned_loss=0.02933, audio_tagging_loss=0.01299, over 15562.00 frames. ], tot_loss[loss=0.1089, simple_loss=0.1227, pruned_loss=0.03578, audio_tagging_loss=0.01177, over 3051545.41 frames. ], batch size: 59, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:21:52,212 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-11-18 16:22:04,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.642e+01 9.661e+01 1.063e+02 1.175e+02 1.992e+02, threshold=2.126e+02, percent-clipped=0.0 2023-11-18 16:22:09,574 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:22:34,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=309720.0, ans=0.125 2023-11-18 16:22:35,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=309720.0, ans=0.125 2023-11-18 16:22:38,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=309720.0, ans=0.2 2023-11-18 16:22:43,327 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10400, loss[loss=0.1252, simple_loss=0.1351, pruned_loss=0.04016, audio_tagging_loss=0.01749, over 14967.00 frames. ], tot_loss[loss=0.1087, simple_loss=0.1221, pruned_loss=0.03564, audio_tagging_loss=0.01199, over 3051138.87 frames. ], batch size: 55, lr: 1.57e-02, grad_scale: 32.0 2023-11-18 16:22:55,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=309853.3333333333, ans=0.1 2023-11-18 16:23:16,474 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.57 vs. 
limit=15.0 2023-11-18 16:23:19,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=309986.6666666667, ans=0.125 2023-11-18 16:23:34,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=310053.3333333333, ans=0.0 2023-11-18 16:23:34,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=310053.3333333333, ans=0.0 2023-11-18 16:23:39,768 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10450, loss[loss=0.1037, simple_loss=0.1137, pruned_loss=0.03526, audio_tagging_loss=0.01163, over 14549.00 frames. ], tot_loss[loss=0.1081, simple_loss=0.1214, pruned_loss=0.03553, audio_tagging_loss=0.01183, over 3054729.74 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:23:49,693 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.46 vs. limit=10.0 2023-11-18 16:23:56,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.612e+01 9.086e+01 9.811e+01 1.148e+02 1.710e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 16:23:58,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=310186.6666666667, ans=0.125 2023-11-18 16:24:04,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=310253.3333333333, ans=0.125 2023-11-18 16:24:22,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=310320.0, ans=0.2 2023-11-18 16:24:31,862 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.44 vs. limit=15.0 2023-11-18 16:24:32,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=310386.6666666667, ans=0.2 2023-11-18 16:24:35,561 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10500, loss[loss=0.06794, simple_loss=0.07479, pruned_loss=0.01974, audio_tagging_loss=0.01081, over 14948.00 frames. ], tot_loss[loss=0.1068, simple_loss=0.1202, pruned_loss=0.0351, audio_tagging_loss=0.01159, over 3053145.58 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:08,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=310653.3333333333, ans=0.1 2023-11-18 16:25:17,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=310653.3333333333, ans=0.125 2023-11-18 16:25:25,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=310720.0, ans=0.015 2023-11-18 16:25:26,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.53 vs. limit=15.0 2023-11-18 16:25:31,998 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10550, loss[loss=0.1351, simple_loss=0.1543, pruned_loss=0.04633, audio_tagging_loss=0.01159, over 15000.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1209, pruned_loss=0.03515, audio_tagging_loss=0.01148, over 3051678.29 frames. 
], batch size: 54, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:25:33,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.84 vs. limit=22.5 2023-11-18 16:25:35,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=310786.6666666667, ans=0.125 2023-11-18 16:25:35,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=310786.6666666667, ans=0.0 2023-11-18 16:25:40,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=310786.6666666667, ans=0.0 2023-11-18 16:25:49,163 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 9.120e+01 1.006e+02 1.112e+02 1.547e+02, threshold=2.011e+02, percent-clipped=0.0 2023-11-18 16:26:16,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=311053.3333333333, ans=0.0 2023-11-18 16:26:18,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.99 vs. limit=15.0 2023-11-18 16:26:28,617 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10600, loss[loss=0.1232, simple_loss=0.1459, pruned_loss=0.04032, audio_tagging_loss=0.009974, over 14604.00 frames. ], tot_loss[loss=0.1067, simple_loss=0.1207, pruned_loss=0.03498, audio_tagging_loss=0.01139, over 3047917.67 frames. ], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:26:42,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.38 vs. limit=15.0 2023-11-18 16:26:43,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=311186.6666666667, ans=0.035 2023-11-18 16:27:00,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=311320.0, ans=0.125 2023-11-18 16:27:00,854 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:27:03,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=311320.0, ans=0.1 2023-11-18 16:27:11,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.78 vs. limit=15.0 2023-11-18 16:27:24,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.33 vs. limit=22.5 2023-11-18 16:27:24,509 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10650, loss[loss=0.1342, simple_loss=0.1598, pruned_loss=0.0424, audio_tagging_loss=0.01193, over 14926.00 frames. ], tot_loss[loss=0.107, simple_loss=0.121, pruned_loss=0.03509, audio_tagging_loss=0.01143, over 3043306.62 frames. 
], batch size: 56, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:27:40,912 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.701e+01 9.767e+01 1.078e+02 1.173e+02 1.612e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 16:27:52,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=311586.6666666667, ans=0.2 2023-11-18 16:27:55,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.40 vs. limit=22.5 2023-11-18 16:27:55,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=22.5 2023-11-18 16:28:03,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=311653.3333333333, ans=0.0 2023-11-18 16:28:09,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=311720.0, ans=0.125 2023-11-18 16:28:11,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.85 vs. limit=15.0 2023-11-18 16:28:20,385 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10700, loss[loss=0.1471, simple_loss=0.1519, pruned_loss=0.05669, audio_tagging_loss=0.0145, over 14564.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1198, pruned_loss=0.03451, audio_tagging_loss=0.01129, over 3039991.24 frames. ], batch size: 54, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:28:30,258 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=311786.6666666667, ans=0.05 2023-11-18 16:28:32,486 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.75 vs. limit=22.5 2023-11-18 16:28:49,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=311920.0, ans=0.125 2023-11-18 16:28:52,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. limit=22.5 2023-11-18 16:28:52,360 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.10 vs. limit=10.0 2023-11-18 16:28:53,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=311986.6666666667, ans=0.1 2023-11-18 16:29:09,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=312053.3333333333, ans=0.07 2023-11-18 16:29:17,077 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10750, loss[loss=0.1136, simple_loss=0.1277, pruned_loss=0.03758, audio_tagging_loss=0.01218, over 16056.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1198, pruned_loss=0.03438, audio_tagging_loss=0.01128, over 3038775.86 frames. 
], batch size: 61, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:29:33,575 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.141e+01 9.911e+01 1.128e+02 1.714e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-18 16:29:34,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-11-18 16:29:40,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.29 vs. limit=22.5 2023-11-18 16:29:42,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=312253.3333333333, ans=0.07 2023-11-18 16:29:44,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.65 vs. limit=22.5 2023-11-18 16:30:10,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=312386.6666666667, ans=0.0 2023-11-18 16:30:10,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=312386.6666666667, ans=0.0 2023-11-18 16:30:12,488 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10800, loss[loss=0.08423, simple_loss=0.1051, pruned_loss=0.02228, audio_tagging_loss=0.009392, over 15142.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1202, pruned_loss=0.0348, audio_tagging_loss=0.01122, over 3039037.30 frames. ], batch size: 58, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:30:43,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=312586.6666666667, ans=0.125 2023-11-18 16:30:49,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=312653.3333333333, ans=0.125 2023-11-18 16:30:49,274 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.61 vs. limit=15.0 2023-11-18 16:30:55,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=312653.3333333333, ans=0.0 2023-11-18 16:31:05,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=312720.0, ans=0.1 2023-11-18 16:31:08,846 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10850, loss[loss=0.1417, simple_loss=0.1586, pruned_loss=0.04983, audio_tagging_loss=0.01254, over 15511.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1194, pruned_loss=0.0344, audio_tagging_loss=0.01141, over 3038855.75 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:31:25,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.405e+01 9.301e+01 1.024e+02 1.166e+02 1.801e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 16:31:48,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=312986.6666666667, ans=0.0 2023-11-18 16:31:58,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.18 vs. 
limit=15.0 2023-11-18 16:32:02,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=313053.3333333333, ans=0.1 2023-11-18 16:32:03,389 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:32:04,477 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10900, loss[loss=0.1097, simple_loss=0.1195, pruned_loss=0.03827, audio_tagging_loss=0.01168, over 13983.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1192, pruned_loss=0.03457, audio_tagging_loss=0.01147, over 3038978.37 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:32:13,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=313120.0, ans=0.04949747468305833 2023-11-18 16:32:14,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=313186.6666666667, ans=10.0 2023-11-18 16:32:19,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.08 vs. limit=22.5 2023-11-18 16:32:44,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=313320.0, ans=0.0 2023-11-18 16:32:49,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=313386.6666666667, ans=0.2 2023-11-18 16:32:49,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=313386.6666666667, ans=0.2 2023-11-18 16:32:49,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=313386.6666666667, ans=0.1 2023-11-18 16:32:51,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=313386.6666666667, ans=0.125 2023-11-18 16:32:56,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=313386.6666666667, ans=0.2 2023-11-18 16:32:59,376 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 10950, loss[loss=0.1048, simple_loss=0.1148, pruned_loss=0.03692, audio_tagging_loss=0.01045, over 14734.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1185, pruned_loss=0.0345, audio_tagging_loss=0.01154, over 3037638.60 frames. ], batch size: 55, lr: 1.56e-02, grad_scale: 32.0 2023-11-18 16:33:03,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=313453.3333333333, ans=0.1 2023-11-18 16:33:12,254 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.48 vs. 
limit=15.0 2023-11-18 16:33:16,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 9.324e+01 1.025e+02 1.137e+02 1.491e+02, threshold=2.050e+02, percent-clipped=0.0 2023-11-18 16:33:24,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=313586.6666666667, ans=0.125 2023-11-18 16:33:41,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313653.3333333333, ans=0.1 2023-11-18 16:33:54,815 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11000, loss[loss=0.08372, simple_loss=0.08958, pruned_loss=0.02748, audio_tagging_loss=0.01145, over 15354.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1175, pruned_loss=0.03386, audio_tagging_loss=0.01168, over 3036619.28 frames. ], batch size: 59, lr: 1.56e-02, grad_scale: 64.0 2023-11-18 16:34:05,955 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:34:14,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=313853.3333333333, ans=0.125 2023-11-18 16:34:15,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=313853.3333333333, ans=0.125 2023-11-18 16:34:16,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=313920.0, ans=0.1 2023-11-18 16:34:40,076 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.24 vs. limit=15.0 2023-11-18 16:34:41,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=314053.3333333333, ans=0.0 2023-11-18 16:34:50,188 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11050, loss[loss=0.1188, simple_loss=0.141, pruned_loss=0.03679, audio_tagging_loss=0.01149, over 15179.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1186, pruned_loss=0.03438, audio_tagging_loss=0.01171, over 3046272.52 frames. 
], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:04,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=314186.6666666667, ans=0.125 2023-11-18 16:35:06,684 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.121e+01 9.418e+01 1.036e+02 1.168e+02 1.751e+02, threshold=2.073e+02, percent-clipped=0.0 2023-11-18 16:35:31,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=314320.0, ans=0.1 2023-11-18 16:35:37,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=314386.6666666667, ans=0.0 2023-11-18 16:35:37,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314386.6666666667, ans=0.1 2023-11-18 16:35:42,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=314386.6666666667, ans=0.125 2023-11-18 16:35:42,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=314386.6666666667, ans=0.125 2023-11-18 16:35:45,679 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11100, loss[loss=0.1335, simple_loss=0.1488, pruned_loss=0.04786, audio_tagging_loss=0.01129, over 15210.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1189, pruned_loss=0.03449, audio_tagging_loss=0.01178, over 3049610.13 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:35:54,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.04 vs. limit=10.0 2023-11-18 16:35:56,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=314520.0, ans=0.125 2023-11-18 16:35:57,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=314520.0, ans=0.1 2023-11-18 16:35:59,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=314520.0, ans=0.125 2023-11-18 16:36:15,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.62 vs. limit=12.0 2023-11-18 16:36:38,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.61 vs. limit=6.0 2023-11-18 16:36:40,807 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11150, loss[loss=0.08889, simple_loss=0.1016, pruned_loss=0.02619, audio_tagging_loss=0.01188, over 15260.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1193, pruned_loss=0.0346, audio_tagging_loss=0.01188, over 3051211.85 frames. 
], batch size: 59, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:36:51,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=314853.3333333333, ans=0.0 2023-11-18 16:36:58,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.048e+01 9.570e+01 1.059e+02 1.181e+02 1.990e+02, threshold=2.118e+02, percent-clipped=0.0 2023-11-18 16:37:11,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=314920.0, ans=0.1 2023-11-18 16:37:18,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.44 vs. limit=6.0 2023-11-18 16:37:27,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.70 vs. limit=15.0 2023-11-18 16:37:34,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=315053.3333333333, ans=10.0 2023-11-18 16:37:37,282 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11200, loss[loss=0.1004, simple_loss=0.1248, pruned_loss=0.02908, audio_tagging_loss=0.008935, over 16546.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.12, pruned_loss=0.03466, audio_tagging_loss=0.01186, over 3049462.03 frames. ], batch size: 63, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:37:42,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=315120.0, ans=0.0 2023-11-18 16:38:07,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=315253.3333333333, ans=0.0 2023-11-18 16:38:13,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=315320.0, ans=0.125 2023-11-18 16:38:31,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=315453.3333333333, ans=0.125 2023-11-18 16:38:32,602 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11250, loss[loss=0.109, simple_loss=0.1265, pruned_loss=0.03652, audio_tagging_loss=0.009223, over 15846.00 frames. ], tot_loss[loss=0.1062, simple_loss=0.1201, pruned_loss=0.03442, audio_tagging_loss=0.01172, over 3049377.37 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:38:38,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=315453.3333333333, ans=0.125 2023-11-18 16:38:42,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=11.23 vs. 
limit=12.0 2023-11-18 16:38:48,487 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.341e+01 9.211e+01 1.045e+02 1.164e+02 1.761e+02, threshold=2.090e+02, percent-clipped=0.0 2023-11-18 16:38:48,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=315520.0, ans=0.125 2023-11-18 16:38:55,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=315586.6666666667, ans=0.2 2023-11-18 16:39:12,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=315653.3333333333, ans=0.0 2023-11-18 16:39:23,195 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:39:27,252 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11300, loss[loss=0.1005, simple_loss=0.1099, pruned_loss=0.03634, audio_tagging_loss=0.009162, over 14916.00 frames. ], tot_loss[loss=0.1063, simple_loss=0.1201, pruned_loss=0.03445, audio_tagging_loss=0.01175, over 3054315.10 frames. ], batch size: 57, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:39:36,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.73 vs. limit=15.0 2023-11-18 16:39:54,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=315920.0, ans=0.05 2023-11-18 16:40:06,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.95 vs. limit=15.0 2023-11-18 16:40:07,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=315986.6666666667, ans=0.125 2023-11-18 16:40:22,802 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11350, loss[loss=0.07357, simple_loss=0.07595, pruned_loss=0.02159, audio_tagging_loss=0.014, over 14738.00 frames. ], tot_loss[loss=0.106, simple_loss=0.12, pruned_loss=0.03441, audio_tagging_loss=0.01156, over 3045240.91 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:40:26,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=316120.0, ans=0.02 2023-11-18 16:40:37,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=316186.6666666667, ans=0.07 2023-11-18 16:40:39,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.711e+01 9.469e+01 1.052e+02 1.138e+02 1.718e+02, threshold=2.104e+02, percent-clipped=0.0 2023-11-18 16:40:44,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=316253.3333333333, ans=0.1 2023-11-18 16:40:59,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=316320.0, ans=0.125 2023-11-18 16:41:18,423 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11400, loss[loss=0.08645, simple_loss=0.1015, pruned_loss=0.0244, audio_tagging_loss=0.0113, over 14885.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1193, pruned_loss=0.0343, audio_tagging_loss=0.0114, over 3045953.31 frames. 
], batch size: 56, lr: 1.55e-02, grad_scale: 64.0 2023-11-18 16:41:40,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=316586.6666666667, ans=0.125 2023-11-18 16:41:50,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=316653.3333333333, ans=0.125 2023-11-18 16:41:51,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=316653.3333333333, ans=0.2 2023-11-18 16:41:57,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=316653.3333333333, ans=0.125 2023-11-18 16:42:07,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.88 vs. limit=10.0 2023-11-18 16:42:13,244 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11450, loss[loss=0.09695, simple_loss=0.1149, pruned_loss=0.026, audio_tagging_loss=0.01351, over 14089.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1195, pruned_loss=0.03432, audio_tagging_loss=0.01141, over 3049548.07 frames. ], batch size: 54, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:42:14,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=316786.6666666667, ans=10.0 2023-11-18 16:42:26,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.24 vs. limit=15.0 2023-11-18 16:42:30,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.176e+01 9.649e+01 1.077e+02 1.207e+02 1.681e+02, threshold=2.154e+02, percent-clipped=0.0 2023-11-18 16:42:35,442 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.65 vs. limit=15.0 2023-11-18 16:42:49,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.99 vs. limit=15.0 2023-11-18 16:43:08,887 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11500, loss[loss=0.1233, simple_loss=0.1419, pruned_loss=0.04356, audio_tagging_loss=0.008718, over 16383.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1202, pruned_loss=0.03456, audio_tagging_loss=0.0114, over 3048962.13 frames. ], batch size: 61, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:43:19,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. limit=12.0 2023-11-18 16:43:21,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-18 16:43:27,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=317186.6666666667, ans=0.04949747468305833 2023-11-18 16:44:05,436 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11550, loss[loss=0.09034, simple_loss=0.104, pruned_loss=0.02902, audio_tagging_loss=0.009317, over 14441.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1191, pruned_loss=0.03429, audio_tagging_loss=0.01146, over 3052962.64 frames. 
], batch size: 56, lr: 1.55e-02, grad_scale: 16.0 2023-11-18 16:44:20,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=317520.0, ans=0.07 2023-11-18 16:44:23,386 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.506e+01 9.289e+01 1.045e+02 1.175e+02 1.806e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 16:44:32,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=317586.6666666667, ans=0.125 2023-11-18 16:44:41,084 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 16:44:51,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=317720.0, ans=0.0 2023-11-18 16:44:54,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-18 16:45:00,898 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11600, loss[loss=0.09625, simple_loss=0.1135, pruned_loss=0.0302, audio_tagging_loss=0.009316, over 15439.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1197, pruned_loss=0.03463, audio_tagging_loss=0.0114, over 3058034.30 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:12,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317853.3333333333, ans=0.1 2023-11-18 16:45:22,602 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=16.54 vs. limit=22.5 2023-11-18 16:45:37,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=317986.6666666667, ans=0.125 2023-11-18 16:45:41,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=317986.6666666667, ans=0.1 2023-11-18 16:45:54,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=318053.3333333333, ans=0.125 2023-11-18 16:45:56,545 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11650, loss[loss=0.1184, simple_loss=0.1214, pruned_loss=0.04389, audio_tagging_loss=0.01382, over 14711.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1188, pruned_loss=0.03446, audio_tagging_loss=0.01142, over 3048850.69 frames. ], batch size: 56, lr: 1.55e-02, grad_scale: 32.0 2023-11-18 16:45:56,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=318120.0, ans=0.0 2023-11-18 16:46:06,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=318186.6666666667, ans=0.2 2023-11-18 16:46:08,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.05 vs. 
limit=15.0 2023-11-18 16:46:15,785 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.644e+01 9.455e+01 1.056e+02 1.163e+02 1.452e+02, threshold=2.111e+02, percent-clipped=0.0 2023-11-18 16:46:22,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=318253.3333333333, ans=0.0 2023-11-18 16:46:38,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318320.0, ans=0.125 2023-11-18 16:46:51,861 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11700, loss[loss=0.1312, simple_loss=0.1413, pruned_loss=0.04694, audio_tagging_loss=0.0136, over 15315.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1194, pruned_loss=0.03476, audio_tagging_loss=0.01154, over 3053389.47 frames. ], batch size: 56, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:46:52,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=318453.3333333333, ans=0.125 2023-11-18 16:47:00,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=318453.3333333333, ans=0.125 2023-11-18 16:47:08,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=318520.0, ans=0.0 2023-11-18 16:47:09,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.25 vs. limit=10.0 2023-11-18 16:47:11,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=318520.0, ans=0.125 2023-11-18 16:47:13,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=318586.6666666667, ans=0.2 2023-11-18 16:47:44,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=318720.0, ans=0.125 2023-11-18 16:47:47,746 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11750, loss[loss=0.07162, simple_loss=0.08466, pruned_loss=0.01762, audio_tagging_loss=0.01167, over 14357.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1196, pruned_loss=0.03479, audio_tagging_loss=0.01156, over 3049961.96 frames. ], batch size: 54, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:47:53,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=318786.6666666667, ans=0.125 2023-11-18 16:47:54,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=318786.6666666667, ans=0.1 2023-11-18 16:48:03,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=318853.3333333333, ans=0.1 2023-11-18 16:48:05,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=318853.3333333333, ans=0.0 2023-11-18 16:48:05,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.51 vs. 
limit=12.0 2023-11-18 16:48:06,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.155e+01 9.788e+01 1.106e+02 1.226e+02 1.834e+02, threshold=2.212e+02, percent-clipped=0.0 2023-11-18 16:48:30,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=318986.6666666667, ans=0.125 2023-11-18 16:48:31,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.86 vs. limit=15.0 2023-11-18 16:48:43,589 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11800, loss[loss=0.1005, simple_loss=0.1164, pruned_loss=0.02759, audio_tagging_loss=0.01469, over 14765.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1181, pruned_loss=0.03438, audio_tagging_loss=0.0118, over 3043401.39 frames. ], batch size: 54, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:48:55,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319186.6666666667, ans=0.125 2023-11-18 16:49:39,009 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.375e-01 2023-11-18 16:49:39,793 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11850, loss[loss=0.1147, simple_loss=0.1294, pruned_loss=0.03834, audio_tagging_loss=0.01169, over 15301.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1191, pruned_loss=0.03442, audio_tagging_loss=0.0118, over 3042573.56 frames. ], batch size: 58, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:49:41,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319453.3333333333, ans=0.1 2023-11-18 16:49:42,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=319453.3333333333, ans=0.0 2023-11-18 16:49:42,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=319453.3333333333, ans=0.1 2023-11-18 16:49:49,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=319520.0, ans=0.0 2023-11-18 16:49:54,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=319520.0, ans=0.125 2023-11-18 16:49:57,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=319520.0, ans=0.125 2023-11-18 16:49:58,407 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.740e+01 1.079e+02 1.230e+02 2.254e+02, threshold=2.157e+02, percent-clipped=1.0 2023-11-18 16:50:00,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=319586.6666666667, ans=0.125 2023-11-18 16:50:07,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2.whitening_limit, batch_count=319586.6666666667, ans=15.0 2023-11-18 16:50:23,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=319720.0, ans=0.125 2023-11-18 16:50:25,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=319720.0, ans=0.0 2023-11-18 16:50:27,694 INFO [scaling.py:213] (0/4) ScheduledFloat: 
name=encoder.encoders.1.encoder.layers.1.balancer_na.min_abs, batch_count=319720.0, ans=0.02 2023-11-18 16:50:30,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=319720.0, ans=0.0 2023-11-18 16:50:31,207 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.43 vs. limit=10.0 2023-11-18 16:50:35,001 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11900, loss[loss=0.08982, simple_loss=0.09913, pruned_loss=0.02955, audio_tagging_loss=0.0107, over 15339.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1184, pruned_loss=0.03403, audio_tagging_loss=0.01193, over 3046676.37 frames. ], batch size: 58, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:50:40,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=319786.6666666667, ans=0.1 2023-11-18 16:51:01,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=319920.0, ans=0.0 2023-11-18 16:51:09,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.46 vs. limit=15.0 2023-11-18 16:51:10,396 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-48000.pt 2023-11-18 16:51:16,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=319986.6666666667, ans=0.125 2023-11-18 16:51:17,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=319986.6666666667, ans=0.125 2023-11-18 16:51:21,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=320053.3333333333, ans=0.2 2023-11-18 16:51:32,916 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 11950, loss[loss=0.1101, simple_loss=0.1281, pruned_loss=0.03304, audio_tagging_loss=0.01295, over 14558.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1188, pruned_loss=0.03431, audio_tagging_loss=0.01194, over 3048708.05 frames. ], batch size: 55, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:51:52,478 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.985e+01 9.226e+01 1.013e+02 1.097e+02 1.681e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 16:52:27,183 INFO [train_asr.py:1115] (0/4) Epoch 4, batch 12000, loss[loss=0.1122, simple_loss=0.1293, pruned_loss=0.03976, audio_tagging_loss=0.007789, over 14714.00 frames. ], tot_loss[loss=0.1076, simple_loss=0.1209, pruned_loss=0.03523, audio_tagging_loss=0.01187, over 3054632.46 frames. ], batch size: 54, lr: 1.54e-02, grad_scale: 32.0 2023-11-18 16:52:27,185 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 16:52:47,833 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8778, 4.9263, 4.8183, 4.9382], device='cuda:0') 2023-11-18 16:52:48,358 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.1922, 1.9651, 5.2068, 2.2730], device='cuda:0') 2023-11-18 16:53:00,118 INFO [train_asr.py:1147] (0/4) Epoch 4, validation: loss=0.07553, simple_loss=0.06151, pruned_loss=0.009833, audio_tagging_loss=0.03495, over 4681554.00 frames. 
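The loss fields printed in these entries are related by a fixed weighting: across the batches logged above, the total satisfies loss ≈ 0.5 · simple_loss + pruned_loss + 1.0 · audio_tagging_loss (e.g. for Epoch 4, batch 12000: 0.5 · 0.1293 + 0.03976 + 0.007789 ≈ 0.1122). Below is a minimal Python sketch of that bookkeeping, not the training code itself; the 0.5 and 1.0 weights are inferred from the logged numbers rather than read out of train_asr.py.

    def combined_loss(simple_loss: float,
                      pruned_loss: float,
                      audio_tagging_loss: float,
                      simple_scale: float = 0.5,    # weight inferred from the log, an assumption
                      tagging_scale: float = 1.0    # weight inferred from the log, an assumption
                      ) -> float:
        # Weighted sum matching the loss decomposition printed in the entries above.
        return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

    # Check against the "Epoch 4, batch 12000" training entry above:
    # loss=0.1122, simple_loss=0.1293, pruned_loss=0.03976, audio_tagging_loss=0.007789
    assert abs(combined_loss(0.1293, 0.03976, 0.007789) - 0.1122) < 5e-4

The same relation holds for the validation entry just above (0.5 · 0.06151 + 0.009833 + 0.03495 ≈ 0.07553), so the reported loss columns can be sanity-checked directly from the log.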
2023-11-18 16:53:00,119 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 16:53:04,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=320453.3333333333, ans=0.125 2023-11-18 16:53:08,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=320453.3333333333, ans=0.0 2023-11-18 16:53:10,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=320520.0, ans=0.125 2023-11-18 16:53:27,626 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-4.pt 2023-11-18 16:54:03,927 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 0, loss[loss=0.1256, simple_loss=0.1325, pruned_loss=0.0337, audio_tagging_loss=0.02564, over 15597.00 frames. ], tot_loss[loss=0.1256, simple_loss=0.1325, pruned_loss=0.0337, audio_tagging_loss=0.02564, over 15597.00 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:54:03,929 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 16:54:35,516 INFO [train_asr.py:1147] (0/4) Epoch 5, validation: loss=0.07399, simple_loss=0.06162, pruned_loss=0.009934, audio_tagging_loss=0.03325, over 4681554.00 frames. 2023-11-18 16:54:35,516 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 16:54:39,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=320626.6666666667, ans=0.0 2023-11-18 16:54:43,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=320626.6666666667, ans=0.125 2023-11-18 16:54:59,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=320760.0, ans=0.125 2023-11-18 16:55:18,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=320826.6666666667, ans=0.0 2023-11-18 16:55:21,603 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.951e+01 9.535e+01 1.056e+02 1.198e+02 1.542e+02, threshold=2.112e+02, percent-clipped=0.0 2023-11-18 16:55:31,308 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 50, loss[loss=0.1117, simple_loss=0.1232, pruned_loss=0.02971, audio_tagging_loss=0.0204, over 15530.00 frames. ], tot_loss[loss=0.1192, simple_loss=0.1227, pruned_loss=0.0352, audio_tagging_loss=0.02269, over 686094.45 frames. ], batch size: 58, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:55:39,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=320960.0, ans=0.0 2023-11-18 16:55:55,363 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.99 vs. limit=12.0 2023-11-18 16:55:58,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321093.3333333333, ans=0.125 2023-11-18 16:56:13,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=321160.0, ans=0.125 2023-11-18 16:56:26,677 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 100, loss[loss=0.09748, simple_loss=0.09905, pruned_loss=0.02679, audio_tagging_loss=0.02116, over 14765.00 frames. 
], tot_loss[loss=0.1169, simple_loss=0.1213, pruned_loss=0.03459, audio_tagging_loss=0.02168, over 1204976.27 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:56:30,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=321293.3333333333, ans=0.0 2023-11-18 16:56:38,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=321360.0, ans=0.125 2023-11-18 16:56:49,773 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:56:51,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=321426.6666666667, ans=0.0 2023-11-18 16:56:52,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=321426.6666666667, ans=0.125 2023-11-18 16:57:12,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 9.566e+01 1.064e+02 1.154e+02 1.620e+02, threshold=2.127e+02, percent-clipped=0.0 2023-11-18 16:57:22,394 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 150, loss[loss=0.1403, simple_loss=0.1624, pruned_loss=0.0486, audio_tagging_loss=0.01052, over 15873.00 frames. ], tot_loss[loss=0.1125, simple_loss=0.1189, pruned_loss=0.03378, audio_tagging_loss=0.01931, over 1607645.32 frames. ], batch size: 59, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:57:23,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=321626.6666666667, ans=0.0 2023-11-18 16:57:40,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=321693.3333333333, ans=0.1 2023-11-18 16:57:43,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=321760.0, ans=0.5 2023-11-18 16:57:46,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=321760.0, ans=0.125 2023-11-18 16:57:48,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=321760.0, ans=0.0 2023-11-18 16:57:49,100 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 16:58:13,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=321893.3333333333, ans=0.125 2023-11-18 16:58:17,773 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 200, loss[loss=0.1055, simple_loss=0.1203, pruned_loss=0.03209, audio_tagging_loss=0.01322, over 14692.00 frames. ], tot_loss[loss=0.1104, simple_loss=0.1191, pruned_loss=0.03375, audio_tagging_loss=0.01708, over 1924704.83 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:58:18,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=321960.0, ans=0.125 2023-11-18 16:58:39,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.81 vs. 
limit=22.5 2023-11-18 16:59:03,728 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.692e+01 9.231e+01 1.044e+02 1.147e+02 1.591e+02, threshold=2.089e+02, percent-clipped=0.0 2023-11-18 16:59:05,317 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.36 vs. limit=22.5 2023-11-18 16:59:12,796 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.34 vs. limit=15.0 2023-11-18 16:59:14,447 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 250, loss[loss=0.09864, simple_loss=0.1095, pruned_loss=0.03047, audio_tagging_loss=0.01344, over 14773.00 frames. ], tot_loss[loss=0.1082, simple_loss=0.1183, pruned_loss=0.03353, audio_tagging_loss=0.01554, over 2175130.56 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 16:59:41,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=322426.6666666667, ans=0.125 2023-11-18 17:00:09,721 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 300, loss[loss=0.07896, simple_loss=0.08485, pruned_loss=0.02535, audio_tagging_loss=0.01119, over 12968.00 frames. ], tot_loss[loss=0.1077, simple_loss=0.1188, pruned_loss=0.03398, audio_tagging_loss=0.01431, over 2370855.59 frames. ], batch size: 50, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:00:49,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=322826.6666666667, ans=0.025 2023-11-18 17:00:56,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 9.144e+01 1.032e+02 1.177e+02 1.892e+02, threshold=2.064e+02, percent-clipped=0.0 2023-11-18 17:01:02,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=322893.3333333333, ans=0.1 2023-11-18 17:01:06,802 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 350, loss[loss=0.104, simple_loss=0.1222, pruned_loss=0.03286, audio_tagging_loss=0.01009, over 15199.00 frames. ], tot_loss[loss=0.1071, simple_loss=0.1192, pruned_loss=0.03405, audio_tagging_loss=0.01345, over 2516887.45 frames. ], batch size: 56, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:01:06,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=322960.0, ans=0.125 2023-11-18 17:01:09,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=322960.0, ans=0.0 2023-11-18 17:01:09,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=322960.0, ans=0.1 2023-11-18 17:01:17,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=323026.6666666667, ans=0.1 2023-11-18 17:01:17,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=323026.6666666667, ans=0.125 2023-11-18 17:01:21,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. 
limit=22.5 2023-11-18 17:01:27,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=323026.6666666667, ans=0.0 2023-11-18 17:02:03,611 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 400, loss[loss=0.1072, simple_loss=0.133, pruned_loss=0.03173, audio_tagging_loss=0.008932, over 15070.00 frames. ], tot_loss[loss=0.1065, simple_loss=0.1192, pruned_loss=0.03394, audio_tagging_loss=0.01297, over 2639208.18 frames. ], batch size: 53, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:02:10,510 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-11-18 17:02:35,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=323493.3333333333, ans=0.2 2023-11-18 17:02:41,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=323493.3333333333, ans=0.0 2023-11-18 17:02:44,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=323493.3333333333, ans=0.125 2023-11-18 17:02:49,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.956e+01 1.111e+02 1.271e+02 1.658e+02, threshold=2.223e+02, percent-clipped=0.0 2023-11-18 17:02:54,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=323560.0, ans=0.0 2023-11-18 17:02:58,840 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 450, loss[loss=0.09232, simple_loss=0.08967, pruned_loss=0.03484, audio_tagging_loss=0.01265, over 14374.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.119, pruned_loss=0.03376, audio_tagging_loss=0.01257, over 2728335.89 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:03:13,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=323693.3333333333, ans=0.125 2023-11-18 17:03:28,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=323760.0, ans=0.2 2023-11-18 17:03:32,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.87 vs. limit=15.0 2023-11-18 17:03:54,617 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 500, loss[loss=0.07861, simple_loss=0.08798, pruned_loss=0.02103, audio_tagging_loss=0.01359, over 15594.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1191, pruned_loss=0.03374, audio_tagging_loss=0.01231, over 2798126.10 frames. 
], batch size: 60, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:03:55,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=323960.0, ans=0.0 2023-11-18 17:04:13,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=324026.6666666667, ans=0.09899494936611666 2023-11-18 17:04:15,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=324026.6666666667, ans=0.0 2023-11-18 17:04:22,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=324093.3333333333, ans=0.2 2023-11-18 17:04:28,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=324160.0, ans=0.0 2023-11-18 17:04:40,508 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.977e+01 9.068e+01 9.771e+01 1.090e+02 1.763e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 17:04:49,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.42 vs. limit=15.0 2023-11-18 17:04:51,724 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 550, loss[loss=0.07305, simple_loss=0.07772, pruned_loss=0.02062, audio_tagging_loss=0.01357, over 14405.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1178, pruned_loss=0.03357, audio_tagging_loss=0.01227, over 2846732.61 frames. ], batch size: 55, lr: 1.43e-02, grad_scale: 32.0 2023-11-18 17:04:56,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=324293.3333333333, ans=0.125 2023-11-18 17:05:26,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=324493.3333333333, ans=0.05 2023-11-18 17:05:40,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=324560.0, ans=0.09899494936611666 2023-11-18 17:05:41,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=324560.0, ans=0.125 2023-11-18 17:05:46,824 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 600, loss[loss=0.1197, simple_loss=0.1312, pruned_loss=0.04056, audio_tagging_loss=0.01353, over 15355.00 frames. ], tot_loss[loss=0.1051, simple_loss=0.1185, pruned_loss=0.0337, audio_tagging_loss=0.01212, over 2898208.64 frames. 
], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:05:54,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=324626.6666666667, ans=0.125 2023-11-18 17:06:01,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=324693.3333333333, ans=0.2 2023-11-18 17:06:10,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=324760.0, ans=0.125 2023-11-18 17:06:13,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=324760.0, ans=0.0 2023-11-18 17:06:13,949 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.304e-02 2023-11-18 17:06:17,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=324760.0, ans=0.1 2023-11-18 17:06:22,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.77 vs. limit=15.0 2023-11-18 17:06:31,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.91 vs. limit=15.0 2023-11-18 17:06:32,198 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 9.095e+01 1.023e+02 1.155e+02 1.808e+02, threshold=2.046e+02, percent-clipped=0.0 2023-11-18 17:06:42,454 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 650, loss[loss=0.09739, simple_loss=0.11, pruned_loss=0.03126, audio_tagging_loss=0.0111, over 16058.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1188, pruned_loss=0.03386, audio_tagging_loss=0.01204, over 2931753.57 frames. ], batch size: 63, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:07:00,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=325026.6666666667, ans=0.0 2023-11-18 17:07:12,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=325093.3333333333, ans=0.0 2023-11-18 17:07:12,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=325093.3333333333, ans=0.125 2023-11-18 17:07:28,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=325226.6666666667, ans=0.1 2023-11-18 17:07:37,856 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 700, loss[loss=0.1448, simple_loss=0.1639, pruned_loss=0.05042, audio_tagging_loss=0.01247, over 15429.00 frames. ], tot_loss[loss=0.1054, simple_loss=0.1192, pruned_loss=0.03392, audio_tagging_loss=0.01192, over 2964947.46 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:07:52,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=325360.0, ans=0.125 2023-11-18 17:08:12,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=325493.3333333333, ans=0.125 2023-11-18 17:08:13,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.72 vs. 
limit=22.5 2023-11-18 17:08:24,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 9.264e+01 9.972e+01 1.138e+02 1.580e+02, threshold=1.994e+02, percent-clipped=0.0 2023-11-18 17:08:25,069 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.96 vs. limit=15.0 2023-11-18 17:08:28,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=325560.0, ans=0.1 2023-11-18 17:08:33,998 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 750, loss[loss=0.1554, simple_loss=0.1777, pruned_loss=0.05699, audio_tagging_loss=0.009579, over 14418.00 frames. ], tot_loss[loss=0.1059, simple_loss=0.1198, pruned_loss=0.03419, audio_tagging_loss=0.01183, over 2983732.43 frames. ], batch size: 53, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:08:39,683 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.90 vs. limit=6.0 2023-11-18 17:08:43,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=325693.3333333333, ans=0.125 2023-11-18 17:09:05,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=325760.0, ans=0.125 2023-11-18 17:09:13,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=325826.6666666667, ans=0.125 2023-11-18 17:09:29,748 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 800, loss[loss=0.1107, simple_loss=0.1275, pruned_loss=0.03709, audio_tagging_loss=0.009813, over 16557.00 frames. ], tot_loss[loss=0.1055, simple_loss=0.1194, pruned_loss=0.03389, audio_tagging_loss=0.01189, over 2996148.53 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:09:56,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=326093.3333333333, ans=0.1 2023-11-18 17:10:03,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=326160.0, ans=0.125 2023-11-18 17:10:15,469 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 9.696e+01 1.116e+02 1.261e+02 1.745e+02, threshold=2.231e+02, percent-clipped=0.0 2023-11-18 17:10:24,969 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 850, loss[loss=0.1066, simple_loss=0.1131, pruned_loss=0.03892, audio_tagging_loss=0.01116, over 14734.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1193, pruned_loss=0.034, audio_tagging_loss=0.01202, over 3004837.40 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:10:42,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=326360.0, ans=0.0 2023-11-18 17:11:14,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=326560.0, ans=0.2 2023-11-18 17:11:16,818 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.000e-01 2023-11-18 17:11:17,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. 
limit=6.0 2023-11-18 17:11:19,094 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=15.0 2023-11-18 17:11:21,921 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 900, loss[loss=0.1169, simple_loss=0.1433, pruned_loss=0.03692, audio_tagging_loss=0.00837, over 14689.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1181, pruned_loss=0.0336, audio_tagging_loss=0.01211, over 3011384.21 frames. ], batch size: 54, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:11:50,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=326760.0, ans=0.125 2023-11-18 17:11:59,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=326826.6666666667, ans=0.125 2023-11-18 17:12:04,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.84 vs. limit=22.5 2023-11-18 17:12:08,184 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 9.412e+01 1.033e+02 1.138e+02 1.840e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 17:12:17,653 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 950, loss[loss=0.1082, simple_loss=0.1251, pruned_loss=0.03299, audio_tagging_loss=0.01266, over 15961.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.119, pruned_loss=0.0339, audio_tagging_loss=0.01183, over 3018189.25 frames. ], batch size: 60, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:12:20,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=326960.0, ans=0.2 2023-11-18 17:12:25,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=326960.0, ans=0.0 2023-11-18 17:12:33,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=327026.6666666667, ans=0.1 2023-11-18 17:12:36,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=327026.6666666667, ans=0.05 2023-11-18 17:12:50,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=327160.0, ans=0.5 2023-11-18 17:12:52,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=327160.0, ans=0.05 2023-11-18 17:13:04,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=327226.6666666667, ans=0.1 2023-11-18 17:13:12,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=327293.3333333333, ans=0.015 2023-11-18 17:13:13,621 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1000, loss[loss=0.1088, simple_loss=0.1272, pruned_loss=0.03552, audio_tagging_loss=0.0097, over 15358.00 frames. ], tot_loss[loss=0.106, simple_loss=0.1205, pruned_loss=0.03435, audio_tagging_loss=0.01144, over 3020862.44 frames. 
], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:13:14,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=327293.3333333333, ans=0.2 2023-11-18 17:13:29,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2023-11-18 17:13:38,758 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:13:43,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=327426.6666666667, ans=0.0 2023-11-18 17:14:00,045 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.575e+01 9.245e+01 1.040e+02 1.129e+02 1.708e+02, threshold=2.081e+02, percent-clipped=0.0 2023-11-18 17:14:10,347 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1050, loss[loss=0.1069, simple_loss=0.1273, pruned_loss=0.03376, audio_tagging_loss=0.009471, over 15299.00 frames. ], tot_loss[loss=0.1061, simple_loss=0.1209, pruned_loss=0.03432, audio_tagging_loss=0.01132, over 3025851.80 frames. ], batch size: 57, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:14:28,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=327693.3333333333, ans=0.0 2023-11-18 17:14:31,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. limit=15.0 2023-11-18 17:14:40,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-18 17:14:59,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=327893.3333333333, ans=0.125 2023-11-18 17:15:03,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=327893.3333333333, ans=0.2 2023-11-18 17:15:06,213 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1100, loss[loss=0.08578, simple_loss=0.08041, pruned_loss=0.0301, audio_tagging_loss=0.01548, over 14022.00 frames. ], tot_loss[loss=0.1048, simple_loss=0.1195, pruned_loss=0.03371, audio_tagging_loss=0.01135, over 3032659.53 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:15:09,919 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 17:15:12,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=327960.0, ans=0.125 2023-11-18 17:15:26,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=328026.6666666667, ans=0.09899494936611666 2023-11-18 17:15:44,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=328160.0, ans=0.0 2023-11-18 17:15:47,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=328160.0, ans=0.5 2023-11-18 17:15:52,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.982e+01 9.877e+01 1.122e+02 1.591e+02, threshold=1.975e+02, percent-clipped=0.0 2023-11-18 17:15:56,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=328226.6666666667, ans=0.09899494936611666 2023-11-18 17:16:02,178 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1150, loss[loss=0.1078, simple_loss=0.1312, pruned_loss=0.03421, audio_tagging_loss=0.007964, over 14616.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.12, pruned_loss=0.03406, audio_tagging_loss=0.01127, over 3034902.91 frames. ], batch size: 56, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:16:06,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=328293.3333333333, ans=0.1 2023-11-18 17:16:14,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=328360.0, ans=0.05 2023-11-18 17:16:15,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=328360.0, ans=0.1 2023-11-18 17:16:24,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.06 vs. limit=22.5 2023-11-18 17:16:26,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.91 vs. limit=22.5 2023-11-18 17:16:31,828 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.624e-01 2023-11-18 17:16:46,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.84 vs. limit=15.0 2023-11-18 17:16:47,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=328560.0, ans=0.125 2023-11-18 17:16:50,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=328560.0, ans=0.125 2023-11-18 17:16:58,679 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1200, loss[loss=0.09306, simple_loss=0.1054, pruned_loss=0.02754, audio_tagging_loss=0.01283, over 16184.00 frames. ], tot_loss[loss=0.105, simple_loss=0.12, pruned_loss=0.03383, audio_tagging_loss=0.01117, over 3037897.91 frames. 
], batch size: 61, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:17:05,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=328626.6666666667, ans=0.95 2023-11-18 17:17:11,457 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.89 vs. limit=22.5 2023-11-18 17:17:16,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=328693.3333333333, ans=0.125 2023-11-18 17:17:26,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=328760.0, ans=0.125 2023-11-18 17:17:44,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.555e+01 9.512e+01 1.070e+02 1.237e+02 2.001e+02, threshold=2.140e+02, percent-clipped=1.0 2023-11-18 17:17:44,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=328893.3333333333, ans=0.09899494936611666 2023-11-18 17:17:54,559 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1250, loss[loss=0.09221, simple_loss=0.1073, pruned_loss=0.02629, audio_tagging_loss=0.01228, over 14868.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1184, pruned_loss=0.03339, audio_tagging_loss=0.01128, over 3034571.51 frames. ], batch size: 55, lr: 1.42e-02, grad_scale: 32.0 2023-11-18 17:18:11,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=329026.6666666667, ans=0.125 2023-11-18 17:18:29,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=329160.0, ans=0.125 2023-11-18 17:18:35,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=329160.0, ans=0.2 2023-11-18 17:18:50,346 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1300, loss[loss=0.1393, simple_loss=0.1514, pruned_loss=0.0503, audio_tagging_loss=0.01328, over 15184.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1189, pruned_loss=0.03349, audio_tagging_loss=0.01122, over 3032237.44 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:18:54,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=329293.3333333333, ans=0.0 2023-11-18 17:19:08,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=329360.0, ans=10.0 2023-11-18 17:19:10,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.92 vs. limit=12.0 2023-11-18 17:19:14,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=329426.6666666667, ans=0.0 2023-11-18 17:19:14,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.75 vs. 
limit=22.5 2023-11-18 17:19:27,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=329493.3333333333, ans=0.0 2023-11-18 17:19:33,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=329493.3333333333, ans=0.125 2023-11-18 17:19:35,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=329560.0, ans=0.0 2023-11-18 17:19:36,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.287e+01 1.028e+02 1.154e+02 1.858e+02, threshold=2.056e+02, percent-clipped=0.0 2023-11-18 17:19:46,960 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1350, loss[loss=0.08248, simple_loss=0.08791, pruned_loss=0.02522, audio_tagging_loss=0.01329, over 14615.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1186, pruned_loss=0.03333, audio_tagging_loss=0.01125, over 3038284.50 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:04,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=329693.3333333333, ans=0.1 2023-11-18 17:20:14,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 17:20:15,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.21 vs. limit=15.0 2023-11-18 17:20:20,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.17 vs. limit=6.0 2023-11-18 17:20:22,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=329826.6666666667, ans=0.125 2023-11-18 17:20:22,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=329826.6666666667, ans=0.1 2023-11-18 17:20:24,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=329826.6666666667, ans=0.125 2023-11-18 17:20:26,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=329826.6666666667, ans=0.0 2023-11-18 17:20:28,584 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:20:34,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=329893.3333333333, ans=0.0 2023-11-18 17:20:42,467 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1400, loss[loss=0.08112, simple_loss=0.0945, pruned_loss=0.02259, audio_tagging_loss=0.01128, over 17004.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1177, pruned_loss=0.03283, audio_tagging_loss=0.01138, over 3036791.49 frames. 
], batch size: 62, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:20:43,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=329960.0, ans=0.125 2023-11-18 17:21:10,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.17 vs. limit=6.0 2023-11-18 17:21:17,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=330160.0, ans=0.0 2023-11-18 17:21:24,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=330160.0, ans=0.0 2023-11-18 17:21:27,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.572e+01 9.034e+01 1.046e+02 1.162e+02 2.116e+02, threshold=2.092e+02, percent-clipped=1.0 2023-11-18 17:21:38,701 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1450, loss[loss=0.08943, simple_loss=0.09348, pruned_loss=0.02914, audio_tagging_loss=0.01355, over 15184.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1187, pruned_loss=0.0333, audio_tagging_loss=0.01156, over 3038695.24 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:21:50,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=330360.0, ans=0.125 2023-11-18 17:22:15,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.40 vs. limit=10.0 2023-11-18 17:22:17,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.56 vs. limit=6.0 2023-11-18 17:22:18,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=330493.3333333333, ans=0.125 2023-11-18 17:22:19,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330493.3333333333, ans=0.1 2023-11-18 17:22:34,987 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1500, loss[loss=0.1022, simple_loss=0.1136, pruned_loss=0.0318, audio_tagging_loss=0.01361, over 16647.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1187, pruned_loss=0.03334, audio_tagging_loss=0.01157, over 3040042.55 frames. 
], batch size: 65, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:22:38,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=330626.6666666667, ans=0.125 2023-11-18 17:22:47,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=330693.3333333333, ans=0.1 2023-11-18 17:23:02,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=330760.0, ans=0.1 2023-11-18 17:23:08,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=330826.6666666667, ans=0.125 2023-11-18 17:23:10,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=330826.6666666667, ans=0.125 2023-11-18 17:23:18,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.25 vs. limit=15.0 2023-11-18 17:23:20,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.810e+01 9.325e+01 1.048e+02 1.222e+02 1.829e+02, threshold=2.095e+02, percent-clipped=0.0 2023-11-18 17:23:25,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=330893.3333333333, ans=0.125 2023-11-18 17:23:30,605 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1550, loss[loss=0.129, simple_loss=0.1599, pruned_loss=0.04227, audio_tagging_loss=0.006839, over 15328.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1184, pruned_loss=0.03315, audio_tagging_loss=0.01156, over 3040632.40 frames. ], batch size: 53, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:23:51,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.52 vs. limit=6.0 2023-11-18 17:24:00,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=331093.3333333333, ans=0.125 2023-11-18 17:24:11,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=331160.0, ans=0.1 2023-11-18 17:24:26,285 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1600, loss[loss=0.08825, simple_loss=0.1007, pruned_loss=0.02638, audio_tagging_loss=0.01152, over 15164.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1178, pruned_loss=0.03312, audio_tagging_loss=0.01163, over 3043649.79 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:25:02,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=331493.3333333333, ans=0.0 2023-11-18 17:25:12,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.226e+01 9.229e+01 1.020e+02 1.137e+02 1.712e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 17:25:19,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-18 17:25:21,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.79 vs. 
limit=22.5 2023-11-18 17:25:23,400 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1650, loss[loss=0.09574, simple_loss=0.1212, pruned_loss=0.02575, audio_tagging_loss=0.009386, over 15186.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1189, pruned_loss=0.03358, audio_tagging_loss=0.01171, over 3052576.03 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:25:24,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.33 vs. limit=15.0 2023-11-18 17:25:34,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=331693.3333333333, ans=0.5 2023-11-18 17:25:37,383 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=331693.3333333333, ans=0.2 2023-11-18 17:25:42,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=331693.3333333333, ans=0.125 2023-11-18 17:25:48,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=331760.0, ans=0.125 2023-11-18 17:26:10,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=331893.3333333333, ans=0.125 2023-11-18 17:26:15,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=331893.3333333333, ans=0.1 2023-11-18 17:26:18,879 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1700, loss[loss=0.1047, simple_loss=0.1268, pruned_loss=0.02943, audio_tagging_loss=0.01187, over 15743.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1176, pruned_loss=0.03311, audio_tagging_loss=0.01181, over 3051782.51 frames. ], batch size: 57, lr: 1.41e-02, grad_scale: 64.0 2023-11-18 17:26:23,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=331960.0, ans=0.07 2023-11-18 17:26:59,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=332160.0, ans=0.0 2023-11-18 17:27:00,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=332160.0, ans=0.0 2023-11-18 17:27:06,290 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.601e+01 9.403e+01 1.012e+02 1.120e+02 1.645e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 17:27:08,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=332226.6666666667, ans=0.0 2023-11-18 17:27:10,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=332226.6666666667, ans=0.2 2023-11-18 17:27:13,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332226.6666666667, ans=0.1 2023-11-18 17:27:15,292 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1750, loss[loss=0.09706, simple_loss=0.1193, pruned_loss=0.02825, audio_tagging_loss=0.009169, over 14477.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.117, pruned_loss=0.03281, audio_tagging_loss=0.01158, over 3045675.34 frames. 
], batch size: 55, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:27:34,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=332360.0, ans=0.125 2023-11-18 17:27:57,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-18 17:28:05,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=332560.0, ans=0.125 2023-11-18 17:28:11,841 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1800, loss[loss=0.1143, simple_loss=0.1479, pruned_loss=0.03337, audio_tagging_loss=0.006948, over 15475.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1172, pruned_loss=0.03274, audio_tagging_loss=0.01149, over 3047081.67 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:28:17,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5 2023-11-18 17:28:27,708 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=12.23 vs. limit=22.5 2023-11-18 17:28:29,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=332693.3333333333, ans=0.1 2023-11-18 17:28:30,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=332693.3333333333, ans=0.0 2023-11-18 17:28:37,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=332760.0, ans=0.0 2023-11-18 17:28:37,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=332760.0, ans=0.2 2023-11-18 17:28:57,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.54 vs. limit=22.5 2023-11-18 17:28:59,147 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.466e+01 9.003e+01 9.997e+01 1.070e+02 1.437e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 17:29:03,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=332893.3333333333, ans=0.125 2023-11-18 17:29:07,653 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1850, loss[loss=0.1333, simple_loss=0.1586, pruned_loss=0.04315, audio_tagging_loss=0.01085, over 14842.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1179, pruned_loss=0.03283, audio_tagging_loss=0.01146, over 3043371.36 frames. ], batch size: 58, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:29:13,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=332960.0, ans=0.125 2023-11-18 17:29:21,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=333026.6666666667, ans=0.125 2023-11-18 17:29:49,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.22 vs. 
limit=22.5 2023-11-18 17:29:52,335 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.36 vs. limit=22.5 2023-11-18 17:29:59,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=333226.6666666667, ans=0.125 2023-11-18 17:30:03,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=333293.3333333333, ans=0.0 2023-11-18 17:30:04,071 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1900, loss[loss=0.1016, simple_loss=0.1199, pruned_loss=0.03157, audio_tagging_loss=0.01005, over 14755.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1184, pruned_loss=0.03306, audio_tagging_loss=0.01137, over 3044688.70 frames. ], batch size: 55, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:30:10,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0 2023-11-18 17:30:13,472 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.62 vs. limit=15.0 2023-11-18 17:30:19,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333360.0, ans=0.1 2023-11-18 17:30:24,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333360.0, ans=0.125 2023-11-18 17:30:33,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=333426.6666666667, ans=0.0 2023-11-18 17:30:35,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=333426.6666666667, ans=0.05 2023-11-18 17:30:37,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=333493.3333333333, ans=0.04949747468305833 2023-11-18 17:30:37,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-18 17:30:40,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=333493.3333333333, ans=0.1 2023-11-18 17:30:50,669 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.811e+01 9.778e+01 1.068e+02 1.411e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 17:30:59,233 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 1950, loss[loss=0.07369, simple_loss=0.07475, pruned_loss=0.02203, audio_tagging_loss=0.01428, over 14556.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1168, pruned_loss=0.0326, audio_tagging_loss=0.01149, over 3030775.56 frames. ], batch size: 56, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:31:04,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.78 vs. 
limit=10.0 2023-11-18 17:31:15,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=333693.3333333333, ans=0.125 2023-11-18 17:31:22,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=333760.0, ans=0.0 2023-11-18 17:31:35,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=333826.6666666667, ans=0.125 2023-11-18 17:31:36,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=333826.6666666667, ans=0.2 2023-11-18 17:31:40,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=333826.6666666667, ans=0.2 2023-11-18 17:31:50,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=333893.3333333333, ans=0.1 2023-11-18 17:31:56,038 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2000, loss[loss=0.0858, simple_loss=0.0996, pruned_loss=0.02656, audio_tagging_loss=0.009443, over 15544.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1157, pruned_loss=0.03249, audio_tagging_loss=0.01152, over 3031148.21 frames. ], batch size: 61, lr: 1.41e-02, grad_scale: 32.0 2023-11-18 17:31:59,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=333960.0, ans=0.2 2023-11-18 17:32:03,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=333960.0, ans=0.95 2023-11-18 17:32:31,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-18 17:32:32,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=334160.0, ans=0.2 2023-11-18 17:32:42,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 9.028e+01 9.681e+01 1.107e+02 1.913e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-18 17:32:47,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=334226.6666666667, ans=0.125 2023-11-18 17:32:51,958 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2050, loss[loss=0.09747, simple_loss=0.1074, pruned_loss=0.02829, audio_tagging_loss=0.01549, over 15612.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1164, pruned_loss=0.03273, audio_tagging_loss=0.01151, over 3037301.85 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:32:54,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=334293.3333333333, ans=10.0 2023-11-18 17:33:24,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.05 vs. 
limit=15.0 2023-11-18 17:33:28,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=334493.3333333333, ans=0.125 2023-11-18 17:33:29,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=334493.3333333333, ans=0.95 2023-11-18 17:33:37,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.74 vs. limit=6.0 2023-11-18 17:33:47,587 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2100, loss[loss=0.1277, simple_loss=0.1409, pruned_loss=0.0458, audio_tagging_loss=0.01145, over 14924.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1174, pruned_loss=0.03303, audio_tagging_loss=0.0114, over 3039763.84 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:33:47,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=334626.6666666667, ans=0.125 2023-11-18 17:33:54,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=334626.6666666667, ans=0.0 2023-11-18 17:34:19,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=334760.0, ans=0.2 2023-11-18 17:34:22,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=334826.6666666667, ans=0.07 2023-11-18 17:34:23,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=334826.6666666667, ans=0.0 2023-11-18 17:34:26,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=334826.6666666667, ans=0.04949747468305833 2023-11-18 17:34:32,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:34,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:34,905 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.950e+01 9.682e+01 1.081e+02 1.226e+02 1.656e+02, threshold=2.162e+02, percent-clipped=0.0 2023-11-18 17:34:36,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:37,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=334893.3333333333, ans=0.125 2023-11-18 17:34:44,639 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2150, loss[loss=0.1047, simple_loss=0.113, pruned_loss=0.03463, audio_tagging_loss=0.01355, over 15147.00 frames. ], tot_loss[loss=0.1036, simple_loss=0.1179, pruned_loss=0.0332, audio_tagging_loss=0.01144, over 3045266.99 frames. 
], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:34:45,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=334960.0, ans=0.0 2023-11-18 17:34:51,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=334960.0, ans=0.2 2023-11-18 17:34:56,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.73 vs. limit=22.5 2023-11-18 17:34:58,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=335026.6666666667, ans=0.125 2023-11-18 17:34:58,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0 2023-11-18 17:35:07,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=335093.3333333333, ans=0.125 2023-11-18 17:35:07,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=335093.3333333333, ans=0.2 2023-11-18 17:35:08,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=335093.3333333333, ans=10.0 2023-11-18 17:35:17,636 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:35:19,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=335160.0, ans=0.05 2023-11-18 17:35:21,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=335160.0, ans=0.1 2023-11-18 17:35:35,886 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 17:35:39,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=335293.3333333333, ans=0.125 2023-11-18 17:35:39,926 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2200, loss[loss=0.09235, simple_loss=0.1032, pruned_loss=0.02921, audio_tagging_loss=0.01152, over 14926.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1181, pruned_loss=0.03327, audio_tagging_loss=0.0114, over 3038012.81 frames. 
], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:35:52,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=335360.0, ans=0.0 2023-11-18 17:36:16,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=335493.3333333333, ans=0.125 2023-11-18 17:36:26,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=335560.0, ans=0.125 2023-11-18 17:36:27,475 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.184e+01 9.433e+01 1.069e+02 1.154e+02 1.802e+02, threshold=2.138e+02, percent-clipped=0.0 2023-11-18 17:36:36,112 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2250, loss[loss=0.1536, simple_loss=0.1678, pruned_loss=0.06097, audio_tagging_loss=0.008798, over 15868.00 frames. ], tot_loss[loss=0.1053, simple_loss=0.1199, pruned_loss=0.03392, audio_tagging_loss=0.0114, over 3045058.82 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:36:42,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=335626.6666666667, ans=0.125 2023-11-18 17:36:42,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.50 vs. limit=22.5 2023-11-18 17:37:08,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=335760.0, ans=0.125 2023-11-18 17:37:15,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=335826.6666666667, ans=0.0 2023-11-18 17:37:15,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=335826.6666666667, ans=0.0 2023-11-18 17:37:22,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=335893.3333333333, ans=0.0 2023-11-18 17:37:24,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0 2023-11-18 17:37:31,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=335893.3333333333, ans=0.125 2023-11-18 17:37:33,143 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2300, loss[loss=0.1069, simple_loss=0.1252, pruned_loss=0.03354, audio_tagging_loss=0.01072, over 14378.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1188, pruned_loss=0.03339, audio_tagging_loss=0.01158, over 3040390.27 frames. ], batch size: 53, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:38:01,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=336093.3333333333, ans=0.125 2023-11-18 17:38:01,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.34 vs. 
limit=15.0 2023-11-18 17:38:20,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.556e+01 9.497e+01 1.027e+02 1.187e+02 1.652e+02, threshold=2.054e+02, percent-clipped=0.0 2023-11-18 17:38:22,767 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:38:25,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-11-18 17:38:29,675 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2350, loss[loss=0.1123, simple_loss=0.1318, pruned_loss=0.03724, audio_tagging_loss=0.009104, over 15979.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1185, pruned_loss=0.03331, audio_tagging_loss=0.01162, over 3039521.31 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:38:36,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=336293.3333333333, ans=0.125 2023-11-18 17:38:41,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=336360.0, ans=0.0 2023-11-18 17:38:50,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=336426.6666666667, ans=0.2 2023-11-18 17:38:56,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=336426.6666666667, ans=0.2 2023-11-18 17:39:02,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.94 vs. limit=10.0 2023-11-18 17:39:10,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=336493.3333333333, ans=0.125 2023-11-18 17:39:23,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=336560.0, ans=0.0 2023-11-18 17:39:25,496 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2400, loss[loss=0.09813, simple_loss=0.1138, pruned_loss=0.02734, audio_tagging_loss=0.01388, over 13495.00 frames. ], tot_loss[loss=0.1041, simple_loss=0.1181, pruned_loss=0.03335, audio_tagging_loss=0.01167, over 3039145.02 frames. ], batch size: 52, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:39:25,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=336626.6666666667, ans=0.125 2023-11-18 17:39:25,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=336626.6666666667, ans=0.0 2023-11-18 17:39:30,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=336626.6666666667, ans=0.0 2023-11-18 17:39:33,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.91 vs. 
limit=15.0 2023-11-18 17:39:49,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=336760.0, ans=0.0 2023-11-18 17:40:13,220 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.544e+01 9.087e+01 9.616e+01 1.102e+02 1.303e+02, threshold=1.923e+02, percent-clipped=0.0 2023-11-18 17:40:21,846 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2450, loss[loss=0.1028, simple_loss=0.1154, pruned_loss=0.03256, audio_tagging_loss=0.01248, over 13866.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1178, pruned_loss=0.03332, audio_tagging_loss=0.01168, over 3040002.48 frames. ], batch size: 54, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:40:32,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.80 vs. limit=22.5 2023-11-18 17:40:57,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=337160.0, ans=0.04949747468305833 2023-11-18 17:41:12,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=337226.6666666667, ans=0.1 2023-11-18 17:41:17,252 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2500, loss[loss=0.1161, simple_loss=0.1222, pruned_loss=0.04158, audio_tagging_loss=0.01337, over 15510.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1186, pruned_loss=0.0335, audio_tagging_loss=0.01163, over 3045634.06 frames. ], batch size: 57, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:41:26,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=337293.3333333333, ans=0.0 2023-11-18 17:41:33,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.99 vs. limit=15.0 2023-11-18 17:41:36,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=337360.0, ans=0.2 2023-11-18 17:41:44,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.94 vs. limit=15.0 2023-11-18 17:41:46,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=337426.6666666667, ans=0.125 2023-11-18 17:42:05,674 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 9.278e+01 1.050e+02 1.171e+02 1.497e+02, threshold=2.099e+02, percent-clipped=0.0 2023-11-18 17:42:13,641 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2550, loss[loss=0.1006, simple_loss=0.1074, pruned_loss=0.0359, audio_tagging_loss=0.01103, over 15326.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1186, pruned_loss=0.03322, audio_tagging_loss=0.01152, over 3044165.95 frames. 
], batch size: 58, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:42:13,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=337626.6666666667, ans=0.0 2023-11-18 17:42:22,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=337626.6666666667, ans=0.125 2023-11-18 17:42:27,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=337693.3333333333, ans=0.035 2023-11-18 17:42:44,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.07 vs. limit=22.5 2023-11-18 17:42:58,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=337893.3333333333, ans=0.125 2023-11-18 17:43:01,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2023-11-18 17:43:10,224 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2600, loss[loss=0.1077, simple_loss=0.1264, pruned_loss=0.03339, audio_tagging_loss=0.01107, over 15302.00 frames. ], tot_loss[loss=0.1039, simple_loss=0.1188, pruned_loss=0.03313, audio_tagging_loss=0.01135, over 3052741.75 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:43:15,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.10 vs. limit=15.0 2023-11-18 17:43:24,307 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=338026.6666666667, ans=0.125 2023-11-18 17:43:57,881 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.138e+01 8.896e+01 9.646e+01 1.065e+02 1.578e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-18 17:43:59,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338226.6666666667, ans=0.1 2023-11-18 17:44:03,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=338226.6666666667, ans=0.125 2023-11-18 17:44:05,257 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2650, loss[loss=0.0896, simple_loss=0.1034, pruned_loss=0.02678, audio_tagging_loss=0.01112, over 15219.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1185, pruned_loss=0.03311, audio_tagging_loss=0.01133, over 3052839.77 frames. ], batch size: 59, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:44:05,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=338293.3333333333, ans=0.125 2023-11-18 17:44:21,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=338360.0, ans=0.125 2023-11-18 17:44:56,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=338560.0, ans=0.0 2023-11-18 17:45:01,127 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2700, loss[loss=0.1184, simple_loss=0.1371, pruned_loss=0.03914, audio_tagging_loss=0.01068, over 15711.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1186, pruned_loss=0.03318, audio_tagging_loss=0.01128, over 3061263.19 frames. 
], batch size: 56, lr: 1.40e-02, grad_scale: 32.0 2023-11-18 17:45:08,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=338626.6666666667, ans=0.125 2023-11-18 17:45:10,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=338626.6666666667, ans=0.0 2023-11-18 17:45:17,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=338693.3333333333, ans=0.0 2023-11-18 17:45:28,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=338760.0, ans=0.05 2023-11-18 17:45:33,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=338826.6666666667, ans=0.0 2023-11-18 17:45:49,092 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.160e+01 8.914e+01 9.942e+01 1.124e+02 1.692e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 17:45:52,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=338893.3333333333, ans=0.0 2023-11-18 17:45:57,620 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2750, loss[loss=0.08719, simple_loss=0.1079, pruned_loss=0.02303, audio_tagging_loss=0.01022, over 14354.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1172, pruned_loss=0.03281, audio_tagging_loss=0.01131, over 3056930.33 frames. ], batch size: 53, lr: 1.39e-02, grad_scale: 32.0 2023-11-18 17:45:57,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=338960.0, ans=0.1 2023-11-18 17:45:59,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=338960.0, ans=0.125 2023-11-18 17:46:15,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=339026.6666666667, ans=0.0 2023-11-18 17:46:18,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=339093.3333333333, ans=0.0 2023-11-18 17:46:22,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=339093.3333333333, ans=0.125 2023-11-18 17:46:40,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=339226.6666666667, ans=0.2 2023-11-18 17:46:45,084 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 17:46:46,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=339226.6666666667, ans=0.0 2023-11-18 17:46:52,445 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2800, loss[loss=0.06324, simple_loss=0.06725, pruned_loss=0.01882, audio_tagging_loss=0.01079, over 14486.00 frames. 
2023-11-18 17:46:52,445 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2800, loss[loss=0.06324, simple_loss=0.06725, pruned_loss=0.01882, audio_tagging_loss=0.01079, over 14486.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1174, pruned_loss=0.03298, audio_tagging_loss=0.01124, over 3049580.46 frames. ], batch size: 56, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:47:03,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=339360.0, ans=0.0
2023-11-18 17:47:29,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=339493.3333333333, ans=0.04949747468305833
2023-11-18 17:47:35,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=339560.0, ans=0.0
2023-11-18 17:47:39,927 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.932e+01 9.210e+01 1.044e+02 1.186e+02 2.162e+02, threshold=2.088e+02, percent-clipped=1.0
2023-11-18 17:47:47,224 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.01 vs. limit=15.0
2023-11-18 17:47:47,891 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2850, loss[loss=0.08556, simple_loss=0.08002, pruned_loss=0.03142, audio_tagging_loss=0.01413, over 15438.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.117, pruned_loss=0.03292, audio_tagging_loss=0.0113, over 3053792.26 frames. ], batch size: 59, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:47:51,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=339626.6666666667, ans=0.125
2023-11-18 17:48:04,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=339693.3333333333, ans=0.09899494936611666
2023-11-18 17:48:10,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=339760.0, ans=0.125
2023-11-18 17:48:23,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=339826.6666666667, ans=0.1
2023-11-18 17:48:27,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=339826.6666666667, ans=0.05
2023-11-18 17:48:35,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=339893.3333333333, ans=0.05
2023-11-18 17:48:44,330 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2900, loss[loss=0.08712, simple_loss=0.08957, pruned_loss=0.02901, audio_tagging_loss=0.01332, over 13600.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1167, pruned_loss=0.03283, audio_tagging_loss=0.01134, over 3053205.25 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:48:59,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=340026.6666666667, ans=0.2
2023-11-18 17:49:06,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0
2023-11-18 17:49:08,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=340093.3333333333, ans=0.125
2023-11-18 17:49:13,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=340093.3333333333, ans=0.125
2023-11-18 17:49:15,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=340093.3333333333, ans=0.0
2023-11-18 17:49:23,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=340160.0, ans=0.0
2023-11-18 17:49:24,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=340160.0, ans=0.125
2023-11-18 17:49:33,079 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.551e+01 9.214e+01 1.048e+02 1.170e+02 1.772e+02, threshold=2.096e+02, percent-clipped=0.0
2023-11-18 17:49:40,491 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 2950, loss[loss=0.1049, simple_loss=0.1189, pruned_loss=0.03151, audio_tagging_loss=0.01396, over 14434.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1179, pruned_loss=0.03308, audio_tagging_loss=0.01129, over 3046728.92 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:50:08,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=340426.6666666667, ans=0.0
2023-11-18 17:50:16,193 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=340493.3333333333, ans=0.125
2023-11-18 17:50:18,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340493.3333333333, ans=0.125
2023-11-18 17:50:28,315 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.10 vs. limit=22.5
2023-11-18 17:50:32,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=340560.0, ans=0.125
2023-11-18 17:50:36,778 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3000, loss[loss=0.1323, simple_loss=0.1442, pruned_loss=0.04999, audio_tagging_loss=0.01015, over 14650.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1187, pruned_loss=0.03333, audio_tagging_loss=0.01136, over 3042665.41 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:50:36,780 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-18 17:50:50,528 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([1.2997, 2.2754, 2.1077, 1.2791, 2.1814, 2.3320, 2.3386, 2.0070], device='cuda:0')
2023-11-18 17:51:09,283 INFO [train_asr.py:1147] (0/4) Epoch 5, validation: loss=0.07345, simple_loss=0.06093, pruned_loss=0.009446, audio_tagging_loss=0.03354, over 4681554.00 frames.
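Note on the optim.py Clipping_scale records above and below: the five grad-norm values read as [min, 25%, median, 75%, max], and the logged threshold matches clipping_scale times the median (2.0 * 9.646e+01 = 1.929e+02 in the 17:43:57 record); percent-clipped is 1.0 exactly where the max exceeds the threshold (17:47:39). This relation is inferred from the logged numbers, not from the optim.py source; a small check under that assumption:

def clip_threshold(median_grad_norm: float, clipping_scale: float = 2.0) -> float:
    # Inferred from the records: threshold = clipping_scale * median grad-norm.
    return clipping_scale * median_grad_norm

assert abs(clip_threshold(9.646e+01) - 1.929e+02) < 0.1  # 17:43:57 record
assert abs(clip_threshold(9.942e+01) - 1.988e+02) < 0.1  # 17:45:49 record
assert 2.162e+02 > 2.088e+02  # 17:47:39: max grad-norm over threshold -> percent-clipped=1.0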
2023-11-18 17:51:09,284 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-18 17:51:20,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=340693.3333333333, ans=0.1
2023-11-18 17:51:43,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=340826.6666666667, ans=0.0
2023-11-18 17:51:55,032 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=340893.3333333333, ans=0.125
2023-11-18 17:51:56,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 9.087e+01 9.878e+01 1.115e+02 1.743e+02, threshold=1.976e+02, percent-clipped=0.0
2023-11-18 17:52:04,471 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3050, loss[loss=0.1273, simple_loss=0.1505, pruned_loss=0.04471, audio_tagging_loss=0.00735, over 15351.00 frames. ], tot_loss[loss=0.1047, simple_loss=0.1196, pruned_loss=0.0335, audio_tagging_loss=0.01139, over 3050304.86 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:52:13,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=340960.0, ans=0.125
2023-11-18 17:52:14,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341026.6666666667, ans=0.1
2023-11-18 17:52:20,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=341026.6666666667, ans=0.1
2023-11-18 17:52:24,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=341026.6666666667, ans=0.0
2023-11-18 17:52:35,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=341093.3333333333, ans=0.125
2023-11-18 17:52:36,792 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 17:52:59,731 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3100, loss[loss=0.117, simple_loss=0.131, pruned_loss=0.04371, audio_tagging_loss=0.007798, over 14775.00 frames. ], tot_loss[loss=0.1044, simple_loss=0.1193, pruned_loss=0.03339, audio_tagging_loss=0.01138, over 3051431.77 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:53:00,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.05 vs. limit=22.5
2023-11-18 17:53:03,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=341293.3333333333, ans=0.0
2023-11-18 17:53:03,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.43 vs. limit=15.0
2023-11-18 17:53:14,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=15.0
2023-11-18 17:53:17,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=341360.0, ans=0.125
2023-11-18 17:53:21,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=341426.6666666667, ans=0.04949747468305833
2023-11-18 17:53:32,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.86 vs. limit=15.0
2023-11-18 17:53:36,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=341493.3333333333, ans=0.0
2023-11-18 17:53:44,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=341560.0, ans=0.125
2023-11-18 17:53:45,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=341560.0, ans=0.0
2023-11-18 17:53:47,380 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 9.303e+01 9.886e+01 1.114e+02 1.331e+02, threshold=1.977e+02, percent-clipped=0.0
2023-11-18 17:53:53,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.19 vs. limit=15.0
2023-11-18 17:53:55,385 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3150, loss[loss=0.1044, simple_loss=0.117, pruned_loss=0.03358, audio_tagging_loss=0.01235, over 15747.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1193, pruned_loss=0.0332, audio_tagging_loss=0.01148, over 3059253.98 frames. ], batch size: 57, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:53:57,880 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:54:34,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=341826.6666666667, ans=0.125
2023-11-18 17:54:34,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=341826.6666666667, ans=0.125
2023-11-18 17:54:43,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=341893.3333333333, ans=0.0
2023-11-18 17:54:51,848 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3200, loss[loss=0.08885, simple_loss=0.0918, pruned_loss=0.02658, audio_tagging_loss=0.01636, over 14977.00 frames. ], tot_loss[loss=0.1045, simple_loss=0.1194, pruned_loss=0.03322, audio_tagging_loss=0.01155, over 3057573.03 frames. ], batch size: 55, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:55:01,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=342026.6666666667, ans=0.125
2023-11-18 17:55:05,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=342026.6666666667, ans=0.2
2023-11-18 17:55:09,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=342026.6666666667, ans=0.0
2023-11-18 17:55:32,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=342160.0, ans=0.0
2023-11-18 17:55:39,558 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.324e+01 9.174e+01 9.896e+01 1.084e+02 1.894e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-18 17:55:43,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=342226.6666666667, ans=0.0
2023-11-18 17:55:47,526 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3250, loss[loss=0.1127, simple_loss=0.1258, pruned_loss=0.03833, audio_tagging_loss=0.0115, over 15897.00 frames. ], tot_loss[loss=0.1038, simple_loss=0.1181, pruned_loss=0.03303, audio_tagging_loss=0.01175, over 3062710.47 frames. ], batch size: 59, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:55:52,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=342293.3333333333, ans=0.0
2023-11-18 17:55:53,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=342293.3333333333, ans=0.1
2023-11-18 17:56:00,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.05 vs. limit=22.5
2023-11-18 17:56:06,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=342360.0, ans=0.125
2023-11-18 17:56:13,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=342426.6666666667, ans=0.125
2023-11-18 17:56:16,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=342426.6666666667, ans=0.125
2023-11-18 17:56:42,534 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3300, loss[loss=0.1331, simple_loss=0.1518, pruned_loss=0.04724, audio_tagging_loss=0.009962, over 15651.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1171, pruned_loss=0.03261, audio_tagging_loss=0.01172, over 3063166.90 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:56:45,453 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-18 17:56:47,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=342626.6666666667, ans=0.2
2023-11-18 17:57:03,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=342693.3333333333, ans=0.125
2023-11-18 17:57:07,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=3.97 vs. limit=12.0
2023-11-18 17:57:24,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=342826.6666666667, ans=0.2
2023-11-18 17:57:31,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.617e+01 9.162e+01 1.022e+02 1.144e+02 1.543e+02, threshold=2.045e+02, percent-clipped=0.0
2023-11-18 17:57:39,986 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3350, loss[loss=0.08902, simple_loss=0.1087, pruned_loss=0.02702, audio_tagging_loss=0.007651, over 16192.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1162, pruned_loss=0.03205, audio_tagging_loss=0.01161, over 3061922.40 frames. ], batch size: 60, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:57:45,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=342960.0, ans=0.0
2023-11-18 17:57:49,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=343026.6666666667, ans=0.125
2023-11-18 17:58:20,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343160.0, ans=0.1
2023-11-18 17:58:26,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343226.6666666667, ans=0.1
2023-11-18 17:58:31,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343226.6666666667, ans=0.1
2023-11-18 17:58:35,866 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3400, loss[loss=0.1156, simple_loss=0.1317, pruned_loss=0.0367, audio_tagging_loss=0.01304, over 15883.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.117, pruned_loss=0.03233, audio_tagging_loss=0.01145, over 3067055.62 frames. ], batch size: 58, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:58:43,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=343293.3333333333, ans=0.0
2023-11-18 17:58:53,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=343360.0, ans=0.0
2023-11-18 17:59:12,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=343493.3333333333, ans=0.125
2023-11-18 17:59:23,778 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.573e+01 9.786e+01 1.073e+02 1.222e+02 1.705e+02, threshold=2.147e+02, percent-clipped=0.0
2023-11-18 17:59:31,127 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3450, loss[loss=0.1258, simple_loss=0.1379, pruned_loss=0.04446, audio_tagging_loss=0.01241, over 14380.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1169, pruned_loss=0.03235, audio_tagging_loss=0.01147, over 3060511.60 frames. ], batch size: 54, lr: 1.39e-02, grad_scale: 32.0
2023-11-18 17:59:35,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.88 vs. limit=15.0
2023-11-18 17:59:58,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=343760.0, ans=0.0
2023-11-18 18:00:27,471 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3500, loss[loss=0.1048, simple_loss=0.1131, pruned_loss=0.03499, audio_tagging_loss=0.01331, over 14908.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1163, pruned_loss=0.0321, audio_tagging_loss=0.0113, over 3050968.93 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:00:29,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=343960.0, ans=0.1
2023-11-18 18:00:39,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=344026.6666666667, ans=0.0
2023-11-18 18:00:40,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=344026.6666666667, ans=0.125
2023-11-18 18:00:49,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=344093.3333333333, ans=0.0
2023-11-18 18:00:55,476 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 18:01:15,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0
2023-11-18 18:01:16,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.245e+01 9.216e+01 1.044e+02 1.195e+02 1.654e+02, threshold=2.089e+02, percent-clipped=0.0
2023-11-18 18:01:23,673 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3550, loss[loss=0.08214, simple_loss=0.08566, pruned_loss=0.02533, audio_tagging_loss=0.01398, over 14117.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1164, pruned_loss=0.03218, audio_tagging_loss=0.01129, over 3052813.52 frames. ], batch size: 54, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:01:25,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=344293.3333333333, ans=22.5
2023-11-18 18:01:48,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=344426.6666666667, ans=0.125
2023-11-18 18:01:56,617 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.36 vs. limit=15.0
2023-11-18 18:02:03,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=344493.3333333333, ans=0.04949747468305833
2023-11-18 18:02:19,416 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3600, loss[loss=0.07026, simple_loss=0.08363, pruned_loss=0.02002, audio_tagging_loss=0.008433, over 13965.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1172, pruned_loss=0.03256, audio_tagging_loss=0.01134, over 3059735.46 frames. ], batch size: 54, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:02:20,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=344626.6666666667, ans=0.125
2023-11-18 18:02:21,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=344626.6666666667, ans=0.1
2023-11-18 18:02:45,043 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.31 vs. limit=10.0
2023-11-18 18:02:45,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=344760.0, ans=0.125
2023-11-18 18:02:58,673 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.24 vs. limit=15.0
2023-11-18 18:03:08,090 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.942e+01 9.105e+01 1.020e+02 1.125e+02 1.503e+02, threshold=2.039e+02, percent-clipped=0.0
2023-11-18 18:03:16,174 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3650, loss[loss=0.05894, simple_loss=0.06821, pruned_loss=0.01338, audio_tagging_loss=0.01145, over 14951.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1166, pruned_loss=0.03252, audio_tagging_loss=0.0113, over 3054723.80 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:03:24,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.10 vs. limit=22.5
2023-11-18 18:03:28,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=345026.6666666667, ans=0.125
2023-11-18 18:03:28,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=345026.6666666667, ans=0.5
2023-11-18 18:03:31,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=345026.6666666667, ans=0.125
2023-11-18 18:04:02,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=345226.6666666667, ans=0.0
2023-11-18 18:04:09,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=345226.6666666667, ans=0.1
2023-11-18 18:04:11,794 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3700, loss[loss=0.1272, simple_loss=0.1365, pruned_loss=0.04551, audio_tagging_loss=0.01345, over 15540.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1174, pruned_loss=0.03274, audio_tagging_loss=0.01126, over 3058554.46 frames. ], batch size: 59, lr: 1.38e-02, grad_scale: 32.0
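Note on the WARNING records such as the one for unbalanced/DdDpuDqOyrA_0.000_1.000.wav above: these 1-second AudioSet clips carry a placeholder transcript whose 24 BPE tokens exceed the 23 encoder frames left after roughly 4x subsampling, and a transducer loss cannot emit more tokens than it has frames, so the cut is dropped. A minimal sketch of that check, assuming the usual ((n - 7) // 2 + 1) // 2 convolutional subsampling arithmetic (inferred from the logged 100 -> 23, not copied from train_asr.py):

def frames_after_subsampling(num_frames: int) -> int:
    # Two stride-2 stages: 100 input frames -> 23 output frames.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # The transducer needs at least one output frame per token.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # matches the excluded dummy-text cuts in this log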
2023-11-18 18:04:16,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.72 vs. limit=10.0
2023-11-18 18:04:40,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=345426.6666666667, ans=0.0
2023-11-18 18:05:00,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.024e+01 9.436e+01 1.012e+02 1.107e+02 1.712e+02, threshold=2.024e+02, percent-clipped=0.0
2023-11-18 18:05:01,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.45 vs. limit=6.0
2023-11-18 18:05:07,831 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3750, loss[loss=0.08438, simple_loss=0.08799, pruned_loss=0.02533, audio_tagging_loss=0.01505, over 14058.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1173, pruned_loss=0.03272, audio_tagging_loss=0.0114, over 3057383.28 frames. ], batch size: 56, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:05:07,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=345626.6666666667, ans=0.125
2023-11-18 18:05:16,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0
2023-11-18 18:05:31,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=345760.0, ans=0.07
2023-11-18 18:05:35,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.76 vs. limit=15.0
2023-11-18 18:05:41,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0
2023-11-18 18:05:42,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=345826.6666666667, ans=0.125
2023-11-18 18:05:46,202 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 18:06:04,356 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3800, loss[loss=0.1007, simple_loss=0.1158, pruned_loss=0.03052, audio_tagging_loss=0.01228, over 14973.00 frames. ], tot_loss[loss=0.103, simple_loss=0.1175, pruned_loss=0.0329, audio_tagging_loss=0.01139, over 3054842.74 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:06:05,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=345960.0, ans=15.0
2023-11-18 18:06:38,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=346160.0, ans=0.0
2023-11-18 18:06:41,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=346160.0, ans=0.125
2023-11-18 18:06:46,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.78 vs. limit=22.5
2023-11-18 18:06:52,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.441e+01 9.330e+01 1.017e+02 1.159e+02 1.442e+02, threshold=2.034e+02, percent-clipped=0.0
2023-11-18 18:06:59,710 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3850, loss[loss=0.1109, simple_loss=0.131, pruned_loss=0.0339, audio_tagging_loss=0.01148, over 14907.00 frames. ], tot_loss[loss=0.1043, simple_loss=0.1187, pruned_loss=0.03346, audio_tagging_loss=0.01146, over 3056494.25 frames. ], batch size: 54, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:07:09,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.79 vs. limit=22.5
2023-11-18 18:07:28,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=346426.6666666667, ans=0.125
2023-11-18 18:07:36,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=346493.3333333333, ans=0.125
2023-11-18 18:07:43,052 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.272e-01
2023-11-18 18:07:55,526 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3900, loss[loss=0.08831, simple_loss=0.09765, pruned_loss=0.02943, audio_tagging_loss=0.01005, over 15853.00 frames. ], tot_loss[loss=0.1042, simple_loss=0.1186, pruned_loss=0.03346, audio_tagging_loss=0.01139, over 3056791.76 frames. ], batch size: 62, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:08:01,298 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-52000.pt
2023-11-18 18:08:05,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=346626.6666666667, ans=0.025
2023-11-18 18:08:21,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=346760.0, ans=0.125
2023-11-18 18:08:42,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=346893.3333333333, ans=0.05
2023-11-18 18:08:45,462 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.301e+01 9.163e+01 1.014e+02 1.129e+02 1.556e+02, threshold=2.028e+02, percent-clipped=0.0
2023-11-18 18:08:53,955 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 3950, loss[loss=0.06366, simple_loss=0.0625, pruned_loss=0.01497, audio_tagging_loss=0.01744, over 14323.00 frames. ], tot_loss[loss=0.1052, simple_loss=0.1199, pruned_loss=0.03371, audio_tagging_loss=0.01151, over 3054493.56 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:09:11,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=347026.6666666667, ans=0.07
2023-11-18 18:09:49,369 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4000, loss[loss=0.07284, simple_loss=0.07995, pruned_loss=0.02062, audio_tagging_loss=0.01225, over 14846.00 frames. ], tot_loss[loss=0.1033, simple_loss=0.1174, pruned_loss=0.03283, audio_tagging_loss=0.01177, over 3051200.56 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:09:49,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=347293.3333333333, ans=0.125
2023-11-18 18:09:51,712 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 18:09:55,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=347293.3333333333, ans=0.125
2023-11-18 18:09:55,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=347293.3333333333, ans=0.0
2023-11-18 18:09:58,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=347293.3333333333, ans=0.125
2023-11-18 18:10:04,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=347360.0, ans=0.09899494936611666
2023-11-18 18:10:23,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.75 vs. limit=15.0
2023-11-18 18:10:25,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=347493.3333333333, ans=0.1
2023-11-18 18:10:26,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=347493.3333333333, ans=0.1
2023-11-18 18:10:37,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.53 vs. limit=22.5
2023-11-18 18:10:37,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.524e+01 9.209e+01 1.022e+02 1.112e+02 1.476e+02, threshold=2.045e+02, percent-clipped=0.0
2023-11-18 18:10:42,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=347560.0, ans=0.0
2023-11-18 18:10:42,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer_ff2.min_abs, batch_count=347560.0, ans=0.1
2023-11-18 18:10:46,044 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4050, loss[loss=0.09569, simple_loss=0.1096, pruned_loss=0.02802, audio_tagging_loss=0.01287, over 16387.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.1175, pruned_loss=0.03269, audio_tagging_loss=0.01176, over 3048419.35 frames. ], batch size: 61, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:10:47,153 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 18:10:55,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=347626.6666666667, ans=0.125
2023-11-18 18:10:57,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.38 vs. limit=6.0
2023-11-18 18:11:36,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=347893.3333333333, ans=0.125
2023-11-18 18:11:42,710 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4100, loss[loss=0.1277, simple_loss=0.142, pruned_loss=0.04539, audio_tagging_loss=0.01135, over 15518.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.118, pruned_loss=0.03269, audio_tagging_loss=0.01169, over 3050429.50 frames. ], batch size: 57, lr: 1.38e-02, grad_scale: 32.0
2023-11-18 18:11:51,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=347960.0, ans=0.125
2023-11-18 18:12:32,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.596e+01 9.111e+01 1.020e+02 1.147e+02 2.406e+02, threshold=2.040e+02, percent-clipped=1.0
2023-11-18 18:12:36,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=348226.6666666667, ans=0.125
2023-11-18 18:12:38,566 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4150, loss[loss=0.1229, simple_loss=0.135, pruned_loss=0.04583, audio_tagging_loss=0.009632, over 15568.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1181, pruned_loss=0.0329, audio_tagging_loss=0.01146, over 3043576.51 frames. ], batch size: 61, lr: 1.38e-02, grad_scale: 16.0
2023-11-18 18:12:51,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=348360.0, ans=0.125
2023-11-18 18:12:56,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=348360.0, ans=0.0
2023-11-18 18:13:00,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=348426.6666666667, ans=0.0
2023-11-18 18:13:03,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=348426.6666666667, ans=0.2
2023-11-18 18:13:06,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0
2023-11-18 18:13:10,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=348426.6666666667, ans=0.0
2023-11-18 18:13:17,894 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 18:13:18,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348493.3333333333, ans=0.125
2023-11-18 18:13:21,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=348493.3333333333, ans=0.0
2023-11-18 18:13:26,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=348560.0, ans=0.04949747468305833
2023-11-18 18:13:34,492 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4200, loss[loss=0.148, simple_loss=0.164, pruned_loss=0.05534, audio_tagging_loss=0.01062, over 15428.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1176, pruned_loss=0.03273, audio_tagging_loss=0.01141, over 3049759.60 frames. ], batch size: 55, lr: 1.38e-02, grad_scale: 16.0
2023-11-18 18:13:34,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=348626.6666666667, ans=0.07
2023-11-18 18:13:41,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.40 vs. limit=10.0
2023-11-18 18:13:42,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=348626.6666666667, ans=0.125
2023-11-18 18:13:42,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0
2023-11-18 18:13:48,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=348693.3333333333, ans=0.125
2023-11-18 18:13:54,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0
2023-11-18 18:13:55,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=348693.3333333333, ans=0.2
2023-11-18 18:13:59,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=348760.0, ans=0.04949747468305833
2023-11-18 18:14:05,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=348760.0, ans=0.2
2023-11-18 18:14:09,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=348826.6666666667, ans=0.0
2023-11-18 18:14:14,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=348826.6666666667, ans=0.0
2023-11-18 18:14:20,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=348893.3333333333, ans=0.125
2023-11-18 18:14:20,638 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.09 vs. limit=22.5
2023-11-18 18:14:23,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.393e+01 9.025e+01 9.781e+01 1.065e+02 1.508e+02, threshold=1.956e+02, percent-clipped=0.0
2023-11-18 18:14:30,696 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4250, loss[loss=0.0831, simple_loss=0.09776, pruned_loss=0.02508, audio_tagging_loss=0.009132, over 15828.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1174, pruned_loss=0.03246, audio_tagging_loss=0.01139, over 3044696.28 frames. ], batch size: 58, lr: 1.38e-02, grad_scale: 16.0
2023-11-18 18:14:45,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=349026.6666666667, ans=0.1
2023-11-18 18:14:47,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=349026.6666666667, ans=0.05
2023-11-18 18:15:06,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=349160.0, ans=0.0
2023-11-18 18:15:17,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349226.6666666667, ans=0.125
2023-11-18 18:15:26,402 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4300, loss[loss=0.09715, simple_loss=0.1071, pruned_loss=0.0338, audio_tagging_loss=0.009812, over 14802.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1182, pruned_loss=0.03246, audio_tagging_loss=0.01129, over 3045540.09 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0
2023-11-18 18:15:36,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=349360.0, ans=0.125
2023-11-18 18:15:55,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=349426.6666666667, ans=0.125
2023-11-18 18:16:09,015 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=1.795e-01
2023-11-18 18:16:12,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=349560.0, ans=0.125
2023-11-18 18:16:15,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.907e+01 1.023e+02 1.143e+02 1.661e+02, threshold=2.046e+02, percent-clipped=0.0
2023-11-18 18:16:21,904 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4350, loss[loss=0.09891, simple_loss=0.1142, pruned_loss=0.03093, audio_tagging_loss=0.01087, over 15402.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1181, pruned_loss=0.03238, audio_tagging_loss=0.01134, over 3040432.14 frames. ], batch size: 59, lr: 1.37e-02, grad_scale: 16.0
2023-11-18 18:16:29,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=349626.6666666667, ans=0.125
2023-11-18 18:16:35,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=349693.3333333333, ans=0.2
2023-11-18 18:16:48,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=349760.0, ans=0.125
2023-11-18 18:16:50,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=349760.0, ans=0.125
2023-11-18 18:17:02,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.67 vs. limit=15.0
2023-11-18 18:17:03,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=349826.6666666667, ans=0.125
2023-11-18 18:17:12,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=349893.3333333333, ans=0.0
2023-11-18 18:17:13,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=349893.3333333333, ans=0.125
2023-11-18 18:17:14,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.79 vs. limit=15.0
2023-11-18 18:17:17,987 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4400, loss[loss=0.09253, simple_loss=0.1003, pruned_loss=0.03049, audio_tagging_loss=0.01188, over 16163.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1174, pruned_loss=0.03238, audio_tagging_loss=0.0114, over 3038486.25 frames. ], batch size: 61, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:17:25,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=349960.0, ans=0.125
2023-11-18 18:17:28,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=350026.6666666667, ans=0.125
2023-11-18 18:17:34,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=350026.6666666667, ans=0.125
2023-11-18 18:17:39,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=350093.3333333333, ans=0.125
2023-11-18 18:17:57,008 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=12.0
2023-11-18 18:18:07,459 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 9.072e+01 1.020e+02 1.136e+02 1.526e+02, threshold=2.040e+02, percent-clipped=0.0
2023-11-18 18:18:13,846 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4450, loss[loss=0.1051, simple_loss=0.1297, pruned_loss=0.02944, audio_tagging_loss=0.01082, over 16712.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1166, pruned_loss=0.03224, audio_tagging_loss=0.01133, over 3042470.87 frames. ], batch size: 62, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:18:26,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=350360.0, ans=0.125
2023-11-18 18:18:29,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.44 vs. limit=12.0
2023-11-18 18:18:30,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=350360.0, ans=0.035
2023-11-18 18:18:32,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.52 vs. limit=15.0
2023-11-18 18:18:33,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=350360.0, ans=0.2
2023-11-18 18:18:40,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0
2023-11-18 18:18:49,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=350493.3333333333, ans=0.2
2023-11-18 18:19:00,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0
2023-11-18 18:19:08,908 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4500, loss[loss=0.1088, simple_loss=0.1316, pruned_loss=0.03592, audio_tagging_loss=0.007034, over 15450.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1167, pruned_loss=0.03251, audio_tagging_loss=0.01133, over 3051017.82 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:19:11,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0
2023-11-18 18:19:14,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=350626.6666666667, ans=6.0
2023-11-18 18:19:19,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=350693.3333333333, ans=0.0
2023-11-18 18:19:29,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=350693.3333333333, ans=0.0
2023-11-18 18:19:30,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.94 vs. limit=22.5
2023-11-18 18:19:48,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=350826.6666666667, ans=0.125
2023-11-18 18:19:49,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=350826.6666666667, ans=0.125
2023-11-18 18:19:58,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.001e+01 9.243e+01 1.009e+02 1.115e+02 1.767e+02, threshold=2.018e+02, percent-clipped=0.0
2023-11-18 18:20:01,383 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.98 vs. limit=22.5
2023-11-18 18:20:05,096 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4550, loss[loss=0.1196, simple_loss=0.1459, pruned_loss=0.03746, audio_tagging_loss=0.009159, over 14267.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1164, pruned_loss=0.03236, audio_tagging_loss=0.0113, over 3044333.11 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:20:05,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=350960.0, ans=0.0
2023-11-18 18:20:23,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=351026.6666666667, ans=10.0
2023-11-18 18:20:24,981 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.159e-01
2023-11-18 18:20:29,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=351093.3333333333, ans=0.125
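Note on the recurring scaling.py ScheduledFloat records: each one reports the current value (ans) of a schedule evaluated at the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name, breakpoints, and interpolation here are illustrative, not the scaling.py implementation:

import bisect

class ScheduledFloatSketch:
    # Illustrative: a float that follows (batch_count, value) breakpoints,
    # linearly interpolated and clamped at both ends.
    def __init__(self, *points):
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical skip-rate that decays from 0.5 to 0.0 over the first 4000 batches;
# far past the ramp it stays at 0.0, as the pos_emb_skip_rate records above show.
skip_rate = ScheduledFloatSketch((0, 0.5), (4000, 0.0))
assert skip_rate(349893.3333333333) == 0.0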
2023-11-18 18:20:46,545 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-18 18:21:02,025 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4600, loss[loss=0.1074, simple_loss=0.1292, pruned_loss=0.03362, audio_tagging_loss=0.009228, over 15819.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1171, pruned_loss=0.03267, audio_tagging_loss=0.01142, over 3052144.76 frames. ], batch size: 59, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:21:02,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=351293.3333333333, ans=0.125
2023-11-18 18:21:03,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=351293.3333333333, ans=0.125
2023-11-18 18:21:11,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=351360.0, ans=0.0
2023-11-18 18:21:21,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=351360.0, ans=0.2
2023-11-18 18:21:46,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.86 vs. limit=6.0
2023-11-18 18:21:51,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.856e+01 9.307e+01 1.009e+02 1.129e+02 1.665e+02, threshold=2.017e+02, percent-clipped=0.0
2023-11-18 18:21:53,562 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.43 vs. limit=15.0
2023-11-18 18:21:54,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=351560.0, ans=0.035
2023-11-18 18:21:56,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=351626.6666666667, ans=0.0
2023-11-18 18:21:57,346 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4650, loss[loss=0.1204, simple_loss=0.151, pruned_loss=0.0328, audio_tagging_loss=0.01207, over 15708.00 frames. ], tot_loss[loss=0.1035, simple_loss=0.1185, pruned_loss=0.0328, audio_tagging_loss=0.01149, over 3049513.62 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:22:31,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=351826.6666666667, ans=0.0
2023-11-18 18:22:34,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=351826.6666666667, ans=0.0
2023-11-18 18:22:52,877 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4700, loss[loss=0.09784, simple_loss=0.124, pruned_loss=0.02746, audio_tagging_loss=0.008392, over 14674.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1173, pruned_loss=0.03247, audio_tagging_loss=0.01161, over 3052389.07 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:23:26,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=352160.0, ans=0.0
2023-11-18 18:23:42,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.568e+01 9.156e+01 9.801e+01 1.107e+02 1.485e+02, threshold=1.960e+02, percent-clipped=0.0
2023-11-18 18:23:49,070 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4750, loss[loss=0.1191, simple_loss=0.1316, pruned_loss=0.04095, audio_tagging_loss=0.0124, over 14987.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1171, pruned_loss=0.03242, audio_tagging_loss=0.01157, over 3050611.80 frames. ], batch size: 55, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:23:54,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=352293.3333333333, ans=0.07
2023-11-18 18:24:00,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=352360.0, ans=15.0
2023-11-18 18:24:12,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=352426.6666666667, ans=0.07
2023-11-18 18:24:16,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=352426.6666666667, ans=0.125
2023-11-18 18:24:20,307 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-18 18:24:20,711 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.52 vs. limit=10.0
2023-11-18 18:24:27,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=352493.3333333333, ans=0.2
2023-11-18 18:24:30,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=352493.3333333333, ans=0.07
2023-11-18 18:24:33,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=352560.0, ans=0.0
2023-11-18 18:24:38,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=352560.0, ans=0.125
2023-11-18 18:24:45,206 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4800, loss[loss=0.0907, simple_loss=0.1057, pruned_loss=0.02407, audio_tagging_loss=0.01376, over 15165.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1167, pruned_loss=0.03229, audio_tagging_loss=0.01167, over 3047481.50 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0
2023-11-18 18:24:49,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=352626.6666666667, ans=0.125
2023-11-18 18:24:52,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=352626.6666666667, ans=0.5
2023-11-18 18:25:10,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=352760.0, ans=0.125
limit=6.0 2023-11-18 18:25:20,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=352826.6666666667, ans=0.125 2023-11-18 18:25:30,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=352893.3333333333, ans=0.125 2023-11-18 18:25:31,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=352893.3333333333, ans=0.5 2023-11-18 18:25:33,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=352893.3333333333, ans=0.125 2023-11-18 18:25:34,877 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.158e+01 9.461e+01 1.064e+02 1.235e+02 1.881e+02, threshold=2.128e+02, percent-clipped=0.0 2023-11-18 18:25:35,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.01 vs. limit=12.0 2023-11-18 18:25:40,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=352960.0, ans=0.125 2023-11-18 18:25:41,305 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4850, loss[loss=0.09186, simple_loss=0.1106, pruned_loss=0.02638, audio_tagging_loss=0.01017, over 15678.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1168, pruned_loss=0.03219, audio_tagging_loss=0.01176, over 3043797.32 frames. ], batch size: 57, lr: 1.37e-02, grad_scale: 32.0 2023-11-18 18:25:46,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=352960.0, ans=0.125 2023-11-18 18:25:55,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=353026.6666666667, ans=0.125 2023-11-18 18:26:10,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=353093.3333333333, ans=0.2 2023-11-18 18:26:37,600 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4900, loss[loss=0.09393, simple_loss=0.1051, pruned_loss=0.02759, audio_tagging_loss=0.01377, over 14462.00 frames. ], tot_loss[loss=0.1024, simple_loss=0.117, pruned_loss=0.03222, audio_tagging_loss=0.01166, over 3042210.17 frames. ], batch size: 59, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:26:43,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=353293.3333333333, ans=0.0 2023-11-18 18:26:54,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=18.60 vs. limit=22.5 2023-11-18 18:26:55,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.36 vs. limit=15.0 2023-11-18 18:27:09,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.76 vs. limit=22.5 2023-11-18 18:27:17,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.70 vs. 
limit=10.0 2023-11-18 18:27:28,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.657e+01 9.441e+01 1.051e+02 1.165e+02 1.612e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 18:27:33,416 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 4950, loss[loss=0.092, simple_loss=0.1023, pruned_loss=0.02996, audio_tagging_loss=0.01089, over 14026.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1172, pruned_loss=0.03225, audio_tagging_loss=0.01136, over 3041222.47 frames. ], batch size: 53, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:27:50,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=353693.3333333333, ans=0.125 2023-11-18 18:27:57,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=353760.0, ans=0.2 2023-11-18 18:28:00,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=353760.0, ans=0.07 2023-11-18 18:28:30,009 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5000, loss[loss=0.09358, simple_loss=0.1069, pruned_loss=0.02775, audio_tagging_loss=0.01238, over 14385.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.118, pruned_loss=0.03242, audio_tagging_loss=0.01122, over 3033899.48 frames. ], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:28:32,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=353960.0, ans=0.125 2023-11-18 18:28:33,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-11-18 18:29:07,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-18 18:29:08,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=354160.0, ans=0.125 2023-11-18 18:29:17,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=354226.6666666667, ans=0.125 2023-11-18 18:29:19,885 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 9.416e+01 1.033e+02 1.125e+02 1.808e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 18:29:23,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=354226.6666666667, ans=0.125 2023-11-18 18:29:26,422 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5050, loss[loss=0.1291, simple_loss=0.149, pruned_loss=0.04469, audio_tagging_loss=0.00991, over 15772.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1171, pruned_loss=0.03214, audio_tagging_loss=0.01127, over 3041542.66 frames. 
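
In each optim.py:476 line the reported threshold equals Clipping_scale times the median gradient norm (here 2.102e+02 = 2.0 * 1.051e+02), and percent-clipped is the share of recent steps whose norm exceeded that threshold. A sketch of that bookkeeping over a window of recent norms; how the window is maintained is an assumption:

    import torch

    def grad_norm_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
        # Quartiles (min, 25%, median, 75%, max) of recent gradient norms,
        # as printed above, plus the clipping threshold from the median.
        q = torch.quantile(recent_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2]
        percent_clipped = 100.0 * (recent_norms > threshold).float().mean()
        return q, threshold, percent_clipped

    # Illustrative values on the same scale as the log line above:
    q, thr, pct = grad_norm_stats(torch.tensor([76.6, 94.4, 105.1, 116.5, 161.2]))
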
], batch size: 56, lr: 1.37e-02, grad_scale: 16.0 2023-11-18 18:29:27,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=354293.3333333333, ans=0.125 2023-11-18 18:29:35,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=354293.3333333333, ans=0.125 2023-11-18 18:29:39,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=354360.0, ans=0.1 2023-11-18 18:29:41,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.03 vs. limit=22.5 2023-11-18 18:30:10,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=11.06 vs. limit=15.0 2023-11-18 18:30:16,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=8.56 vs. limit=15.0 2023-11-18 18:30:17,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.52 vs. limit=15.0 2023-11-18 18:30:19,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=354560.0, ans=0.05 2023-11-18 18:30:21,621 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5100, loss[loss=0.08982, simple_loss=0.1012, pruned_loss=0.02754, audio_tagging_loss=0.01168, over 15161.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1177, pruned_loss=0.03234, audio_tagging_loss=0.01104, over 3047424.14 frames. ], batch size: 55, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:30:36,879 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:30:47,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=354760.0, ans=0.0 2023-11-18 18:30:55,966 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=9.219e-01 2023-11-18 18:31:05,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=354893.3333333333, ans=0.125 2023-11-18 18:31:06,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=354893.3333333333, ans=0.125 2023-11-18 18:31:11,570 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 9.700e+01 1.061e+02 1.154e+02 1.523e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 18:31:15,456 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=354893.3333333333, ans=0.125 2023-11-18 18:31:17,925 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5150, loss[loss=0.08904, simple_loss=0.1012, pruned_loss=0.02302, audio_tagging_loss=0.01542, over 15244.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1168, pruned_loss=0.03193, audio_tagging_loss=0.01113, over 3047534.94 frames. 
], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:31:37,123 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:31:38,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=355026.6666666667, ans=0.1 2023-11-18 18:32:13,656 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5200, loss[loss=0.1366, simple_loss=0.1501, pruned_loss=0.04932, audio_tagging_loss=0.01227, over 15659.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1182, pruned_loss=0.0324, audio_tagging_loss=0.01113, over 3050692.47 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:32:24,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=355360.0, ans=0.09899494936611666 2023-11-18 18:32:30,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=355360.0, ans=0.0 2023-11-18 18:33:03,978 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.553e+01 9.196e+01 1.027e+02 1.112e+02 1.442e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 18:33:05,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=355560.0, ans=0.125 2023-11-18 18:33:05,595 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.11 vs. limit=22.5 2023-11-18 18:33:09,312 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5250, loss[loss=0.0975, simple_loss=0.1129, pruned_loss=0.02803, audio_tagging_loss=0.01305, over 15750.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1182, pruned_loss=0.03242, audio_tagging_loss=0.01104, over 3052230.29 frames. 
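
Each scaling.py:213 line dumps a ScheduledFloat: a hyper-parameter (skip rate, balancer prob, dropout p, min_abs, ...) whose current value `ans` is a function of the global batch_count. A minimal piecewise-linear sketch of such a schedule, assuming simple interpolation between (batch_count, value) breakpoints; the real class in icefall's scaling.py carries more machinery:

    class ScheduledFloatSketch:
        """Piecewise-linear schedule over batch_count,
        e.g. ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.1))."""

        def __init__(self, *points: tuple):
            self.points = sorted(points)

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            if batch_count >= pts[-1][0]:
                return pts[-1][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if x0 <= batch_count <= x1:
                    w = (batch_count - x0) / (x1 - x0)
                    return y0 + w * (y1 - y0)

    # By batch_count ~3.5e5 most schedules have long since reached their
    # final value, which is why the same `ans` keeps repeating (0.125, 0.2, 0.0, ...).
    print(ScheduledFloatSketch((0.0, 0.3), (20000.0, 0.125)).value(355026.67))
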
], batch size: 58, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:33:16,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=355626.6666666667, ans=0.125 2023-11-18 18:33:16,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=355626.6666666667, ans=0.0 2023-11-18 18:33:23,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=355693.3333333333, ans=0.035 2023-11-18 18:33:35,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=355760.0, ans=0.125 2023-11-18 18:33:44,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=355826.6666666667, ans=15.0 2023-11-18 18:33:49,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=355826.6666666667, ans=0.5 2023-11-18 18:33:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=355826.6666666667, ans=0.0 2023-11-18 18:33:53,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=355893.3333333333, ans=0.125 2023-11-18 18:33:55,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=355893.3333333333, ans=0.125 2023-11-18 18:33:58,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=355893.3333333333, ans=0.07 2023-11-18 18:34:04,633 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5300, loss[loss=0.09268, simple_loss=0.1102, pruned_loss=0.02922, audio_tagging_loss=0.008354, over 13731.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1187, pruned_loss=0.03246, audio_tagging_loss=0.01104, over 3043491.74 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:34:09,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=355960.0, ans=0.2 2023-11-18 18:34:27,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2023-11-18 18:34:30,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=356093.3333333333, ans=0.125 2023-11-18 18:34:50,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.74 vs. limit=15.0 2023-11-18 18:34:55,614 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 8.052e+01 9.440e+01 1.053e+02 1.166e+02 1.714e+02, threshold=2.106e+02, percent-clipped=0.0 2023-11-18 18:34:56,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=12.0 2023-11-18 18:35:00,877 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5350, loss[loss=0.08564, simple_loss=0.09551, pruned_loss=0.02463, audio_tagging_loss=0.01326, over 15524.00 frames. ], tot_loss[loss=0.1032, simple_loss=0.119, pruned_loss=0.03265, audio_tagging_loss=0.01106, over 3043603.75 frames. 
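
The scaling.py:1022 Whitening lines fire when a Whiten module's metric exceeds its (scheduled) limit; the metric measures how far the feature covariance within each channel group is from a multiple of the identity. A sketch assuming one standard definition, d * trace(C @ C) / trace(C)**2, which equals 1 exactly when C is proportional to I; the formula actually used in icefall's scaling.py may differ:

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
        # x: (num_frames, num_channels); channels are split into groups.
        n, c = x.shape
        d = c // num_groups
        xg = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
        cov = xg.transpose(1, 2) @ xg / n                  # (groups, d, d)
        trace = cov.diagonal(dim1=1, dim2=2).sum(-1)
        trace_sq = (cov * cov).sum(dim=(1, 2))             # trace(C @ C), C symmetric
        return (d * trace_sq / trace.clamp(min=1e-20) ** 2).mean()

    # ~1 for white input (slightly above 1 due to sampling noise):
    print(whitening_metric(torch.randn(1000, 256), num_groups=1))
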
], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:35:01,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=356293.3333333333, ans=0.0 2023-11-18 18:35:29,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=356426.6666666667, ans=0.0 2023-11-18 18:35:35,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=356493.3333333333, ans=0.125 2023-11-18 18:35:35,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=356493.3333333333, ans=0.2 2023-11-18 18:35:41,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=356493.3333333333, ans=0.0 2023-11-18 18:35:55,909 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5400, loss[loss=0.08416, simple_loss=0.0936, pruned_loss=0.02114, audio_tagging_loss=0.01622, over 14692.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1182, pruned_loss=0.03231, audio_tagging_loss=0.01124, over 3051771.98 frames. ], batch size: 55, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:10,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=356693.3333333333, ans=0.0 2023-11-18 18:36:11,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.05 vs. limit=15.0 2023-11-18 18:36:26,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.09 vs. limit=15.0 2023-11-18 18:36:28,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=356826.6666666667, ans=0.0 2023-11-18 18:36:39,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=356893.3333333333, ans=0.125 2023-11-18 18:36:46,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.262e+01 8.854e+01 9.931e+01 1.101e+02 1.556e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-18 18:36:51,437 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5450, loss[loss=0.05846, simple_loss=0.06616, pruned_loss=0.01646, audio_tagging_loss=0.008924, over 14530.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1176, pruned_loss=0.03239, audio_tagging_loss=0.01134, over 3051585.96 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:36:54,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=356960.0, ans=0.125 2023-11-18 18:36:54,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=356960.0, ans=0.0 2023-11-18 18:37:09,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=357026.6666666667, ans=0.2 2023-11-18 18:37:29,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.18 vs. limit=15.0 2023-11-18 18:37:46,495 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5500, loss[loss=0.1137, simple_loss=0.1273, pruned_loss=0.03943, audio_tagging_loss=0.01059, over 15191.00 frames. 
], tot_loss[loss=0.1021, simple_loss=0.1171, pruned_loss=0.0322, audio_tagging_loss=0.01135, over 3048298.06 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:37:57,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=357360.0, ans=0.0 2023-11-18 18:38:07,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=357360.0, ans=0.2 2023-11-18 18:38:15,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=357426.6666666667, ans=0.2 2023-11-18 18:38:22,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=357493.3333333333, ans=0.0 2023-11-18 18:38:38,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.010e+01 9.009e+01 9.946e+01 1.101e+02 1.690e+02, threshold=1.989e+02, percent-clipped=0.0 2023-11-18 18:38:39,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=357560.0, ans=0.125 2023-11-18 18:38:42,432 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5550, loss[loss=0.1159, simple_loss=0.1173, pruned_loss=0.04388, audio_tagging_loss=0.01335, over 15680.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1176, pruned_loss=0.03245, audio_tagging_loss=0.01149, over 3049969.88 frames. ], batch size: 61, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:38:55,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=357693.3333333333, ans=0.125 2023-11-18 18:39:10,175 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.72 vs. limit=12.0 2023-11-18 18:39:14,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.09 vs. limit=10.0 2023-11-18 18:39:25,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=357826.6666666667, ans=0.09899494936611666 2023-11-18 18:39:32,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=357893.3333333333, ans=0.0 2023-11-18 18:39:34,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0 2023-11-18 18:39:34,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=357893.3333333333, ans=0.1 2023-11-18 18:39:37,592 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5600, loss[loss=0.06279, simple_loss=0.06623, pruned_loss=0.01698, audio_tagging_loss=0.01269, over 14995.00 frames. ], tot_loss[loss=0.1025, simple_loss=0.1176, pruned_loss=0.03216, audio_tagging_loss=0.01155, over 3049478.51 frames. 
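
Note that tot_loss[...] is not the printed batch's loss but an aggregate: it is reported over ~3.05e6 frames while a single batch carries ~15k frames, i.e. roughly the last 200 batches. A sketch of frame-weighted aggregation, assuming a plain dict-of-sums tracker rather than icefall's actual metrics class:

    from collections import defaultdict

    class LossTracker:
        """Frame-weighted running sums, reported as per-frame averages."""

        def __init__(self):
            self.sums = defaultdict(float)
            self.frames = 0.0

        def update(self, frames: float, **losses: float):
            # Logged losses are per frame, so weight each batch by its frames.
            self.frames += frames
            for name, value in losses.items():
                self.sums[name] += value * frames

        def averages(self) -> dict:
            return {name: s / self.frames for name, s in self.sums.items()}

    t = LossTracker()
    t.update(15819, loss=0.1074, audio_tagging_loss=0.009228)  # batch 4600
    t.update(15708, loss=0.1204, audio_tagging_loss=0.01207)   # batch 4650
    print(t.averages())   # like tot_loss[...] over N frames
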
], batch size: 57, lr: 1.36e-02, grad_scale: 32.0 2023-11-18 18:39:52,027 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=358026.6666666667, ans=0.0 2023-11-18 18:39:53,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=358026.6666666667, ans=0.125 2023-11-18 18:40:12,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=358160.0, ans=0.0 2023-11-18 18:40:15,561 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:40:20,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=358160.0, ans=0.0 2023-11-18 18:40:24,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=358226.6666666667, ans=0.0 2023-11-18 18:40:29,637 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.690e+01 9.084e+01 1.022e+02 1.205e+02 1.640e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 18:40:32,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=358293.3333333333, ans=0.125 2023-11-18 18:40:32,868 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5650, loss[loss=0.1142, simple_loss=0.1204, pruned_loss=0.04337, audio_tagging_loss=0.0107, over 14219.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1167, pruned_loss=0.03197, audio_tagging_loss=0.01167, over 3047433.06 frames. ], batch size: 53, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:40:36,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=358293.3333333333, ans=0.0 2023-11-18 18:40:42,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=358293.3333333333, ans=0.1 2023-11-18 18:40:44,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=358360.0, ans=0.2 2023-11-18 18:40:52,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=358360.0, ans=0.0 2023-11-18 18:40:57,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=358426.6666666667, ans=0.125 2023-11-18 18:40:58,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.77 vs. 
limit=6.0 2023-11-18 18:41:03,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=358426.6666666667, ans=0.125 2023-11-18 18:41:23,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=358560.0, ans=0.125 2023-11-18 18:41:29,430 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5700, loss[loss=0.07543, simple_loss=0.09015, pruned_loss=0.02006, audio_tagging_loss=0.01029, over 13902.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1156, pruned_loss=0.03172, audio_tagging_loss=0.01171, over 3047428.67 frames. ], batch size: 54, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:41:41,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=358693.3333333333, ans=0.125 2023-11-18 18:41:44,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=358693.3333333333, ans=0.1 2023-11-18 18:41:55,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.81 vs. limit=15.0 2023-11-18 18:42:02,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=358826.6666666667, ans=0.0 2023-11-18 18:42:04,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=358826.6666666667, ans=0.125 2023-11-18 18:42:17,776 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=2.506e-03 2023-11-18 18:42:18,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=358893.3333333333, ans=10.0 2023-11-18 18:42:21,625 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.383e+01 8.993e+01 9.865e+01 1.099e+02 1.758e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:42:23,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=358960.0, ans=0.125 2023-11-18 18:42:24,815 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5750, loss[loss=0.104, simple_loss=0.119, pruned_loss=0.03486, audio_tagging_loss=0.009613, over 15444.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1157, pruned_loss=0.03208, audio_tagging_loss=0.0114, over 3049894.26 frames. ], batch size: 56, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:42:35,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=359026.6666666667, ans=0.0 2023-11-18 18:42:56,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=359093.3333333333, ans=0.0 2023-11-18 18:43:18,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=359226.6666666667, ans=0.125 2023-11-18 18:43:20,446 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5800, loss[loss=0.09387, simple_loss=0.1023, pruned_loss=0.02806, audio_tagging_loss=0.01464, over 15409.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1162, pruned_loss=0.03216, audio_tagging_loss=0.01127, over 3046753.29 frames. 
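
The WARNING above ("Exclude cut with ID unbalanced/...") drops AudioSet cuts whose placeholder transcript cannot be aligned: 100 input frames shrink to 23 after the ~4x subsampling front end, which is fewer than the 24 BPE tokens, and a transducer needs at least one frame per output token. A sketch of such a filter; the exact subsampling arithmetic and threshold live in train_asr.py and may differ slightly:

    def keep_cut(num_frames: int, num_tokens: int,
                 subsampling_factor: int = 4) -> bool:
        # (T - 8) // 4 reproduces the logged 100 -> 23 mapping; the real
        # front end's formula may differ by a frame or two.
        t_sub = (num_frames - 8) // subsampling_factor
        # RNN-T alignment needs at least one frame per output token.
        return t_sub >= num_tokens

    print(keep_cut(100, 24))   # False: excluded, as in the WARNING above
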
], batch size: 58, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:43:20,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:43:23,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=359293.3333333333, ans=0.125 2023-11-18 18:43:28,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=359293.3333333333, ans=0.0 2023-11-18 18:44:12,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.377e+01 8.910e+01 9.863e+01 1.080e+02 1.378e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 18:44:16,612 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5850, loss[loss=0.1222, simple_loss=0.151, pruned_loss=0.04099, audio_tagging_loss=0.005707, over 16101.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1156, pruned_loss=0.03194, audio_tagging_loss=0.01123, over 3048217.98 frames. ], batch size: 57, lr: 1.36e-02, grad_scale: 16.0 2023-11-18 18:44:26,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=359693.3333333333, ans=0.125 2023-11-18 18:45:04,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=359893.3333333333, ans=0.125 2023-11-18 18:45:12,058 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5900, loss[loss=0.1054, simple_loss=0.1209, pruned_loss=0.03533, audio_tagging_loss=0.009631, over 14700.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1167, pruned_loss=0.03238, audio_tagging_loss=0.01115, over 3054406.13 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:45:15,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=359960.0, ans=0.1 2023-11-18 18:45:27,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=360026.6666666667, ans=0.95 2023-11-18 18:45:40,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=360093.3333333333, ans=0.125 2023-11-18 18:46:04,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 9.214e+01 1.011e+02 1.146e+02 1.411e+02, threshold=2.022e+02, percent-clipped=0.0 2023-11-18 18:46:07,768 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 5950, loss[loss=0.08449, simple_loss=0.09791, pruned_loss=0.02472, audio_tagging_loss=0.01082, over 16001.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1171, pruned_loss=0.03246, audio_tagging_loss=0.01107, over 3055991.30 frames. ], batch size: 62, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:46:18,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=360360.0, ans=0.125 2023-11-18 18:46:48,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=360493.3333333333, ans=0.0 2023-11-18 18:46:59,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. 
limit=15.0 2023-11-18 18:47:03,814 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6000, loss[loss=0.1034, simple_loss=0.1269, pruned_loss=0.03151, audio_tagging_loss=0.00847, over 14737.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1168, pruned_loss=0.03214, audio_tagging_loss=0.0111, over 3052942.66 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:47:03,816 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 18:47:36,970 INFO [train_asr.py:1147] (0/4) Epoch 5, validation: loss=0.0732, simple_loss=0.06039, pruned_loss=0.009139, audio_tagging_loss=0.03386, over 4681554.00 frames. 2023-11-18 18:47:36,971 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 18:47:42,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=360626.6666666667, ans=0.125 2023-11-18 18:47:43,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=360626.6666666667, ans=0.125 2023-11-18 18:47:46,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=360693.3333333333, ans=0.125 2023-11-18 18:48:10,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=360826.6666666667, ans=0.125 2023-11-18 18:48:13,917 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 18:48:19,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=360826.6666666667, ans=0.125 2023-11-18 18:48:28,700 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.496e+01 9.155e+01 9.916e+01 1.075e+02 1.410e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-18 18:48:29,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=360893.3333333333, ans=0.125 2023-11-18 18:48:31,970 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6050, loss[loss=0.09039, simple_loss=0.08865, pruned_loss=0.0305, audio_tagging_loss=0.01556, over 14290.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1173, pruned_loss=0.03229, audio_tagging_loss=0.01111, over 3046318.47 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:48:57,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-18 18:49:15,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=361226.6666666667, ans=0.2 2023-11-18 18:49:28,169 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6100, loss[loss=0.09855, simple_loss=0.1104, pruned_loss=0.03224, audio_tagging_loss=0.0111, over 13933.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1168, pruned_loss=0.03229, audio_tagging_loss=0.01105, over 3043249.30 frames. 
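
Validation (batch 6000 above) interrupts training at fixed batch intervals: a full pass over the dev cuts is scored and peak GPU memory is reported afterwards. The validation total obeys the same decomposition as the training loss (0.5 * 0.06039 + 0.009139 + 0.03386 ~= 0.0732), with the audio-tagging term now the largest single contribution. A sketch of that interleaving, with the interval and callback names as assumptions:

    import torch

    def maybe_validate(batch_idx: int, valid_interval: int, compute_valid_loss):
        # Interleave validation every `valid_interval` batches, as at batch
        # 6000 above, and report peak GPU memory the same way the log does.
        if batch_idx > 0 and batch_idx % valid_interval == 0:
            loss = compute_valid_loss()
            peak_mb = torch.cuda.max_memory_allocated() // (1024 * 1024)
            print(f"validation: loss={loss:.4f}; "
                  f"maximum memory allocated so far is {peak_mb}MB")
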
], batch size: 53, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:49:35,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=361293.3333333333, ans=0.0 2023-11-18 18:49:41,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=361360.0, ans=0.125 2023-11-18 18:49:54,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=361426.6666666667, ans=0.125 2023-11-18 18:50:04,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=361493.3333333333, ans=0.125 2023-11-18 18:50:20,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361560.0, ans=0.1 2023-11-18 18:50:21,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.202e+01 1.052e+02 1.142e+02 1.737e+02, threshold=2.103e+02, percent-clipped=0.0 2023-11-18 18:50:23,666 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6150, loss[loss=0.09029, simple_loss=0.1002, pruned_loss=0.02899, audio_tagging_loss=0.01118, over 15678.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1164, pruned_loss=0.0323, audio_tagging_loss=0.01124, over 3040610.84 frames. ], batch size: 63, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:50:23,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=361626.6666666667, ans=0.2 2023-11-18 18:50:38,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.86 vs. limit=12.0 2023-11-18 18:50:44,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=361693.3333333333, ans=0.0 2023-11-18 18:50:54,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=361760.0, ans=0.125 2023-11-18 18:50:57,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=361826.6666666667, ans=0.025 2023-11-18 18:50:57,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=361826.6666666667, ans=0.125 2023-11-18 18:51:04,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=361826.6666666667, ans=0.125 2023-11-18 18:51:08,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=361893.3333333333, ans=0.1 2023-11-18 18:51:16,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=361893.3333333333, ans=0.2 2023-11-18 18:51:20,193 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6200, loss[loss=0.1392, simple_loss=0.1699, pruned_loss=0.04699, audio_tagging_loss=0.007296, over 16057.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1158, pruned_loss=0.03194, audio_tagging_loss=0.01137, over 3041508.33 frames. 
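
Across this stretch the learning rate creeps down (1.37e-02 near batch 4600, 1.35e-02 here, 1.34e-02 by batch 6700), consistent with icefall's Eden schedule, which discounts a base LR by smooth power-law factors in both the step count and the epoch. A sketch with the standard Eden formula; the base_lr, lr_batches, lr_epochs and global-step values below are illustrative assumptions for this run:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        # Smooth power-law decay in both the global step and the epoch count.
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # ~1.37e-02, the value logged near epoch 5, batch 4600 (global step assumed):
    print(eden_lr(0.045, batch=46000, epoch=5))
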
], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:51:21,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=361960.0, ans=0.05 2023-11-18 18:51:23,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.72 vs. limit=22.5 2023-11-18 18:51:29,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.59 vs. limit=15.0 2023-11-18 18:51:39,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.14 vs. limit=22.5 2023-11-18 18:51:48,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=362093.3333333333, ans=0.2 2023-11-18 18:52:02,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=362160.0, ans=0.1 2023-11-18 18:52:04,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=14.71 vs. limit=15.0 2023-11-18 18:52:13,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-18 18:52:14,243 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.793e+01 9.346e+01 1.036e+02 1.107e+02 1.533e+02, threshold=2.072e+02, percent-clipped=0.0 2023-11-18 18:52:14,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=362226.6666666667, ans=0.125 2023-11-18 18:52:16,402 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6250, loss[loss=0.08583, simple_loss=0.09299, pruned_loss=0.02596, audio_tagging_loss=0.01337, over 14579.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.116, pruned_loss=0.03203, audio_tagging_loss=0.0115, over 3046570.24 frames. ], batch size: 57, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:52:22,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1.whitening_limit, batch_count=362293.3333333333, ans=10.0 2023-11-18 18:52:31,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362360.0, ans=0.125 2023-11-18 18:52:53,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=362493.3333333333, ans=0.1 2023-11-18 18:53:03,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=362560.0, ans=0.2 2023-11-18 18:53:10,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.07 vs. limit=15.0 2023-11-18 18:53:11,487 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6300, loss[loss=0.1143, simple_loss=0.1282, pruned_loss=0.03927, audio_tagging_loss=0.0109, over 14279.00 frames. ], tot_loss[loss=0.1029, simple_loss=0.1179, pruned_loss=0.03244, audio_tagging_loss=0.01152, over 3053354.86 frames. 
], batch size: 54, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:53:12,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=362626.6666666667, ans=0.0 2023-11-18 18:53:17,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=362626.6666666667, ans=0.125 2023-11-18 18:53:18,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=362626.6666666667, ans=0.125 2023-11-18 18:54:04,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 9.079e+01 9.861e+01 1.090e+02 1.541e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 18:54:05,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=362893.3333333333, ans=0.1 2023-11-18 18:54:07,085 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6350, loss[loss=0.129, simple_loss=0.1499, pruned_loss=0.04531, audio_tagging_loss=0.008757, over 15414.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1182, pruned_loss=0.03242, audio_tagging_loss=0.01153, over 3054876.47 frames. ], batch size: 55, lr: 1.35e-02, grad_scale: 16.0 2023-11-18 18:54:10,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=362960.0, ans=0.125 2023-11-18 18:54:17,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.36 vs. limit=15.0 2023-11-18 18:54:18,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363026.6666666667, ans=0.125 2023-11-18 18:54:28,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=363026.6666666667, ans=0.125 2023-11-18 18:54:46,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=363160.0, ans=0.0 2023-11-18 18:54:49,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=363160.0, ans=0.125 2023-11-18 18:55:03,933 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6400, loss[loss=0.08386, simple_loss=0.09326, pruned_loss=0.02384, audio_tagging_loss=0.01339, over 14418.00 frames. ], tot_loss[loss=0.104, simple_loss=0.1189, pruned_loss=0.03291, audio_tagging_loss=0.01161, over 3050993.62 frames. 
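
grad_scale in the batch lines flips between 16.0 and 32.0, the signature of fp16 training with a dynamic loss scale: the scale is halved whenever a step produces inf/nan gradients and grown back after a stretch of clean steps. A sketch of that policy, analogous to torch.cuda.amp.GradScaler; the growth interval and factors are assumptions:

    class DynamicLossScale:
        """Halve on overflow, double after `growth_interval` clean steps."""

        def __init__(self, scale: float = 32.0, growth_interval: int = 2000):
            self.scale = scale
            self.growth_interval = growth_interval
            self._good_steps = 0

        def update(self, found_inf: bool) -> float:
            if found_inf:
                self.scale /= 2.0      # e.g. 32.0 -> 16.0, as seen in the log
                self._good_steps = 0
            else:
                self._good_steps += 1
                if self._good_steps == self.growth_interval:
                    self.scale *= 2.0  # 16.0 -> 32.0 once training is stable
                    self._good_steps = 0
            return self.scale
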
], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:55:07,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=363293.3333333333, ans=0.0 2023-11-18 18:55:24,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=363426.6666666667, ans=0.0 2023-11-18 18:55:36,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=363493.3333333333, ans=0.125 2023-11-18 18:55:56,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 9.321e+01 1.035e+02 1.143e+02 1.548e+02, threshold=2.069e+02, percent-clipped=0.0 2023-11-18 18:55:58,779 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6450, loss[loss=0.08491, simple_loss=0.09932, pruned_loss=0.02473, audio_tagging_loss=0.01052, over 16741.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.1183, pruned_loss=0.0328, audio_tagging_loss=0.01177, over 3054299.96 frames. ], batch size: 62, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:56:05,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=363626.6666666667, ans=0.07 2023-11-18 18:56:09,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.57 vs. limit=15.0 2023-11-18 18:56:19,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=363693.3333333333, ans=0.125 2023-11-18 18:56:47,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:56:49,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0 2023-11-18 18:56:51,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.71 vs. limit=10.0 2023-11-18 18:56:54,225 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6500, loss[loss=0.08236, simple_loss=0.09426, pruned_loss=0.02395, audio_tagging_loss=0.01128, over 14916.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1177, pruned_loss=0.03248, audio_tagging_loss=0.01176, over 3051377.63 frames. ], batch size: 56, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:57:19,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=364093.3333333333, ans=0.125 2023-11-18 18:57:34,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=364160.0, ans=0.0 2023-11-18 18:57:44,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.87 vs. limit=15.0 2023-11-18 18:57:48,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.455e+01 9.392e+01 1.004e+02 1.100e+02 1.543e+02, threshold=2.007e+02, percent-clipped=0.0 2023-11-18 18:57:50,508 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6550, loss[loss=0.1104, simple_loss=0.1193, pruned_loss=0.03789, audio_tagging_loss=0.01288, over 14183.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1182, pruned_loss=0.03247, audio_tagging_loss=0.01153, over 3050779.96 frames. 
], batch size: 55, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:57:52,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=364293.3333333333, ans=0.125 2023-11-18 18:58:21,680 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 18:58:46,369 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6600, loss[loss=0.13, simple_loss=0.1554, pruned_loss=0.04534, audio_tagging_loss=0.007031, over 16032.00 frames. ], tot_loss[loss=0.1034, simple_loss=0.1186, pruned_loss=0.03277, audio_tagging_loss=0.01135, over 3050025.04 frames. ], batch size: 59, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:58:51,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.95 vs. limit=10.0 2023-11-18 18:59:01,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=364693.3333333333, ans=0.125 2023-11-18 18:59:03,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=364693.3333333333, ans=0.0 2023-11-18 18:59:29,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.04 vs. limit=15.0 2023-11-18 18:59:30,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=364893.3333333333, ans=0.07 2023-11-18 18:59:31,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=364893.3333333333, ans=0.125 2023-11-18 18:59:39,646 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.947e+01 1.006e+02 1.140e+02 1.601e+02, threshold=2.013e+02, percent-clipped=0.0 2023-11-18 18:59:41,802 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6650, loss[loss=0.1163, simple_loss=0.1338, pruned_loss=0.03902, audio_tagging_loss=0.01038, over 14902.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1183, pruned_loss=0.03268, audio_tagging_loss=0.01125, over 3051304.51 frames. ], batch size: 53, lr: 1.35e-02, grad_scale: 32.0 2023-11-18 18:59:52,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=365026.6666666667, ans=0.125 2023-11-18 18:59:55,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=365026.6666666667, ans=10.0 2023-11-18 19:00:29,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=365226.6666666667, ans=0.0 2023-11-18 19:00:30,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=365226.6666666667, ans=0.0 2023-11-18 19:00:36,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=365293.3333333333, ans=0.125 2023-11-18 19:00:37,641 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6700, loss[loss=0.1095, simple_loss=0.1237, pruned_loss=0.03388, audio_tagging_loss=0.01376, over 15663.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1163, pruned_loss=0.03208, audio_tagging_loss=0.01125, over 3043875.46 frames. 
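
The scaling.py:1118 WithLoss lines track an auxiliary penalty attached to attention-weight tensors (loss-sum=0.000e+00 when the penalty is currently inactive). One way to attach such a penalty without changing the values a module produces is a custom autograd function whose forward is the identity and whose backward adds the penalty's gradient; a sketch of that idea under assumed names, not icefall's actual implementation:

    import torch

    class AttachPenalty(torch.autograd.Function):
        # Forward: identity. Backward: add the gradient of an auxiliary
        # penalty on the activations, so the penalty shapes training
        # without altering anything downstream in the forward pass.
        @staticmethod
        def forward(ctx, x, penalty_fn):
            ctx.save_for_backward(x)
            ctx.penalty_fn = penalty_fn
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_out):
            (x,) = ctx.saved_tensors
            with torch.enable_grad():
                xd = x.detach().requires_grad_(True)
                penalty = ctx.penalty_fn(xd)
            (extra,) = torch.autograd.grad(penalty, xd)
            return grad_out + extra, None

    attn = torch.randn(4, 8, requires_grad=True)
    out = AttachPenalty.apply(attn, lambda t: 1e-3 * (t ** 2).sum())
    out.sum().backward()   # attn.grad now includes the penalty's gradient
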
], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:01:02,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.54 vs. limit=15.0 2023-11-18 19:01:06,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=365426.6666666667, ans=0.0 2023-11-18 19:01:10,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=365493.3333333333, ans=0.125 2023-11-18 19:01:20,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=365493.3333333333, ans=0.125 2023-11-18 19:01:32,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.700e+01 9.414e+01 1.042e+02 1.183e+02 1.878e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:01:34,264 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6750, loss[loss=0.08275, simple_loss=0.09552, pruned_loss=0.02235, audio_tagging_loss=0.01264, over 14666.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1148, pruned_loss=0.03176, audio_tagging_loss=0.01123, over 3040258.03 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:01:34,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.51 vs. limit=22.5 2023-11-18 19:01:38,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=365626.6666666667, ans=0.125 2023-11-18 19:01:47,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=365693.3333333333, ans=0.125 2023-11-18 19:01:51,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365693.3333333333, ans=0.1 2023-11-18 19:01:54,134 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:02:08,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=365826.6666666667, ans=0.1 2023-11-18 19:02:22,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=365893.3333333333, ans=0.0 2023-11-18 19:02:24,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=365893.3333333333, ans=0.125 2023-11-18 19:02:26,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=365893.3333333333, ans=0.09899494936611666 2023-11-18 19:02:29,899 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6800, loss[loss=0.09262, simple_loss=0.108, pruned_loss=0.02712, audio_tagging_loss=0.01149, over 14464.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1157, pruned_loss=0.03206, audio_tagging_loss=0.01127, over 3031453.19 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:02:41,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.26 vs. 
limit=6.0 2023-11-18 19:02:51,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=366093.3333333333, ans=0.125 2023-11-18 19:03:00,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366093.3333333333, ans=0.1 2023-11-18 19:03:22,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.285e+01 8.984e+01 9.907e+01 1.134e+02 1.555e+02, threshold=1.981e+02, percent-clipped=0.0 2023-11-18 19:03:24,932 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6850, loss[loss=0.1141, simple_loss=0.1298, pruned_loss=0.03603, audio_tagging_loss=0.01314, over 16312.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1168, pruned_loss=0.03239, audio_tagging_loss=0.01117, over 3041680.57 frames. ], batch size: 62, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:03:25,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=366293.3333333333, ans=0.2 2023-11-18 19:03:28,066 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.41 vs. limit=12.0 2023-11-18 19:04:01,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=366493.3333333333, ans=0.0 2023-11-18 19:04:05,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=366493.3333333333, ans=0.0 2023-11-18 19:04:21,394 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6900, loss[loss=0.07764, simple_loss=0.08891, pruned_loss=0.02375, audio_tagging_loss=0.009432, over 14391.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.116, pruned_loss=0.03192, audio_tagging_loss=0.01125, over 3043884.18 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:04:25,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=366626.6666666667, ans=0.125 2023-11-18 19:04:33,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=366693.3333333333, ans=0.025 2023-11-18 19:04:44,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=366760.0, ans=0.1 2023-11-18 19:04:51,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=366760.0, ans=0.125 2023-11-18 19:05:02,948 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:05:15,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 9.152e+01 1.012e+02 1.130e+02 1.420e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 19:05:16,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.77 vs. 
limit=12.0 2023-11-18 19:05:17,769 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 6950, loss[loss=0.07878, simple_loss=0.09143, pruned_loss=0.02162, audio_tagging_loss=0.01144, over 16074.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1166, pruned_loss=0.03201, audio_tagging_loss=0.01131, over 3039126.96 frames. ], batch size: 60, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:05:19,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=366960.0, ans=0.125 2023-11-18 19:05:21,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=366960.0, ans=0.2 2023-11-18 19:05:23,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=366960.0, ans=0.125 2023-11-18 19:05:36,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=367026.6666666667, ans=0.95 2023-11-18 19:05:37,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=367026.6666666667, ans=0.1 2023-11-18 19:05:43,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=367093.3333333333, ans=0.1 2023-11-18 19:05:48,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.04 vs. limit=15.0 2023-11-18 19:05:55,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=367160.0, ans=0.2 2023-11-18 19:06:09,150 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.55 vs. limit=12.0 2023-11-18 19:06:10,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=367226.6666666667, ans=0.025 2023-11-18 19:06:12,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.94 vs. limit=6.0 2023-11-18 19:06:12,707 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7000, loss[loss=0.1169, simple_loss=0.1356, pruned_loss=0.03765, audio_tagging_loss=0.01147, over 16527.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1173, pruned_loss=0.0322, audio_tagging_loss=0.01118, over 3040214.72 frames. ], batch size: 61, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:06:30,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=367360.0, ans=0.125 2023-11-18 19:06:38,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=367426.6666666667, ans=0.125 2023-11-18 19:06:57,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=367560.0, ans=0.125 2023-11-18 19:07:06,681 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.460e+01 9.236e+01 1.008e+02 1.142e+02 1.683e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 19:07:08,811 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7050, loss[loss=0.117, simple_loss=0.133, pruned_loss=0.03956, audio_tagging_loss=0.01094, over 15137.00 frames. 
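An aside on reading the loss[...] / tot_loss[...] fields above: the bracketed terms are the pruned-transducer components plus the audio-tagging distillation term, and the leading loss value is a weighted sum of them. A weight of 0.5 on simple_loss reproduces the logged totals (for the batch-6950 running average: 0.5 * 0.1166 + 0.03201 + 0.01131 = 0.1016), so the combination can be sketched as below; the function name and the fixed weights are inferred from the logged numbers, not taken from train_asr.py itself.

# Sketch of how the logged total appears to be assembled from its parts.
# The 0.5 weight on simple_loss and the 1.0 weight on the audio-tagging
# term are inferred from the logged values, not read from the source.
def combined_loss(simple_loss: float,
                  pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_loss_scale: float = 0.5,
                  audio_tagging_loss_scale: float = 1.0) -> float:
    return (simple_loss_scale * simple_loss
            + pruned_loss
            + audio_tagging_loss_scale * audio_tagging_loss)

# Batch-6950 running averages from the line above:
assert abs(combined_loss(0.1166, 0.03201, 0.01131) - 0.1016) < 5e-4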
], tot_loss[loss=0.1021, simple_loss=0.1173, pruned_loss=0.03224, audio_tagging_loss=0.01122, over 3040906.59 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:07:23,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=367693.3333333333, ans=0.125 2023-11-18 19:07:26,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367693.3333333333, ans=0.0 2023-11-18 19:07:31,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=367760.0, ans=0.0 2023-11-18 19:07:32,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.98 vs. limit=22.5 2023-11-18 19:07:39,002 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=12.0 2023-11-18 19:07:49,177 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=367826.6666666667, ans=0.0 2023-11-18 19:07:54,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=367893.3333333333, ans=0.0 2023-11-18 19:08:04,465 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7100, loss[loss=0.1247, simple_loss=0.1507, pruned_loss=0.04071, audio_tagging_loss=0.008661, over 15497.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1171, pruned_loss=0.03238, audio_tagging_loss=0.01135, over 3037419.90 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:08:13,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=367960.0, ans=0.07 2023-11-18 19:08:57,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 9.190e+01 1.032e+02 1.164e+02 1.806e+02, threshold=2.063e+02, percent-clipped=0.0 2023-11-18 19:08:59,844 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7150, loss[loss=0.1233, simple_loss=0.1563, pruned_loss=0.0356, audio_tagging_loss=0.009588, over 14489.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1165, pruned_loss=0.03206, audio_tagging_loss=0.01141, over 3032253.46 frames. ], batch size: 52, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:09:03,427 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.31 vs. 
limit=12.0 2023-11-18 19:09:05,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=368293.3333333333, ans=0.125 2023-11-18 19:09:26,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=368426.6666666667, ans=0.125 2023-11-18 19:09:32,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=368426.6666666667, ans=22.5 2023-11-18 19:09:38,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=368493.3333333333, ans=0.125 2023-11-18 19:09:42,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=368493.3333333333, ans=0.125 2023-11-18 19:09:42,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=368493.3333333333, ans=0.0 2023-11-18 19:09:55,903 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7200, loss[loss=0.101, simple_loss=0.1208, pruned_loss=0.03174, audio_tagging_loss=0.008926, over 14957.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1168, pruned_loss=0.032, audio_tagging_loss=0.0116, over 3037589.77 frames. ], batch size: 55, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:09:57,654 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.99 vs. limit=15.0 2023-11-18 19:10:37,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=368826.6666666667, ans=0.0 2023-11-18 19:10:42,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=368893.3333333333, ans=0.0 2023-11-18 19:10:49,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.565e+01 9.156e+01 1.033e+02 1.136e+02 1.885e+02, threshold=2.065e+02, percent-clipped=0.0 2023-11-18 19:10:51,152 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7250, loss[loss=0.1033, simple_loss=0.1171, pruned_loss=0.03276, audio_tagging_loss=0.01194, over 15747.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.116, pruned_loss=0.03169, audio_tagging_loss=0.01173, over 3037293.86 frames. ], batch size: 58, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:10:52,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=368960.0, ans=0.125 2023-11-18 19:11:10,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.95 vs. 
limit=15.0 2023-11-18 19:11:11,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_ff3.min_abs, batch_count=369026.6666666667, ans=0.2 2023-11-18 19:11:11,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=369026.6666666667, ans=0.125 2023-11-18 19:11:20,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=369093.3333333333, ans=0.125 2023-11-18 19:11:31,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=369160.0, ans=0.0 2023-11-18 19:11:31,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369160.0, ans=0.1 2023-11-18 19:11:32,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=369160.0, ans=0.1 2023-11-18 19:11:47,514 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7300, loss[loss=0.1004, simple_loss=0.1148, pruned_loss=0.03378, audio_tagging_loss=0.009216, over 15549.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1171, pruned_loss=0.0319, audio_tagging_loss=0.01156, over 3041751.24 frames. ], batch size: 59, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:12:00,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369360.0, ans=0.1 2023-11-18 19:12:01,182 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.74 vs. limit=15.0 2023-11-18 19:12:20,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=369493.3333333333, ans=0.125 2023-11-18 19:12:21,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=369493.3333333333, ans=0.0 2023-11-18 19:12:23,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.98 vs. limit=15.0 2023-11-18 19:12:27,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=369493.3333333333, ans=0.0 2023-11-18 19:12:27,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=369493.3333333333, ans=0.125 2023-11-18 19:12:41,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.861e+01 9.294e+01 1.042e+02 1.203e+02 1.669e+02, threshold=2.084e+02, percent-clipped=0.0 2023-11-18 19:12:44,280 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7350, loss[loss=0.06974, simple_loss=0.0682, pruned_loss=0.01952, audio_tagging_loss=0.01612, over 14937.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1166, pruned_loss=0.03192, audio_tagging_loss=0.01136, over 3047715.16 frames. 
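The scaling.py:213 entries track ScheduledFloat hyper-parameters: per-module constants (skip rates, balancer probabilities, dropout p) that are re-evaluated against the global batch count whenever they are read, with ans apparently holding the resolved value. A simplified stand-in is sketched below, assuming plain piecewise-linear interpolation over batch count; the real class has more machinery than this.

# Hedged sketch of a ScheduledFloat-style value: a float interpolated
# piecewise-linearly against the global batch count, which is what the
# "name=..., batch_count=..., ans=..." lines above appear to log.
# This class is an illustrative stand-in, not the icefall implementation.
class PiecewiseScheduledFloat:
    def __init__(self, *points):  # points: (batch_count, value), ascending
        self.points = sorted(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
        return pts[-1][1]  # past the last breakpoint, hold the final value

# e.g. a skip-rate decaying from 0.5 to 0.0 over the first 4000 batches,
# then held at 0.0 -- consistent with the ans=0.0 skip-rate entries above:
skip_rate = PiecewiseScheduledFloat((0.0, 0.5), (4000.0, 0.0))
print(skip_rate.value_at(369160.0))  # -> 0.0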
], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:12:58,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=369693.3333333333, ans=0.125 2023-11-18 19:13:17,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=369826.6666666667, ans=0.125 2023-11-18 19:13:36,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=369893.3333333333, ans=0.1 2023-11-18 19:13:39,074 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7400, loss[loss=0.1123, simple_loss=0.1295, pruned_loss=0.03597, audio_tagging_loss=0.01162, over 15416.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1157, pruned_loss=0.03177, audio_tagging_loss=0.01128, over 3044646.80 frames. ], batch size: 57, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:13:55,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.50 vs. limit=15.0 2023-11-18 19:13:59,375 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.20 vs. limit=22.5 2023-11-18 19:14:09,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=370093.3333333333, ans=0.125 2023-11-18 19:14:32,447 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.641e+01 9.039e+01 9.551e+01 1.074e+02 1.292e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-18 19:14:34,626 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7450, loss[loss=0.1236, simple_loss=0.149, pruned_loss=0.0406, audio_tagging_loss=0.008524, over 14620.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.116, pruned_loss=0.0317, audio_tagging_loss=0.01122, over 3051570.27 frames. ], batch size: 53, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:14:52,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=370360.0, ans=0.125 2023-11-18 19:15:02,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=370426.6666666667, ans=0.0 2023-11-18 19:15:03,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.30 vs. 
limit=22.5 2023-11-18 19:15:05,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=370426.6666666667, ans=0.125 2023-11-18 19:15:09,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=370493.3333333333, ans=0.125 2023-11-18 19:15:15,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=370493.3333333333, ans=0.125 2023-11-18 19:15:20,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=370560.0, ans=0.125 2023-11-18 19:15:29,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=370626.6666666667, ans=0.125 2023-11-18 19:15:30,690 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7500, loss[loss=0.06846, simple_loss=0.07621, pruned_loss=0.01596, audio_tagging_loss=0.01439, over 14615.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1172, pruned_loss=0.03201, audio_tagging_loss=0.01108, over 3050285.86 frames. ], batch size: 56, lr: 1.34e-02, grad_scale: 32.0 2023-11-18 19:15:42,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=370693.3333333333, ans=0.0 2023-11-18 19:15:48,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=370693.3333333333, ans=0.125 2023-11-18 19:15:51,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.03 vs. limit=6.0 2023-11-18 19:15:52,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=370760.0, ans=0.1 2023-11-18 19:16:24,550 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.574e+01 9.004e+01 9.847e+01 1.087e+02 1.456e+02, threshold=1.969e+02, percent-clipped=0.0 2023-11-18 19:16:26,727 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7550, loss[loss=0.08445, simple_loss=0.09802, pruned_loss=0.02423, audio_tagging_loss=0.01122, over 14410.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1167, pruned_loss=0.03178, audio_tagging_loss=0.01102, over 3050148.24 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:17:02,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.29 vs. limit=15.0 2023-11-18 19:17:14,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=371226.6666666667, ans=0.035 2023-11-18 19:17:14,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371226.6666666667, ans=0.1 2023-11-18 19:17:22,448 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7600, loss[loss=0.1146, simple_loss=0.1318, pruned_loss=0.03736, audio_tagging_loss=0.01137, over 15909.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1175, pruned_loss=0.03213, audio_tagging_loss=0.01101, over 3053285.16 frames. 
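On the optim.py:476 lines: the five grad-norm quartiles read as min / 25% / median / 75% / max over a window of recent gradient norms, and in every instance above the threshold equals Clipping_scale times the median (e.g. 2.0 * 9.847e+01 = 1.969e+02 just above). A hedged reconstruction of that diagnostic, with illustrative names:

# Sketch of the clipping diagnostic: summarize recent gradient norms by
# quartiles and derive the clip threshold as clipping_scale * median,
# which matches every threshold printed in this log. Names are illustrative.
import torch

def clipping_stats(recent_grad_norms: torch.Tensor,
                   clipping_scale: float = 2.0):
    q = torch.quantile(recent_grad_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * q[2]  # scale times the median norm
    percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean()
    return q, threshold, percent_clipped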
], batch size: 58, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:17:28,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=371293.3333333333, ans=0.125 2023-11-18 19:17:40,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=10.56 vs. limit=15.0 2023-11-18 19:17:51,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=371426.6666666667, ans=0.0 2023-11-18 19:18:15,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 9.066e+01 9.750e+01 1.073e+02 2.127e+02, threshold=1.950e+02, percent-clipped=2.0 2023-11-18 19:18:18,594 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7650, loss[loss=0.0901, simple_loss=0.09601, pruned_loss=0.02691, audio_tagging_loss=0.01518, over 16016.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1169, pruned_loss=0.03215, audio_tagging_loss=0.0112, over 3037569.99 frames. ], batch size: 61, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:18:36,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=371693.3333333333, ans=0.0 2023-11-18 19:19:14,364 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7700, loss[loss=0.06147, simple_loss=0.06542, pruned_loss=0.01467, audio_tagging_loss=0.01409, over 13496.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1172, pruned_loss=0.03209, audio_tagging_loss=0.01114, over 3039350.65 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:19:17,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=371960.0, ans=0.1 2023-11-18 19:19:43,265 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:19:49,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=372160.0, ans=0.125 2023-11-18 19:19:50,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=372160.0, ans=0.125 2023-11-18 19:19:52,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=372160.0, ans=0.0 2023-11-18 19:19:58,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=372226.6666666667, ans=0.125 2023-11-18 19:20:04,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=372226.6666666667, ans=0.125 2023-11-18 19:20:08,182 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.371e+01 8.781e+01 9.756e+01 1.085e+02 1.598e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 19:20:10,367 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7750, loss[loss=0.05966, simple_loss=0.05747, pruned_loss=0.01696, audio_tagging_loss=0.01396, over 14127.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1156, pruned_loss=0.03177, audio_tagging_loss=0.01136, over 3042624.03 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:20:34,797 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.84 vs. 
limit=22.5 2023-11-18 19:20:36,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=372426.6666666667, ans=0.0 2023-11-18 19:20:59,501 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=7.971e-03 2023-11-18 19:21:00,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=372560.0, ans=0.2 2023-11-18 19:21:05,695 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7800, loss[loss=0.103, simple_loss=0.1233, pruned_loss=0.03389, audio_tagging_loss=0.007475, over 15299.00 frames. ], tot_loss[loss=0.1008, simple_loss=0.1159, pruned_loss=0.03168, audio_tagging_loss=0.01118, over 3043301.40 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:21:09,154 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.39 vs. limit=22.5 2023-11-18 19:22:00,496 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.238e+01 8.836e+01 9.770e+01 1.067e+02 1.448e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 19:22:01,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.26 vs. limit=6.0 2023-11-18 19:22:02,620 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7850, loss[loss=0.1485, simple_loss=0.1832, pruned_loss=0.04866, audio_tagging_loss=0.008251, over 14074.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1155, pruned_loss=0.03158, audio_tagging_loss=0.01128, over 3041890.47 frames. ], batch size: 53, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:22:09,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=372960.0, ans=0.1 2023-11-18 19:22:36,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=373160.0, ans=0.0 2023-11-18 19:22:38,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=373160.0, ans=0.0 2023-11-18 19:22:39,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=373160.0, ans=0.125 2023-11-18 19:22:43,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=373160.0, ans=0.0 2023-11-18 19:22:58,338 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7900, loss[loss=0.09726, simple_loss=0.1135, pruned_loss=0.0308, audio_tagging_loss=0.009697, over 14287.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1171, pruned_loss=0.03185, audio_tagging_loss=0.01138, over 3042704.17 frames. 
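The checkpoint.py save that follows just below (checkpoint-56000.pt) is a step-count-triggered snapshot; 56000 is a multiple of the 4000-step interval this run appears to use. A minimal sketch of such a trigger, assuming the filename pattern visible in the log; everything else is illustrative:

# Sketch of a batch-count checkpoint trigger. The "checkpoint-<N>.pt"
# pattern follows the log line below; the 4000-step interval is an
# assumption consistent with the step count, and save_checkpoint-style
# details of the real trainer are not reproduced here.
from pathlib import Path
import torch

def maybe_save(model, exp_dir: Path, batch_idx_train: int,
               save_every_n: int = 4000) -> None:
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        torch.save({"model": model.state_dict(),
                    "batch_idx_train": batch_idx_train}, path)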
], batch size: 54, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:23:00,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=373293.3333333333, ans=0.125 2023-11-18 19:23:03,937 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-56000.pt 2023-11-18 19:23:08,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=373293.3333333333, ans=10.0 2023-11-18 19:23:17,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2023-11-18 19:23:21,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.75 vs. limit=10.0 2023-11-18 19:23:26,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.15 vs. limit=15.0 2023-11-18 19:23:44,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=373560.0, ans=0.2 2023-11-18 19:23:46,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=373560.0, ans=0.025 2023-11-18 19:23:53,283 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.249e+01 9.204e+01 9.997e+01 1.093e+02 1.252e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 19:23:55,400 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 7950, loss[loss=0.06578, simple_loss=0.06602, pruned_loss=0.01619, audio_tagging_loss=0.01659, over 14279.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1164, pruned_loss=0.03171, audio_tagging_loss=0.01162, over 3045087.58 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:23:55,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.49 vs. limit=15.0 2023-11-18 19:23:57,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=373626.6666666667, ans=0.95 2023-11-18 19:24:07,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=373693.3333333333, ans=0.2 2023-11-18 19:24:08,689 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:24:10,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.75 vs. 
limit=10.0 2023-11-18 19:24:23,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=373760.0, ans=0.1 2023-11-18 19:24:27,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=373760.0, ans=0.0 2023-11-18 19:24:30,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=373826.6666666667, ans=0.125 2023-11-18 19:24:38,458 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-18 19:24:46,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=373893.3333333333, ans=0.0 2023-11-18 19:24:47,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=373893.3333333333, ans=0.2 2023-11-18 19:24:51,738 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8000, loss[loss=0.1259, simple_loss=0.1474, pruned_loss=0.0419, audio_tagging_loss=0.01033, over 15205.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1152, pruned_loss=0.03136, audio_tagging_loss=0.01172, over 3039520.60 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 32.0 2023-11-18 19:25:00,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=373960.0, ans=0.125 2023-11-18 19:25:04,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=374026.6666666667, ans=0.0 2023-11-18 19:25:21,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=374093.3333333333, ans=0.0 2023-11-18 19:25:41,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=374226.6666666667, ans=0.0 2023-11-18 19:25:46,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.072e+01 8.997e+01 9.797e+01 1.056e+02 1.371e+02, threshold=1.959e+02, percent-clipped=0.0 2023-11-18 19:25:46,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=374293.3333333333, ans=0.125 2023-11-18 19:25:47,819 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8050, loss[loss=0.09586, simple_loss=0.1186, pruned_loss=0.02679, audio_tagging_loss=0.009765, over 14368.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1155, pruned_loss=0.03159, audio_tagging_loss=0.01181, over 3035058.79 frames. ], batch size: 54, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:25:58,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=374360.0, ans=0.125 2023-11-18 19:26:01,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=374360.0, ans=0.0 2023-11-18 19:26:04,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=374360.0, ans=0.0 2023-11-18 19:26:11,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=374426.6666666667, ans=0.125 2023-11-18 19:26:38,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.24 vs. 
limit=15.0 2023-11-18 19:26:42,904 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8100, loss[loss=0.09695, simple_loss=0.1198, pruned_loss=0.0241, audio_tagging_loss=0.01293, over 14690.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1159, pruned_loss=0.03182, audio_tagging_loss=0.01168, over 3039150.96 frames. ], batch size: 56, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:26:52,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=374626.6666666667, ans=0.125 2023-11-18 19:27:01,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=374693.3333333333, ans=0.05 2023-11-18 19:27:07,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:09,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:15,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=374760.0, ans=0.125 2023-11-18 19:27:20,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=374826.6666666667, ans=0.125 2023-11-18 19:27:38,042 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.231e+01 9.480e+01 1.051e+02 1.132e+02 1.844e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 19:27:39,094 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8150, loss[loss=0.09337, simple_loss=0.1087, pruned_loss=0.02823, audio_tagging_loss=0.01079, over 14574.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1165, pruned_loss=0.03206, audio_tagging_loss=0.01155, over 3039992.40 frames. ], batch size: 55, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:27:44,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=374960.0, ans=0.0 2023-11-18 19:27:58,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=375026.6666666667, ans=0.0 2023-11-18 19:27:58,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=375026.6666666667, ans=0.2 2023-11-18 19:28:05,117 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-11-18 19:28:22,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=375226.6666666667, ans=0.125 2023-11-18 19:28:28,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=375226.6666666667, ans=0.125 2023-11-18 19:28:34,123 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 19:28:35,155 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8200, loss[loss=0.08735, simple_loss=0.109, pruned_loss=0.02515, audio_tagging_loss=0.007709, over 14416.00 frames. ], tot_loss[loss=0.1022, simple_loss=0.1173, pruned_loss=0.03215, audio_tagging_loss=0.01137, over 3038460.47 frames. ], batch size: 57, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:28:37,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=375293.3333333333, ans=0.0 2023-11-18 19:28:38,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375293.3333333333, ans=0.1 2023-11-18 19:28:53,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.31 vs. limit=15.0 2023-11-18 19:29:06,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=375426.6666666667, ans=0.125 2023-11-18 19:29:25,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.38 vs. limit=15.0 2023-11-18 19:29:25,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=375560.0, ans=0.125 2023-11-18 19:29:28,969 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 9.797e+01 1.057e+02 1.238e+02 1.453e+02, threshold=2.115e+02, percent-clipped=0.0 2023-11-18 19:29:30,059 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8250, loss[loss=0.07531, simple_loss=0.09155, pruned_loss=0.02018, audio_tagging_loss=0.009361, over 16373.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1164, pruned_loss=0.03176, audio_tagging_loss=0.01127, over 3038534.95 frames. ], batch size: 63, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:29:38,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=375626.6666666667, ans=0.1 2023-11-18 19:29:43,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.99 vs. limit=10.0 2023-11-18 19:29:52,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=375760.0, ans=0.0 2023-11-18 19:30:07,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=375826.6666666667, ans=0.1 2023-11-18 19:30:10,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=375826.6666666667, ans=0.125 2023-11-18 19:30:11,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=375826.6666666667, ans=0.2 2023-11-18 19:30:25,320 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8300, loss[loss=0.1112, simple_loss=0.1287, pruned_loss=0.03541, audio_tagging_loss=0.01149, over 16292.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1172, pruned_loss=0.03205, audio_tagging_loss=0.01127, over 3039804.53 frames. 
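The "Exclude cut ..." warnings above all follow the same pattern: 1-second AudioSet clips carrying a placeholder transcript end up with fewer encoder frames after subsampling (23) than BPE tokens (24), and a transducer cannot emit more symbols than it has frames, so the cut is dropped. A sketch of that filter; the exact subsampled-length formula is an assumption:

# Hedged sketch of the short-cut filter behind the warnings above.
# The "// 4 - 2" length rule is an approximation of the encoder's
# subsampling (100 raw frames -> 23 here), not the exact formula.
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    frames_after_subsampling = num_frames // 4 - 2
    return frames_after_subsampling >= num_tokens

assert keep_cut(100, 24) is False  # the excluded dummy-text cuts above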
], batch size: 59, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:30:28,967 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.50 vs. limit=15.0 2023-11-18 19:30:41,246 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.33 vs. limit=15.0 2023-11-18 19:30:49,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. limit=15.0 2023-11-18 19:31:07,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=376160.0, ans=0.125 2023-11-18 19:31:16,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=376226.6666666667, ans=0.125 2023-11-18 19:31:19,572 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.771e+01 9.264e+01 1.007e+02 1.092e+02 1.530e+02, threshold=2.015e+02, percent-clipped=0.0 2023-11-18 19:31:21,240 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8350, loss[loss=0.08176, simple_loss=0.08767, pruned_loss=0.02698, audio_tagging_loss=0.01094, over 13827.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1165, pruned_loss=0.03192, audio_tagging_loss=0.01118, over 3048136.70 frames. ], batch size: 52, lr: 1.33e-02, grad_scale: 16.0 2023-11-18 19:31:24,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=376293.3333333333, ans=0.1 2023-11-18 19:31:36,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=376360.0, ans=0.0 2023-11-18 19:31:46,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=376426.6666666667, ans=0.2 2023-11-18 19:31:51,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=376426.6666666667, ans=0.0 2023-11-18 19:31:51,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=376426.6666666667, ans=0.2 2023-11-18 19:31:57,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=376493.3333333333, ans=0.125 2023-11-18 19:32:13,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-11-18 19:32:14,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=376560.0, ans=10.0 2023-11-18 19:32:16,818 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8400, loss[loss=0.1181, simple_loss=0.1311, pruned_loss=0.03965, audio_tagging_loss=0.01287, over 16292.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1159, pruned_loss=0.03178, audio_tagging_loss=0.01116, over 3044357.15 frames. 
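The grad_scale field in the loss lines is fp16 dynamic loss scaling at work: it halved from 32.0 to 16.0 at batch 8050 (the usual response to an inf/nan gradient step) and is back at 32.0 in the batch-8400 line continuing below. A stock PyTorch AMP loop shows the mechanism; the trainer's actual scaler configuration cannot be read off the log, so this is a generic sketch:

# Generic AMP training step illustrating how grad_scale halves on overflow
# and grows back after a stable stretch, as the logged values suggest.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0,
                                   growth_factor=2.0,   # double when stable
                                   backoff_factor=0.5)  # halve on inf/nan

def training_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()          # adjusts the scale, surfaced here as grad_scale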
], batch size: 60, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:32:18,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=376626.6666666667, ans=0.0 2023-11-18 19:32:20,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-18 19:32:27,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=376693.3333333333, ans=0.1 2023-11-18 19:32:41,543 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=376760.0, ans=0.125 2023-11-18 19:33:02,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=376893.3333333333, ans=0.2 2023-11-18 19:33:11,381 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.828e+01 9.983e+01 1.109e+02 1.398e+02, threshold=1.997e+02, percent-clipped=0.0 2023-11-18 19:33:13,015 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8450, loss[loss=0.1236, simple_loss=0.1404, pruned_loss=0.04095, audio_tagging_loss=0.01243, over 15176.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1159, pruned_loss=0.03186, audio_tagging_loss=0.01126, over 3051525.07 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:33:28,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=377026.6666666667, ans=0.125 2023-11-18 19:33:33,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=377093.3333333333, ans=0.125 2023-11-18 19:33:36,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=377093.3333333333, ans=0.1 2023-11-18 19:33:38,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=377093.3333333333, ans=0.2 2023-11-18 19:33:48,409 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.13 vs. limit=22.5 2023-11-18 19:33:57,096 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.36 vs. limit=15.0 2023-11-18 19:34:07,975 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8500, loss[loss=0.08717, simple_loss=0.09271, pruned_loss=0.02515, audio_tagging_loss=0.01567, over 16568.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1157, pruned_loss=0.032, audio_tagging_loss=0.0113, over 3052280.34 frames. ], batch size: 64, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:34:13,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-18 19:34:16,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=377293.3333333333, ans=0.125 2023-11-18 19:34:16,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.76 vs. 
limit=22.5 2023-11-18 19:34:25,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.56 vs. limit=15.0 2023-11-18 19:34:41,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=377493.3333333333, ans=0.125 2023-11-18 19:34:46,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=377493.3333333333, ans=0.125 2023-11-18 19:34:52,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=377560.0, ans=0.125 2023-11-18 19:35:03,281 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.332e+01 8.638e+01 9.723e+01 1.079e+02 1.527e+02, threshold=1.945e+02, percent-clipped=0.0 2023-11-18 19:35:04,375 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8550, loss[loss=0.06565, simple_loss=0.06909, pruned_loss=0.01865, audio_tagging_loss=0.01245, over 14971.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1157, pruned_loss=0.03193, audio_tagging_loss=0.01128, over 3051796.67 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:35:22,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=377693.3333333333, ans=0.125 2023-11-18 19:35:26,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=377760.0, ans=0.125 2023-11-18 19:35:26,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=377760.0, ans=0.04949747468305833 2023-11-18 19:36:00,030 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8600, loss[loss=0.09867, simple_loss=0.108, pruned_loss=0.03105, audio_tagging_loss=0.01359, over 15455.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1148, pruned_loss=0.03147, audio_tagging_loss=0.0114, over 3059710.58 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:36:05,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=377960.0, ans=0.1 2023-11-18 19:36:25,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=378093.3333333333, ans=0.125 2023-11-18 19:36:54,546 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.915e+01 9.697e+01 1.106e+02 1.523e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 19:36:54,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=378293.3333333333, ans=0.125 2023-11-18 19:36:55,646 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8650, loss[loss=0.08988, simple_loss=0.09255, pruned_loss=0.02725, audio_tagging_loss=0.01635, over 13738.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1154, pruned_loss=0.03169, audio_tagging_loss=0.01154, over 3053023.44 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:37:25,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=378426.6666666667, ans=0.2 2023-11-18 19:37:51,204 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8700, loss[loss=0.1485, simple_loss=0.1634, pruned_loss=0.05702, audio_tagging_loss=0.009741, over 16039.00 frames. 
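The lr field has crept from 1.34e-02 down to 1.32e-02 over this stretch, consistent with an Eden-style schedule that decays in both batch count and epoch. The constants below (base_lr=0.045, lr_batches=7500, lr_epochs=3.5, and a 0-indexed epoch) are assumptions chosen because they reproduce the logged value; treat the whole thing as a sketch rather than the trainer's scheduler:

# Hedged reconstruction of an Eden-style learning-rate schedule. All
# constants, and the 0-indexed epoch, are assumptions that happen to
# reproduce the lr printed in the surrounding loss lines.
def eden_lr(base_lr: float, batch: float, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(f"{eden_lr(0.045, batch=56000, epoch=4):.2e}")  # ~1.33e-02, as logged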
], tot_loss[loss=0.1021, simple_loss=0.1169, pruned_loss=0.03214, audio_tagging_loss=0.01151, over 3049410.78 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:38:01,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_abs, batch_count=378693.3333333333, ans=0.5 2023-11-18 19:38:03,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=12.0 2023-11-18 19:38:27,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=378826.6666666667, ans=0.125 2023-11-18 19:38:35,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.29 vs. limit=15.0 2023-11-18 19:38:36,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=378893.3333333333, ans=0.125 2023-11-18 19:38:43,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=378893.3333333333, ans=0.0 2023-11-18 19:38:46,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.813e+01 9.292e+01 1.037e+02 1.144e+02 1.707e+02, threshold=2.074e+02, percent-clipped=0.0 2023-11-18 19:38:47,729 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8750, loss[loss=0.07691, simple_loss=0.08929, pruned_loss=0.01936, audio_tagging_loss=0.0129, over 14121.00 frames. ], tot_loss[loss=0.1023, simple_loss=0.1172, pruned_loss=0.03216, audio_tagging_loss=0.01156, over 3045050.17 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:39:13,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=379093.3333333333, ans=0.0 2023-11-18 19:39:24,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=379160.0, ans=0.125 2023-11-18 19:39:39,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.52 vs. limit=15.0 2023-11-18 19:39:43,197 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8800, loss[loss=0.09069, simple_loss=0.1092, pruned_loss=0.02529, audio_tagging_loss=0.01081, over 15552.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1179, pruned_loss=0.03214, audio_tagging_loss=0.01157, over 3043522.23 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:39:50,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=379293.3333333333, ans=0.1 2023-11-18 19:40:00,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=19.88 vs. 
limit=22.5 2023-11-18 19:40:05,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=379426.6666666667, ans=0.125 2023-11-18 19:40:19,191 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=379493.3333333333, ans=0.0 2023-11-18 19:40:24,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=379493.3333333333, ans=0.125 2023-11-18 19:40:37,500 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.493e+01 9.218e+01 1.050e+02 1.133e+02 1.971e+02, threshold=2.101e+02, percent-clipped=0.0 2023-11-18 19:40:37,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=379626.6666666667, ans=0.2 2023-11-18 19:40:38,553 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8850, loss[loss=0.1308, simple_loss=0.1531, pruned_loss=0.04599, audio_tagging_loss=0.008283, over 15472.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1169, pruned_loss=0.03181, audio_tagging_loss=0.01158, over 3049624.81 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:40:44,351 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.11 vs. limit=15.0 2023-11-18 19:40:47,037 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:41:03,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=379760.0, ans=0.125 2023-11-18 19:41:07,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_abs, batch_count=379760.0, ans=0.5 2023-11-18 19:41:15,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:16,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:17,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=379826.6666666667, ans=0.125 2023-11-18 19:41:28,309 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.95 vs. limit=10.0 2023-11-18 19:41:33,623 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8900, loss[loss=0.06803, simple_loss=0.0758, pruned_loss=0.01811, audio_tagging_loss=0.01202, over 14854.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.118, pruned_loss=0.03239, audio_tagging_loss=0.01139, over 3054653.49 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:41:40,726 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.82 vs. 
limit=15.0 2023-11-18 19:41:46,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=380026.6666666667, ans=0.125 2023-11-18 19:41:47,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=380026.6666666667, ans=0.125 2023-11-18 19:42:08,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=380160.0, ans=0.125 2023-11-18 19:42:09,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=380160.0, ans=0.125 2023-11-18 19:42:10,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=380160.0, ans=0.0 2023-11-18 19:42:11,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=380160.0, ans=0.125 2023-11-18 19:42:22,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=380226.6666666667, ans=0.0 2023-11-18 19:42:28,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.416e+01 9.192e+01 1.013e+02 1.118e+02 1.605e+02, threshold=2.026e+02, percent-clipped=0.0 2023-11-18 19:42:29,778 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 8950, loss[loss=0.09149, simple_loss=0.1042, pruned_loss=0.02923, audio_tagging_loss=0.01016, over 15027.00 frames. ], tot_loss[loss=0.1027, simple_loss=0.1179, pruned_loss=0.03252, audio_tagging_loss=0.0112, over 3057625.93 frames. ], batch size: 57, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:42:40,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=22.5 2023-11-18 19:43:02,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380493.3333333333, ans=0.1 2023-11-18 19:43:21,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.72 vs. limit=10.0 2023-11-18 19:43:25,558 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9000, loss[loss=0.1012, simple_loss=0.119, pruned_loss=0.03385, audio_tagging_loss=0.007839, over 15543.00 frames. ], tot_loss[loss=0.102, simple_loss=0.1172, pruned_loss=0.0323, audio_tagging_loss=0.01109, over 3053483.14 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:43:25,560 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 19:43:41,128 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3195, 2.8553, 4.7046, 2.8257], device='cuda:0') 2023-11-18 19:43:58,401 INFO [train_asr.py:1147] (0/4) Epoch 5, validation: loss=0.07332, simple_loss=0.06001, pruned_loss=0.008857, audio_tagging_loss=0.03446, over 4681554.00 frames. 2023-11-18 19:43:58,402 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 19:44:02,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.03 vs. 
limit=22.5 2023-11-18 19:44:09,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=380693.3333333333, ans=0.1 2023-11-18 19:44:46,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=380893.3333333333, ans=0.0 2023-11-18 19:44:54,106 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 9.240e+01 1.024e+02 1.108e+02 1.437e+02, threshold=2.047e+02, percent-clipped=0.0 2023-11-18 19:44:54,142 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9050, loss[loss=0.08742, simple_loss=0.1051, pruned_loss=0.02495, audio_tagging_loss=0.009942, over 15130.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1172, pruned_loss=0.03208, audio_tagging_loss=0.01106, over 3057669.37 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:03,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=380960.0, ans=0.125 2023-11-18 19:45:36,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=381160.0, ans=0.0 2023-11-18 19:45:37,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=381226.6666666667, ans=0.0 2023-11-18 19:45:39,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=381226.6666666667, ans=0.1 2023-11-18 19:45:39,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=381226.6666666667, ans=0.95 2023-11-18 19:45:41,558 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.90 vs. limit=22.5 2023-11-18 19:45:47,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.whiten.whitening_limit, batch_count=381226.6666666667, ans=12.0 2023-11-18 19:45:49,530 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9100, loss[loss=0.1009, simple_loss=0.1189, pruned_loss=0.03159, audio_tagging_loss=0.009898, over 15995.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1166, pruned_loss=0.03183, audio_tagging_loss=0.01095, over 3059066.44 frames. ], batch size: 58, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:45:58,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=381293.3333333333, ans=0.125 2023-11-18 19:46:45,536 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.962e+01 1.000e+02 1.098e+02 1.318e+02, threshold=2.000e+02, percent-clipped=0.0 2023-11-18 19:46:45,564 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9150, loss[loss=0.1021, simple_loss=0.1127, pruned_loss=0.03299, audio_tagging_loss=0.01281, over 15436.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1175, pruned_loss=0.0322, audio_tagging_loss=0.011, over 3062795.27 frames. ], batch size: 59, lr: 1.32e-02, grad_scale: 16.0 2023-11-18 19:47:00,412 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. 
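The validation block logged above (loss=0.07332, with audio_tagging_loss the dominant share, followed by "Maximum memory allocated so far is 26250MB") implies the dev pass runs under no_grad and reports peak CUDA memory afterwards, alongside the attention-entropy diagnostics. A sketch of that bookkeeping; compute_loss and dev_dl are hypothetical stand-ins for the recipe's own helpers:

```python
import logging
import torch

# Sketch of the validation bookkeeping implied by the log: average the
# frame-weighted loss over the dev loader, then report peak CUDA memory
# in MB. `compute_loss` and `dev_dl` are hypothetical stand-ins.

def validate(model, dev_dl, device) -> float:
    model.eval()
    tot, frames = 0.0, 0
    with torch.no_grad():
        for batch in dev_dl:
            loss, num_frames = compute_loss(model, batch)  # hypothetical helper
            tot += loss.item() * num_frames
            frames += num_frames
    model.train()
    mb = torch.cuda.max_memory_allocated(device) // (1024 * 1024)
    logging.info(f"Maximum memory allocated so far is {mb}MB")
    return tot / frames
```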
limit=6.0 2023-11-18 19:47:20,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=381826.6666666667, ans=0.0 2023-11-18 19:47:24,181 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.03 vs. limit=15.0 2023-11-18 19:47:42,476 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9200, loss[loss=0.1126, simple_loss=0.1323, pruned_loss=0.03387, audio_tagging_loss=0.01256, over 15246.00 frames. ], tot_loss[loss=0.1018, simple_loss=0.1173, pruned_loss=0.03209, audio_tagging_loss=0.01106, over 3059284.95 frames. ], batch size: 56, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:47:43,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:47,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:47,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=381960.0, ans=0.125 2023-11-18 19:47:48,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=381960.0, ans=0.2 2023-11-18 19:47:52,969 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.23 vs. limit=15.0 2023-11-18 19:48:03,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.77 vs. limit=15.0 2023-11-18 19:48:15,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.42 vs. limit=15.0 2023-11-18 19:48:27,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=10.22 vs. limit=22.5 2023-11-18 19:48:37,890 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 9.282e+01 1.040e+02 1.122e+02 1.499e+02, threshold=2.080e+02, percent-clipped=0.0 2023-11-18 19:48:37,918 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9250, loss[loss=0.1098, simple_loss=0.1284, pruned_loss=0.03636, audio_tagging_loss=0.009208, over 15027.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1169, pruned_loss=0.03189, audio_tagging_loss=0.01107, over 3064163.87 frames. ], batch size: 54, lr: 1.32e-02, grad_scale: 32.0 2023-11-18 19:48:44,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=382293.3333333333, ans=0.07 2023-11-18 19:48:54,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=382360.0, ans=0.125 2023-11-18 19:49:20,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.91 vs. limit=15.0 2023-11-18 19:49:30,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=382560.0, ans=0.1 2023-11-18 19:49:33,082 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9300, loss[loss=0.07932, simple_loss=0.07979, pruned_loss=0.02304, audio_tagging_loss=0.01638, over 14769.00 frames. 
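The scaling.py:213 lines that dominate this log each print the current value (ans) of a named ScheduledFloat at the current batch_count: module hyperparameters such as dropout probabilities, balancer targets, and skip rates are piecewise-linear functions of training progress rather than constants. A minimal sketch of such a schedule (the breakpoints below are illustrative; this is not icefall's exact class):

```python
# Minimal piecewise-linear float schedule, in the spirit of the
# "ScheduledFloat ... batch_count=..., ans=..." entries above.
# The breakpoints here are illustrative only.

class ScheduledFloat:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.points = sorted(points)

    def value(self, batch_count: float) -> float:
        x0, y0 = self.points[0]
        if batch_count <= x0:
            return y0
        for x1, y1 in self.points[1:]:
            if batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)
            x0, y0 = x1, y1
        return y0  # clamp beyond the last breakpoint

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
print(dropout_p.value(380493.33))  # -> 0.1, matching the dropout_p lines above
```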
], tot_loss[loss=0.1009, simple_loss=0.1165, pruned_loss=0.03159, audio_tagging_loss=0.01112, over 3059880.89 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:49:59,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=382760.0, ans=0.125 2023-11-18 19:50:00,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=382760.0, ans=0.0 2023-11-18 19:50:01,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=382760.0, ans=0.0 2023-11-18 19:50:03,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=382760.0, ans=0.05 2023-11-18 19:50:12,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=382826.6666666667, ans=0.2 2023-11-18 19:50:12,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=382826.6666666667, ans=0.125 2023-11-18 19:50:14,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.84 vs. limit=6.0 2023-11-18 19:50:23,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-18 19:50:29,717 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.054e+01 9.801e+01 1.113e+02 1.567e+02, threshold=1.960e+02, percent-clipped=0.0 2023-11-18 19:50:29,749 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9350, loss[loss=0.1003, simple_loss=0.1234, pruned_loss=0.031, audio_tagging_loss=0.007617, over 15452.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1169, pruned_loss=0.03165, audio_tagging_loss=0.01114, over 3058752.06 frames. ], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:50:32,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=382960.0, ans=0.09899494936611666 2023-11-18 19:50:54,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=383093.3333333333, ans=0.0 2023-11-18 19:51:06,995 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:51:09,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=383160.0, ans=0.0 2023-11-18 19:51:24,476 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:51:25,404 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9400, loss[loss=0.08711, simple_loss=0.08635, pruned_loss=0.032, audio_tagging_loss=0.01194, over 15828.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1168, pruned_loss=0.03177, audio_tagging_loss=0.01132, over 3054517.83 frames. 
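The learning rate in these summaries decays very slowly (1.32e-02 earlier in the section, 1.31e-02 by batch 9350). That shape is characteristic of the Eden schedule that icefall pairs with ScaledAdam, which decays in both batch and epoch; a sketch with the usual recipe defaults (base LR 0.045, lr_batches=7500, lr_epochs=3.5 -- the real scheduler also handles warmup and reference-duration rescaling, omitted here):

```python
# Eden-style learning-rate schedule (sketch; the real implementation in
# icefall's optim.py also applies warmup and ref-duration rescaling).

def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((step**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Rough check: assuming ~9.9k batches per epoch, "Epoch 5, batch 9350"
# is cumulative step ~49k at fractional epoch ~4.9, giving ~1.3e-02,
# in line with the logged "lr: 1.31e-02".
print(f"{eden_lr(0.045, 49000, 4.9):.2e}")
```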
], batch size: 61, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:51:31,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=383293.3333333333, ans=0.125 2023-11-18 19:51:34,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=383293.3333333333, ans=0.0 2023-11-18 19:51:44,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=383360.0, ans=0.0 2023-11-18 19:51:58,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=383493.3333333333, ans=0.125 2023-11-18 19:52:09,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=383560.0, ans=0.125 2023-11-18 19:52:17,526 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:52:20,607 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.578e+01 8.864e+01 9.867e+01 1.096e+02 1.502e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 19:52:20,636 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9450, loss[loss=0.1041, simple_loss=0.1145, pruned_loss=0.03207, audio_tagging_loss=0.01472, over 15829.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1169, pruned_loss=0.03179, audio_tagging_loss=0.01148, over 3055075.08 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:52:23,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.34 vs. limit=15.0 2023-11-18 19:52:49,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=383760.0, ans=0.125 2023-11-18 19:53:12,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.93 vs. limit=15.0 2023-11-18 19:53:16,799 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9500, loss[loss=0.1041, simple_loss=0.1195, pruned_loss=0.03357, audio_tagging_loss=0.01082, over 15308.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1169, pruned_loss=0.03207, audio_tagging_loss=0.01155, over 3050464.66 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:53:28,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.61 vs. limit=5.0 2023-11-18 19:53:33,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. 
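The recurring WARNING lines all drop 1-second AudioSet cuts for the same reason: the dummy transcript tokenizes to 24 BPE symbols, but the cut has only 23 frames left after the encoder's roughly 4x subsampling (100 -> 23), and a transducer cannot emit more non-blank tokens than it has encoder frames. A hedged sketch of such a filter; sp stands for the loaded sentencepiece model, and the exact frame formula of the convolutional front-end is an assumption:

```python
import logging

# Hedged sketch of the filter behind the "Exclude cut ..." warnings.
# A pruned-transducer loss needs at least as many post-subsampling
# frames as BPE tokens; the front-end formula below ((T - 7) // 4,
# giving 100 -> 23) is an assumption that matches the logged numbers.

def keep_cut(cut, sp, subsampling_factor: int = 4) -> bool:
    T = (cut.num_frames - 7) // subsampling_factor  # 100 -> 23 here
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    if T < len(tokens):
        logging.warning(
            f"Exclude cut with ID {cut.id} from training. "
            f"Number of frames (before subsampling): {cut.num_frames}. "
            f"Number of frames (after subsampling): {T}. "
            f"Number of tokens: {len(tokens)}"
        )
        return False
    return True
```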
limit=15.0 2023-11-18 19:53:38,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=384093.3333333333, ans=0.2 2023-11-18 19:53:51,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=384160.0, ans=0.125 2023-11-18 19:53:53,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384160.0, ans=0.1 2023-11-18 19:53:55,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=384160.0, ans=0.125 2023-11-18 19:54:13,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 9.208e+01 1.015e+02 1.091e+02 1.477e+02, threshold=2.029e+02, percent-clipped=0.0 2023-11-18 19:54:13,498 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9550, loss[loss=0.08162, simple_loss=0.09186, pruned_loss=0.02284, audio_tagging_loss=0.01285, over 15340.00 frames. ], tot_loss[loss=0.1019, simple_loss=0.1168, pruned_loss=0.03189, audio_tagging_loss=0.01165, over 3049415.12 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:54:20,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384293.3333333333, ans=0.1 2023-11-18 19:54:46,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=384493.3333333333, ans=0.1 2023-11-18 19:55:03,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=384560.0, ans=0.1 2023-11-18 19:55:06,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=384560.0, ans=0.125 2023-11-18 19:55:08,353 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9600, loss[loss=0.1007, simple_loss=0.1185, pruned_loss=0.03086, audio_tagging_loss=0.01053, over 14084.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1168, pruned_loss=0.03197, audio_tagging_loss=0.0117, over 3042451.68 frames. ], batch size: 54, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:55:14,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=384626.6666666667, ans=0.2 2023-11-18 19:55:54,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=384893.3333333333, ans=0.025 2023-11-18 19:56:00,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=384893.3333333333, ans=0.125 2023-11-18 19:56:04,728 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9650, loss[loss=0.121, simple_loss=0.1459, pruned_loss=0.04069, audio_tagging_loss=0.007343, over 16949.00 frames. ], tot_loss[loss=0.1012, simple_loss=0.1162, pruned_loss=0.03159, audio_tagging_loss=0.01157, over 3042637.43 frames. 
], batch size: 62, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:56:05,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.450e+01 8.741e+01 9.505e+01 1.064e+02 1.391e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-18 19:56:15,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=385026.6666666667, ans=0.125 2023-11-18 19:56:21,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=385026.6666666667, ans=0.125 2023-11-18 19:56:25,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=385026.6666666667, ans=0.2 2023-11-18 19:56:27,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=385093.3333333333, ans=0.05 2023-11-18 19:57:00,515 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9700, loss[loss=0.1046, simple_loss=0.1204, pruned_loss=0.03469, audio_tagging_loss=0.009762, over 16622.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1155, pruned_loss=0.03152, audio_tagging_loss=0.01139, over 3047994.72 frames. ], batch size: 62, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:20,607 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 19:57:37,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385493.3333333333, ans=0.1 2023-11-18 19:57:45,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.62 vs. limit=10.0 2023-11-18 19:57:45,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=385560.0, ans=0.1 2023-11-18 19:57:56,529 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9750, loss[loss=0.09273, simple_loss=0.1047, pruned_loss=0.02826, audio_tagging_loss=0.0121, over 14323.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1162, pruned_loss=0.03159, audio_tagging_loss=0.01117, over 3046698.31 frames. ], batch size: 53, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:57:57,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.008e+01 9.015e+01 1.026e+02 1.125e+02 1.667e+02, threshold=2.051e+02, percent-clipped=0.0 2023-11-18 19:57:58,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=385626.6666666667, ans=0.0 2023-11-18 19:58:02,468 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.98 vs. limit=15.0 2023-11-18 19:58:04,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=385626.6666666667, ans=0.125 2023-11-18 19:58:16,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-18 19:58:16,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.59 vs. 
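The grad_scale printed with every batch summary is the fp16 dynamic loss scale: it sits at 32.0 through most of this section, halved to 16.0 around batch 9000 above (an overflow backs the scale off) and grew back to 32.0 by batch 9200. That is the standard torch.cuda.amp pattern; a sketch of the loop, where model, optimizer, train_dl, and compute_loss are hypothetical stand-ins:

```python
import torch

# Sketch of the fp16 dynamic loss-scaling loop implied by the
# fluctuating "grad_scale" values: back off on overflow, grow again
# after a run of clean steps. The training objects are stand-ins.

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,      # the scale seen through most of this section
    backoff_factor=0.5,   # 32.0 -> 16.0 on an inf/nan step
    growth_factor=2.0,    # 16.0 -> 32.0 after enough finite steps
)

for batch in train_dl:                         # hypothetical loader
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(dtype=torch.float16):
        loss = compute_loss(model, batch)      # hypothetical helper
    scaler.scale(loss).backward()
    scaler.step(optimizer)                     # skips the step on overflow
    scaler.update()                            # adjusts scaler.get_scale()
```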
limit=12.0 2023-11-18 19:58:17,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=9.37 vs. limit=15.0 2023-11-18 19:58:26,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.07 vs. limit=10.0 2023-11-18 19:58:27,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.33 vs. limit=15.0 2023-11-18 19:58:30,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=385826.6666666667, ans=0.125 2023-11-18 19:58:52,969 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9800, loss[loss=0.09498, simple_loss=0.1033, pruned_loss=0.02665, audio_tagging_loss=0.01671, over 15557.00 frames. ], tot_loss[loss=0.1013, simple_loss=0.1167, pruned_loss=0.03177, audio_tagging_loss=0.01115, over 3045717.57 frames. ], batch size: 60, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:58:56,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=11.47 vs. limit=15.0 2023-11-18 19:58:59,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=385960.0, ans=0.0 2023-11-18 19:59:01,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=385960.0, ans=0.1 2023-11-18 19:59:13,278 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.83 vs. limit=22.5 2023-11-18 19:59:20,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=386093.3333333333, ans=0.125 2023-11-18 19:59:22,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=386093.3333333333, ans=0.1 2023-11-18 19:59:40,934 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 19:59:48,946 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9850, loss[loss=0.09896, simple_loss=0.1061, pruned_loss=0.03356, audio_tagging_loss=0.01234, over 14379.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.118, pruned_loss=0.032, audio_tagging_loss=0.0111, over 3040796.19 frames. 
], batch size: 55, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 19:59:49,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=386293.3333333333, ans=0.125 2023-11-18 19:59:49,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.077e+01 9.044e+01 9.858e+01 1.082e+02 1.412e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 19:59:53,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=386293.3333333333, ans=0.0 2023-11-18 20:00:44,509 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9900, loss[loss=0.09123, simple_loss=0.1021, pruned_loss=0.0307, audio_tagging_loss=0.00946, over 15181.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1173, pruned_loss=0.03175, audio_tagging_loss=0.0111, over 3043687.84 frames. ], batch size: 59, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:00:51,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=386626.6666666667, ans=0.125 2023-11-18 20:01:10,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=386760.0, ans=0.125 2023-11-18 20:01:13,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=386760.0, ans=0.125 2023-11-18 20:01:19,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=386826.6666666667, ans=0.125 2023-11-18 20:01:20,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.47 vs. limit=6.0 2023-11-18 20:01:24,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=386826.6666666667, ans=0.125 2023-11-18 20:01:38,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=386893.3333333333, ans=0.125 2023-11-18 20:01:41,656 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 9950, loss[loss=0.09482, simple_loss=0.1047, pruned_loss=0.02582, audio_tagging_loss=0.01664, over 15713.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1162, pruned_loss=0.03139, audio_tagging_loss=0.01124, over 3045177.89 frames. ], batch size: 58, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:01:42,674 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.823e+01 1.146e+02 1.516e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 20:01:46,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.28 vs. 
limit=22.5 2023-11-18 20:01:51,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=387026.6666666667, ans=0.0 2023-11-18 20:01:57,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=387026.6666666667, ans=0.0 2023-11-18 20:02:00,277 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=387026.6666666667, ans=0.025 2023-11-18 20:02:05,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=387093.3333333333, ans=0.125 2023-11-18 20:02:07,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=387093.3333333333, ans=0.2 2023-11-18 20:02:25,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387226.6666666667, ans=0.1 2023-11-18 20:02:27,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387226.6666666667, ans=0.1 2023-11-18 20:02:36,726 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10000, loss[loss=0.1514, simple_loss=0.1814, pruned_loss=0.05053, audio_tagging_loss=0.01011, over 14627.00 frames. ], tot_loss[loss=0.09932, simple_loss=0.1146, pruned_loss=0.03066, audio_tagging_loss=0.01134, over 3042437.91 frames. ], batch size: 53, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:02:37,245 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0 2023-11-18 20:02:38,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=387293.3333333333, ans=0.2 2023-11-18 20:02:49,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=387360.0, ans=0.125 2023-11-18 20:02:55,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=387360.0, ans=0.125 2023-11-18 20:03:00,410 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:03:19,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=387493.3333333333, ans=0.0 2023-11-18 20:03:29,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=387560.0, ans=0.1 2023-11-18 20:03:32,506 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10050, loss[loss=0.08833, simple_loss=0.09029, pruned_loss=0.02918, audio_tagging_loss=0.014, over 14906.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1155, pruned_loss=0.03097, audio_tagging_loss=0.01134, over 3044695.15 frames. ], batch size: 56, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:03:33,534 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.559e+01 9.098e+01 9.898e+01 1.122e+02 1.719e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 20:03:36,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.12 vs. 
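The scaling.py:1022 Whitening lines compare a per-module metric against a limit; values typically sit just below the limit (e.g. 14.37 vs. limit=15.0 above), and the whitening penalty only activates once the limit is exceeded. Roughly, the metric measures how far the feature covariance is from isotropic; a sketch under the assumption that it is mean(diag(C^2)) / mean(diag(C))^2, which equals 1.0 for a perfectly white covariance and grows with eigenvalue spread (this may differ in detail from icefall's definition):

```python
import torch

# Assumed form of the whitening metric: with C the (per-group) feature
# covariance, mean(diag(C @ C)) / mean(diag(C))**2 equals 1.0 for an
# isotropic covariance and grows as some directions dominate others.

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels) activations for one whitening group
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]
    return torch.diagonal(cov @ cov).mean() / torch.diagonal(cov).mean() ** 2

white = torch.randn(1000, 256)
print(whitening_metric(white))                     # close to 1.0
print(whitening_metric(white * torch.rand(256)))   # > 1, like the logged metrics
```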
limit=22.5 2023-11-18 20:03:38,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=387626.6666666667, ans=0.0 2023-11-18 20:03:39,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=387626.6666666667, ans=0.1 2023-11-18 20:03:40,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=387626.6666666667, ans=0.125 2023-11-18 20:04:08,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=387826.6666666667, ans=0.04949747468305833 2023-11-18 20:04:08,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=387826.6666666667, ans=0.2 2023-11-18 20:04:13,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=387826.6666666667, ans=0.09899494936611666 2023-11-18 20:04:21,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=387893.3333333333, ans=0.125 2023-11-18 20:04:27,541 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=387960.0, ans=0.125 2023-11-18 20:04:28,974 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10100, loss[loss=0.08925, simple_loss=0.103, pruned_loss=0.02634, audio_tagging_loss=0.0114, over 15715.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1158, pruned_loss=0.03111, audio_tagging_loss=0.01144, over 3038718.12 frames. ], batch size: 57, lr: 1.31e-02, grad_scale: 32.0 2023-11-18 20:04:39,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=388026.6666666667, ans=0.1 2023-11-18 20:05:00,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=388160.0, ans=0.125 2023-11-18 20:05:08,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=388160.0, ans=0.125 2023-11-18 20:05:08,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=388160.0, ans=0.0 2023-11-18 20:05:12,175 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:05:13,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=388226.6666666667, ans=0.0 2023-11-18 20:05:23,847 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10150, loss[loss=0.1144, simple_loss=0.1289, pruned_loss=0.03764, audio_tagging_loss=0.01225, over 15478.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1162, pruned_loss=0.03133, audio_tagging_loss=0.0115, over 3037569.73 frames. 
], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:05:24,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.804e+01 9.203e+01 1.000e+02 1.096e+02 2.259e+02, threshold=2.001e+02, percent-clipped=1.0 2023-11-18 20:05:33,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.45 vs. limit=22.5 2023-11-18 20:05:47,625 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:05:49,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.43 vs. limit=15.0 2023-11-18 20:05:50,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.80 vs. limit=15.0 2023-11-18 20:06:00,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=388493.3333333333, ans=0.2 2023-11-18 20:06:19,321 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10200, loss[loss=0.09789, simple_loss=0.1207, pruned_loss=0.02759, audio_tagging_loss=0.009968, over 14682.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1157, pruned_loss=0.03136, audio_tagging_loss=0.0115, over 3047218.66 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:06:20,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=388626.6666666667, ans=0.125 2023-11-18 20:06:38,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=388693.3333333333, ans=0.125 2023-11-18 20:06:40,051 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:06:48,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=388760.0, ans=0.95 2023-11-18 20:07:12,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=388893.3333333333, ans=0.125 2023-11-18 20:07:13,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=388960.0, ans=0.125 2023-11-18 20:07:14,926 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10250, loss[loss=0.111, simple_loss=0.1312, pruned_loss=0.03341, audio_tagging_loss=0.01192, over 14872.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.117, pruned_loss=0.03161, audio_tagging_loss=0.01159, over 3052081.73 frames. 
], batch size: 58, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:07:15,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.430e+01 9.102e+01 9.857e+01 1.065e+02 1.324e+02, threshold=1.971e+02, percent-clipped=0.0 2023-11-18 20:07:20,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=388960.0, ans=0.125 2023-11-18 20:07:32,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389026.6666666667, ans=0.1 2023-11-18 20:07:36,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=389093.3333333333, ans=0.125 2023-11-18 20:08:08,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=389226.6666666667, ans=0.0 2023-11-18 20:08:11,208 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10300, loss[loss=0.1158, simple_loss=0.1395, pruned_loss=0.03706, audio_tagging_loss=0.008968, over 14173.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1161, pruned_loss=0.03141, audio_tagging_loss=0.01161, over 3047756.55 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:08:17,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=389293.3333333333, ans=0.125 2023-11-18 20:08:26,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.17 vs. limit=12.0 2023-11-18 20:08:30,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=389360.0, ans=0.1 2023-11-18 20:08:46,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=389493.3333333333, ans=0.0 2023-11-18 20:08:47,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=389493.3333333333, ans=0.125 2023-11-18 20:09:07,686 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10350, loss[loss=0.07944, simple_loss=0.0869, pruned_loss=0.02371, audio_tagging_loss=0.01228, over 14508.00 frames. ], tot_loss[loss=0.1015, simple_loss=0.1165, pruned_loss=0.03164, audio_tagging_loss=0.01158, over 3052887.18 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:09:08,730 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.599e+01 9.314e+01 1.056e+02 1.157e+02 1.834e+02, threshold=2.113e+02, percent-clipped=0.0 2023-11-18 20:09:15,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=389626.6666666667, ans=0.035 2023-11-18 20:09:23,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=389693.3333333333, ans=0.0 2023-11-18 20:09:37,501 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.33 vs. 
limit=15.0 2023-11-18 20:09:51,341 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=389893.3333333333, ans=0.0 2023-11-18 20:10:02,906 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10400, loss[loss=0.0685, simple_loss=0.07212, pruned_loss=0.01889, audio_tagging_loss=0.01355, over 15557.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.115, pruned_loss=0.03135, audio_tagging_loss=0.0117, over 3047951.13 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:10:25,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=390093.3333333333, ans=0.125 2023-11-18 20:10:25,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.14 vs. limit=15.0 2023-11-18 20:10:37,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.60 vs. limit=15.0 2023-11-18 20:10:59,434 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10450, loss[loss=0.08691, simple_loss=0.08749, pruned_loss=0.02756, audio_tagging_loss=0.0156, over 14675.00 frames. ], tot_loss[loss=0.09893, simple_loss=0.113, pruned_loss=0.03071, audio_tagging_loss=0.0117, over 3051715.08 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:11:00,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.809e+01 9.608e+01 1.086e+02 1.646e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 20:11:18,148 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.10 vs. limit=15.0 2023-11-18 20:11:36,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=390493.3333333333, ans=0.2 2023-11-18 20:11:55,860 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10500, loss[loss=0.1374, simple_loss=0.1681, pruned_loss=0.04544, audio_tagging_loss=0.007891, over 16139.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1149, pruned_loss=0.0313, audio_tagging_loss=0.01151, over 3050448.66 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:22,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=390760.0, ans=0.125 2023-11-18 20:12:26,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=390760.0, ans=10.0 2023-11-18 20:12:49,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=390893.3333333333, ans=0.04949747468305833 2023-11-18 20:12:51,617 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10550, loss[loss=0.09755, simple_loss=0.1098, pruned_loss=0.03173, audio_tagging_loss=0.01091, over 15385.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1152, pruned_loss=0.03132, audio_tagging_loss=0.01126, over 3052699.63 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:12:52,611 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.716e+01 9.677e+01 1.046e+02 1.546e+02, threshold=1.935e+02, percent-clipped=0.0 2023-11-18 20:12:55,725 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.43 vs. 
limit=15.0 2023-11-18 20:12:59,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=390960.0, ans=0.1 2023-11-18 20:13:00,705 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.85 vs. limit=15.0 2023-11-18 20:13:30,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=391160.0, ans=0.125 2023-11-18 20:13:30,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=391160.0, ans=0.1 2023-11-18 20:13:36,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.33 vs. limit=6.0 2023-11-18 20:13:45,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=391226.6666666667, ans=0.125 2023-11-18 20:13:47,310 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10600, loss[loss=0.07862, simple_loss=0.09345, pruned_loss=0.02345, audio_tagging_loss=0.008441, over 15502.00 frames. ], tot_loss[loss=0.0995, simple_loss=0.1145, pruned_loss=0.03107, audio_tagging_loss=0.01117, over 3049754.70 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:14:27,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=391493.3333333333, ans=0.125 2023-11-18 20:14:37,885 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-18 20:14:43,654 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10650, loss[loss=0.1014, simple_loss=0.1224, pruned_loss=0.03167, audio_tagging_loss=0.008593, over 13880.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1155, pruned_loss=0.03133, audio_tagging_loss=0.01117, over 3056196.90 frames. ], batch size: 53, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:14:44,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.531e+01 9.141e+01 1.015e+02 1.176e+02 1.580e+02, threshold=2.030e+02, percent-clipped=0.0 2023-11-18 20:14:53,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=391693.3333333333, ans=0.125 2023-11-18 20:15:08,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=391760.0, ans=0.125 2023-11-18 20:15:17,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=391826.6666666667, ans=0.125 2023-11-18 20:15:28,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.09 vs. limit=15.0 2023-11-18 20:15:38,741 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10700, loss[loss=0.1136, simple_loss=0.1235, pruned_loss=0.0371, audio_tagging_loss=0.01474, over 15225.00 frames. ], tot_loss[loss=0.09966, simple_loss=0.1147, pruned_loss=0.0312, audio_tagging_loss=0.01111, over 3052540.41 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:15:50,219 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.88 vs. 
limit=15.0 2023-11-18 20:16:00,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=392093.3333333333, ans=0.125 2023-11-18 20:16:26,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=392226.6666666667, ans=0.125 2023-11-18 20:16:28,951 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:16:30,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.96 vs. limit=22.5 2023-11-18 20:16:31,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=392226.6666666667, ans=0.125 2023-11-18 20:16:35,681 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10750, loss[loss=0.1242, simple_loss=0.1457, pruned_loss=0.04129, audio_tagging_loss=0.01003, over 15899.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1155, pruned_loss=0.03144, audio_tagging_loss=0.01107, over 3051662.26 frames. ], batch size: 59, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:16:36,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.277e+01 9.086e+01 9.851e+01 1.129e+02 1.490e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 20:16:44,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392293.3333333333, ans=0.1 2023-11-18 20:16:58,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=392426.6666666667, ans=0.1 2023-11-18 20:17:22,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=392560.0, ans=0.125 2023-11-18 20:17:31,498 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10800, loss[loss=0.07206, simple_loss=0.08156, pruned_loss=0.01953, audio_tagging_loss=0.01175, over 13844.00 frames. ], tot_loss[loss=0.0999, simple_loss=0.1152, pruned_loss=0.03119, audio_tagging_loss=0.01109, over 3048398.42 frames. ], batch size: 55, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:17:47,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=392693.3333333333, ans=0.125 2023-11-18 20:17:58,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.59 vs. limit=15.0 2023-11-18 20:18:06,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=392826.6666666667, ans=0.125 2023-11-18 20:18:15,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=392893.3333333333, ans=0.125 2023-11-18 20:18:25,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=392893.3333333333, ans=0.0 2023-11-18 20:18:27,575 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10850, loss[loss=0.09693, simple_loss=0.1111, pruned_loss=0.0325, audio_tagging_loss=0.008891, over 14039.00 frames. ], tot_loss[loss=0.09938, simple_loss=0.1146, pruned_loss=0.031, audio_tagging_loss=0.01107, over 3046946.94 frames. 
], batch size: 53, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:18:28,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.407e+01 9.217e+01 1.010e+02 1.123e+02 1.956e+02, threshold=2.020e+02, percent-clipped=0.0 2023-11-18 20:18:28,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=392960.0, ans=0.125 2023-11-18 20:18:52,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=393093.3333333333, ans=0.2 2023-11-18 20:19:06,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=393160.0, ans=0.125 2023-11-18 20:19:06,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.68 vs. limit=15.0 2023-11-18 20:19:08,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=393160.0, ans=0.0 2023-11-18 20:19:12,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=393226.6666666667, ans=0.0 2023-11-18 20:19:13,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=393226.6666666667, ans=0.0 2023-11-18 20:19:16,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.min_positive, batch_count=393226.6666666667, ans=0.025 2023-11-18 20:19:19,164 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:19:24,009 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10900, loss[loss=0.1317, simple_loss=0.1631, pruned_loss=0.04155, audio_tagging_loss=0.00858, over 15415.00 frames. ], tot_loss[loss=0.0994, simple_loss=0.1147, pruned_loss=0.03083, audio_tagging_loss=0.0112, over 3044459.31 frames. ], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:19:31,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.23 vs. limit=15.0 2023-11-18 20:19:40,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=393360.0, ans=0.125 2023-11-18 20:19:46,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=393426.6666666667, ans=0.125 2023-11-18 20:20:02,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=393493.3333333333, ans=0.2 2023-11-18 20:20:06,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=393493.3333333333, ans=0.0 2023-11-18 20:20:20,067 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 10950, loss[loss=0.0986, simple_loss=0.1186, pruned_loss=0.02933, audio_tagging_loss=0.00998, over 14571.00 frames. 
], tot_loss[loss=0.1002, simple_loss=0.1157, pruned_loss=0.03117, audio_tagging_loss=0.01123, over 3044258.44 frames. ], batch size: 54, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:20:21,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.083e+01 9.174e+01 1.016e+02 1.114e+02 1.629e+02, threshold=2.031e+02, percent-clipped=0.0 2023-11-18 20:20:21,931 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.55 vs. limit=15.0 2023-11-18 20:20:23,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=393626.6666666667, ans=0.1 2023-11-18 20:20:25,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=393626.6666666667, ans=0.125 2023-11-18 20:20:29,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=393693.3333333333, ans=0.125 2023-11-18 20:20:44,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.64 vs. limit=22.5 2023-11-18 20:20:51,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=393760.0, ans=0.125 2023-11-18 20:21:11,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=393893.3333333333, ans=0.2 2023-11-18 20:21:13,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=393893.3333333333, ans=0.0 2023-11-18 20:21:15,287 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11000, loss[loss=0.08267, simple_loss=0.0946, pruned_loss=0.02057, audio_tagging_loss=0.0148, over 14661.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1163, pruned_loss=0.03119, audio_tagging_loss=0.01128, over 3043311.81 frames. ], batch size: 57, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:21:23,344 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:21:31,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.82 vs. limit=22.5 2023-11-18 20:22:12,140 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11050, loss[loss=0.1201, simple_loss=0.134, pruned_loss=0.04164, audio_tagging_loss=0.01144, over 15304.00 frames. ], tot_loss[loss=0.1001, simple_loss=0.1156, pruned_loss=0.03102, audio_tagging_loss=0.01131, over 3052768.49 frames. 
], batch size: 56, lr: 1.30e-02, grad_scale: 32.0 2023-11-18 20:22:13,192 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.785e+01 9.478e+01 1.012e+02 1.085e+02 1.543e+02, threshold=2.025e+02, percent-clipped=0.0 2023-11-18 20:22:45,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=394493.3333333333, ans=0.125 2023-11-18 20:22:55,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.04 vs. limit=22.5 2023-11-18 20:23:02,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=394560.0, ans=0.0 2023-11-18 20:23:07,229 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11100, loss[loss=0.1108, simple_loss=0.1373, pruned_loss=0.03096, audio_tagging_loss=0.01117, over 15433.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.1173, pruned_loss=0.03153, audio_tagging_loss=0.01143, over 3049654.88 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:23:38,682 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.493e-01 2023-11-18 20:23:42,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=394826.6666666667, ans=0.125 2023-11-18 20:23:51,816 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.30 vs. limit=22.5 2023-11-18 20:23:54,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=394893.3333333333, ans=0.125 2023-11-18 20:24:00,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=394893.3333333333, ans=0.125 2023-11-18 20:24:03,469 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11150, loss[loss=0.08258, simple_loss=0.08776, pruned_loss=0.02469, audio_tagging_loss=0.01402, over 14224.00 frames. ], tot_loss[loss=0.1021, simple_loss=0.1174, pruned_loss=0.03195, audio_tagging_loss=0.0115, over 3048115.26 frames. ], batch size: 54, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:24:04,470 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.464e+01 9.395e+01 1.022e+02 1.169e+02 1.423e+02, threshold=2.044e+02, percent-clipped=0.0 2023-11-18 20:24:08,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=394960.0, ans=0.125 2023-11-18 20:24:42,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.80 vs. limit=10.0 2023-11-18 20:24:50,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=395226.6666666667, ans=0.125 2023-11-18 20:24:51,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.02 vs. limit=15.0 2023-11-18 20:24:59,122 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11200, loss[loss=0.09899, simple_loss=0.1145, pruned_loss=0.02971, audio_tagging_loss=0.01201, over 14528.00 frames. 
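Most scaling.py:1118 WithLoss lines report loss-sum=0.000e+00, but the entry above attaches a non-zero penalty (loss-sum=1.493e-01) to one layer's attention weights: a pass-through op that leaves activations untouched in the forward pass while feeding an auxiliary penalty into the backward pass. A hedged sketch of that mechanism (the autograd details here are an assumption about, not a copy of, icefall's implementation):

```python
import torch

# Pass-through op that attaches an auxiliary penalty to a tensor and
# logs its sum, as in the "WithLoss: ... loss-sum=..." lines. The
# forward is the identity on x; backward injects a gradient of 1 into
# the penalty so it joins the total objective.

class WithLoss(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x: torch.Tensor, loss: torch.Tensor):
        ctx.loss_shape = loss.shape
        print(f"WithLoss: loss-sum={loss.sum().item():.3e}")
        return x

    @staticmethod
    def backward(ctx, grad_output: torch.Tensor):
        return grad_output, torch.ones(ctx.loss_shape,
                                       device=grad_output.device)

attn = torch.softmax(torch.randn(4, 10, 10), dim=-1).requires_grad_()
penalty = torch.zeros((), requires_grad=True)  # illustrative; usually zero
out = WithLoss.apply(attn, penalty)
out.sum().backward()
print(penalty.grad)  # tensor(1.): the penalty feeds the training loss
```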
], tot_loss[loss=0.1012, simple_loss=0.1162, pruned_loss=0.03146, audio_tagging_loss=0.01167, over 3054546.26 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:30,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=395426.6666666667, ans=0.125 2023-11-18 20:25:31,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=395493.3333333333, ans=0.07 2023-11-18 20:25:48,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=395560.0, ans=0.2 2023-11-18 20:25:55,418 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11250, loss[loss=0.08011, simple_loss=0.08779, pruned_loss=0.02215, audio_tagging_loss=0.01406, over 15374.00 frames. ], tot_loss[loss=0.1006, simple_loss=0.1151, pruned_loss=0.03129, audio_tagging_loss=0.01174, over 3057930.72 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:25:56,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 9.426e+01 1.024e+02 1.146e+02 1.822e+02, threshold=2.048e+02, percent-clipped=0.0 2023-11-18 20:26:06,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=395693.3333333333, ans=0.0 2023-11-18 20:26:14,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=395693.3333333333, ans=0.2 2023-11-18 20:26:29,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=9.38 vs. limit=12.0 2023-11-18 20:26:31,853 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.19 vs. limit=15.0 2023-11-18 20:26:32,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.40 vs. limit=15.0 2023-11-18 20:26:36,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=395826.6666666667, ans=0.125 2023-11-18 20:26:37,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.67 vs. limit=15.0 2023-11-18 20:26:38,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=395893.3333333333, ans=0.0 2023-11-18 20:26:48,173 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.94 vs. limit=10.0 2023-11-18 20:26:50,738 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11300, loss[loss=0.145, simple_loss=0.1644, pruned_loss=0.05511, audio_tagging_loss=0.007653, over 15933.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1152, pruned_loss=0.03117, audio_tagging_loss=0.01157, over 3046409.09 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:26:58,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.18 vs. 
limit=6.0 2023-11-18 20:27:09,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=396026.6666666667, ans=0.0 2023-11-18 20:27:15,119 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.35 vs. limit=12.0 2023-11-18 20:27:33,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=396160.0, ans=0.95 2023-11-18 20:27:38,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.77 vs. limit=15.0 2023-11-18 20:27:45,764 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11350, loss[loss=0.1088, simple_loss=0.1402, pruned_loss=0.03091, audio_tagging_loss=0.00784, over 15079.00 frames. ], tot_loss[loss=0.101, simple_loss=0.1163, pruned_loss=0.03153, audio_tagging_loss=0.01127, over 3043898.70 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:27:46,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=396293.3333333333, ans=0.125 2023-11-18 20:27:46,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 9.361e+01 1.045e+02 1.135e+02 1.699e+02, threshold=2.091e+02, percent-clipped=0.0 2023-11-18 20:27:50,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=396293.3333333333, ans=0.2 2023-11-18 20:28:19,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=396493.3333333333, ans=0.0 2023-11-18 20:28:22,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396493.3333333333, ans=0.1 2023-11-18 20:28:24,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=396493.3333333333, ans=0.125 2023-11-18 20:28:25,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=396493.3333333333, ans=0.125 2023-11-18 20:28:28,920 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:28:32,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=396560.0, ans=0.0 2023-11-18 20:28:42,013 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11400, loss[loss=0.1013, simple_loss=0.111, pruned_loss=0.0342, audio_tagging_loss=0.01158, over 14717.00 frames. ], tot_loss[loss=0.1005, simple_loss=0.1161, pruned_loss=0.03127, audio_tagging_loss=0.01118, over 3041866.88 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:28:44,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.29 vs. 
limit=12.0 2023-11-18 20:28:46,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=396626.6666666667, ans=0.125 2023-11-18 20:28:53,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=396693.3333333333, ans=0.0 2023-11-18 20:29:05,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=396760.0, ans=0.1 2023-11-18 20:29:05,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.20 vs. limit=15.0 2023-11-18 20:29:20,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=396826.6666666667, ans=0.125 2023-11-18 20:29:22,044 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:29:29,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.92 vs. limit=15.0 2023-11-18 20:29:37,082 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11450, loss[loss=0.1029, simple_loss=0.1098, pruned_loss=0.0357, audio_tagging_loss=0.01229, over 14918.00 frames. ], tot_loss[loss=0.1004, simple_loss=0.1158, pruned_loss=0.03128, audio_tagging_loss=0.01122, over 3041831.70 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:29:38,112 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.532e+01 8.945e+01 1.000e+02 1.081e+02 1.401e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 20:30:03,411 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:30:17,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=397160.0, ans=0.1 2023-11-18 20:30:20,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=397160.0, ans=0.125 2023-11-18 20:30:24,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=397226.6666666667, ans=0.125 2023-11-18 20:30:27,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=397226.6666666667, ans=0.125 2023-11-18 20:30:32,399 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11500, loss[loss=0.118, simple_loss=0.1367, pruned_loss=0.04211, audio_tagging_loss=0.007516, over 14307.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1155, pruned_loss=0.03114, audio_tagging_loss=0.01126, over 3041079.91 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:01,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=397426.6666666667, ans=0.1 2023-11-18 20:31:28,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=397626.6666666667, ans=0.2 2023-11-18 20:31:29,290 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11550, loss[loss=0.08593, simple_loss=0.09495, pruned_loss=0.02711, audio_tagging_loss=0.01134, over 14853.00 frames. 
], tot_loss[loss=0.1004, simple_loss=0.116, pruned_loss=0.03109, audio_tagging_loss=0.01128, over 3045332.64 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:31:30,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.929e+01 8.927e+01 9.792e+01 1.098e+02 1.308e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 20:31:45,126 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.12 vs. limit=12.0 2023-11-18 20:31:49,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=397693.3333333333, ans=0.0 2023-11-18 20:32:00,541 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 20:32:15,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=397893.3333333333, ans=0.0 2023-11-18 20:32:24,909 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11600, loss[loss=0.121, simple_loss=0.1367, pruned_loss=0.04201, audio_tagging_loss=0.01061, over 15834.00 frames. ], tot_loss[loss=0.09982, simple_loss=0.1153, pruned_loss=0.03085, audio_tagging_loss=0.01134, over 3041101.61 frames. ], batch size: 59, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:33:01,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=398160.0, ans=0.125 2023-11-18 20:33:11,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.74 vs. limit=6.0 2023-11-18 20:33:20,117 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11650, loss[loss=0.1489, simple_loss=0.1687, pruned_loss=0.05639, audio_tagging_loss=0.008134, over 15131.00 frames. ], tot_loss[loss=0.1007, simple_loss=0.1165, pruned_loss=0.031, audio_tagging_loss=0.0114, over 3043016.44 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 64.0 2023-11-18 20:33:21,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.120e+01 8.987e+01 1.026e+02 1.150e+02 1.533e+02, threshold=2.053e+02, percent-clipped=0.0 2023-11-18 20:33:30,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=398293.3333333333, ans=0.0 2023-11-18 20:33:35,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.23 vs. limit=10.0 2023-11-18 20:33:47,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=398426.6666666667, ans=0.0 2023-11-18 20:33:56,687 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.71 vs. limit=10.0 2023-11-18 20:34:16,074 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11700, loss[loss=0.1045, simple_loss=0.1219, pruned_loss=0.02857, audio_tagging_loss=0.015, over 15997.00 frames. 
], tot_loss[loss=0.0993, simple_loss=0.1146, pruned_loss=0.0305, audio_tagging_loss=0.01151, over 3035893.40 frames. ], batch size: 60, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:34:24,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=398626.6666666667, ans=0.0 2023-11-18 20:34:26,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=398693.3333333333, ans=0.0 2023-11-18 20:34:32,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=398693.3333333333, ans=0.125 2023-11-18 20:34:32,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=398693.3333333333, ans=0.2 2023-11-18 20:34:40,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.92 vs. limit=6.0 2023-11-18 20:34:43,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=398760.0, ans=0.2 2023-11-18 20:34:54,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=398826.6666666667, ans=0.125 2023-11-18 20:34:56,464 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.09 vs. limit=6.0 2023-11-18 20:35:00,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=398893.3333333333, ans=0.2 2023-11-18 20:35:06,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=398893.3333333333, ans=0.0 2023-11-18 20:35:11,093 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:35:12,934 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11750, loss[loss=0.1136, simple_loss=0.1381, pruned_loss=0.03351, audio_tagging_loss=0.01101, over 14853.00 frames. ], tot_loss[loss=0.0983, simple_loss=0.1131, pruned_loss=0.03009, audio_tagging_loss=0.01168, over 3035825.97 frames. ], batch size: 56, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:35:13,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=398960.0, ans=0.0 2023-11-18 20:35:14,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.23 vs. 
limit=15.0 2023-11-18 20:35:15,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.373e+01 8.870e+01 9.922e+01 1.106e+02 1.477e+02, threshold=1.984e+02, percent-clipped=0.0 2023-11-18 20:35:30,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=399026.6666666667, ans=0.125 2023-11-18 20:35:37,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=399093.3333333333, ans=0.1 2023-11-18 20:35:38,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=399093.3333333333, ans=0.2 2023-11-18 20:35:44,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=399093.3333333333, ans=0.05 2023-11-18 20:35:45,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=399160.0, ans=0.125 2023-11-18 20:36:00,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=399226.6666666667, ans=0.125 2023-11-18 20:36:08,089 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11800, loss[loss=0.06501, simple_loss=0.07854, pruned_loss=0.01686, audio_tagging_loss=0.008879, over 14458.00 frames. ], tot_loss[loss=0.09829, simple_loss=0.1129, pruned_loss=0.03017, audio_tagging_loss=0.01166, over 3044066.44 frames. ], batch size: 54, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:36:13,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=399293.3333333333, ans=0.0 2023-11-18 20:36:21,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=399360.0, ans=0.2 2023-11-18 20:36:35,914 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=399426.6666666667, ans=0.125 2023-11-18 20:36:36,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=399426.6666666667, ans=0.0 2023-11-18 20:36:39,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.69 vs. limit=15.0 2023-11-18 20:37:01,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.46 vs. limit=15.0 2023-11-18 20:37:02,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.51 vs. limit=6.0 2023-11-18 20:37:02,929 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.26 vs. limit=5.0 2023-11-18 20:37:03,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=399626.6666666667, ans=0.125 2023-11-18 20:37:04,196 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11850, loss[loss=0.09907, simple_loss=0.1112, pruned_loss=0.03222, audio_tagging_loss=0.01124, over 14633.00 frames. ], tot_loss[loss=0.0989, simple_loss=0.1137, pruned_loss=0.0304, audio_tagging_loss=0.01164, over 3043903.98 frames. 
], batch size: 54, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:37:06,259 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.830e+01 9.778e+01 1.086e+02 1.428e+02, threshold=1.956e+02, percent-clipped=0.0 2023-11-18 20:37:14,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=399693.3333333333, ans=0.2 2023-11-18 20:37:19,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=399693.3333333333, ans=0.125 2023-11-18 20:37:23,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=399693.3333333333, ans=0.0 2023-11-18 20:37:32,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=399760.0, ans=0.125 2023-11-18 20:37:38,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=399826.6666666667, ans=0.1 2023-11-18 20:37:40,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.52 vs. limit=15.0 2023-11-18 20:37:40,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=399826.6666666667, ans=0.125 2023-11-18 20:37:58,855 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11900, loss[loss=0.101, simple_loss=0.1171, pruned_loss=0.03188, audio_tagging_loss=0.01064, over 15195.00 frames. ], tot_loss[loss=0.09903, simple_loss=0.114, pruned_loss=0.03041, audio_tagging_loss=0.01161, over 3048946.69 frames. ], batch size: 55, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:05,014 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-60000.pt 2023-11-18 20:38:25,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=400093.3333333333, ans=0.0 2023-11-18 20:38:36,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. limit=15.0 2023-11-18 20:38:44,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=400226.6666666667, ans=0.125 2023-11-18 20:38:44,965 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.541e-02 2023-11-18 20:38:48,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.85 vs. limit=5.0 2023-11-18 20:38:56,546 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 11950, loss[loss=0.0749, simple_loss=0.07821, pruned_loss=0.02452, audio_tagging_loss=0.01128, over 14439.00 frames. ], tot_loss[loss=0.09852, simple_loss=0.1132, pruned_loss=0.03021, audio_tagging_loss=0.01171, over 3047399.00 frames. ], batch size: 54, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:38:58,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.829e+01 9.865e+01 1.129e+02 1.573e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 20:39:11,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.34 vs. 
limit=6.0 2023-11-18 20:39:17,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=400426.6666666667, ans=0.04949747468305833 2023-11-18 20:39:26,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=400426.6666666667, ans=0.0 2023-11-18 20:39:31,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=400493.3333333333, ans=0.035 2023-11-18 20:39:41,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=400560.0, ans=0.2 2023-11-18 20:39:50,212 INFO [train_asr.py:1115] (0/4) Epoch 5, batch 12000, loss[loss=0.1014, simple_loss=0.113, pruned_loss=0.03254, audio_tagging_loss=0.01231, over 15034.00 frames. ], tot_loss[loss=0.09876, simple_loss=0.1131, pruned_loss=0.03035, audio_tagging_loss=0.01184, over 3050161.78 frames. ], batch size: 57, lr: 1.29e-02, grad_scale: 32.0 2023-11-18 20:39:50,214 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 20:40:16,329 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.3493, 3.6208, 1.9079, 3.6199], device='cuda:0') 2023-11-18 20:40:23,261 INFO [train_asr.py:1147] (0/4) Epoch 5, validation: loss=0.07195, simple_loss=0.05986, pruned_loss=0.008725, audio_tagging_loss=0.0333, over 4681554.00 frames. 2023-11-18 20:40:23,262 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 20:40:23,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=400626.6666666667, ans=0.1 2023-11-18 20:40:32,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=400693.3333333333, ans=0.125 2023-11-18 20:40:37,506 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=400693.3333333333, ans=0.0 2023-11-18 20:40:47,533 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-5.pt 2023-11-18 20:41:23,807 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 0, loss[loss=0.1028, simple_loss=0.1009, pruned_loss=0.0249, audio_tagging_loss=0.02743, over 14952.00 frames. ], tot_loss[loss=0.1028, simple_loss=0.1009, pruned_loss=0.0249, audio_tagging_loss=0.02743, over 14952.00 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:41:23,809 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 20:41:44,135 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.4480, 4.0486, 3.6169, 3.1019], device='cuda:0') 2023-11-18 20:41:50,180 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8220, 2.8553, 4.7788, 4.2797], device='cuda:0') 2023-11-18 20:41:55,537 INFO [train_asr.py:1147] (0/4) Epoch 6, validation: loss=0.07069, simple_loss=0.05989, pruned_loss=0.008764, audio_tagging_loss=0.03198, over 4681554.00 frames. 
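
The recurring optim.py:476 entries print a five-number summary (min/25%/50%/75%/max) of recent gradient norms together with Clipping_scale=2.0, a threshold, and percent-clipped. Across the entries in this stretch of the log the threshold consistently works out to clipping_scale times the logged median (e.g. 2.0 * 1.012e+02 ~= threshold=2.025e+02, and 2.0 * 9.792e+01 ~= 1.958e+02). Below is a minimal sketch of a clipper that would produce such statistics; the buffer length, reporting cadence, and the use of clip_grad_norm_ are assumptions for illustration, not icefall's exact optim.py code:

```python
import torch

class GradNormClipper:
    """Sketch of a quartile-based gradient clipper (assumed, not icefall's code).

    Tracks recent per-batch gradient norms; the clipping threshold is
    clipping_scale * median of the buffer, matching the logged relation,
    e.g. 2.0 * 1.012e+02 ~= threshold=2.025e+02.
    """

    def __init__(self, clipping_scale: float = 2.0, buffer_len: int = 128):
        self.clipping_scale = clipping_scale
        self.buffer_len = buffer_len      # assumed window size
        self.norms = []                   # recent per-batch gradient norms
        self.clipped = 0                  # batches whose norm exceeded the threshold
        self.seen = 0

    def threshold(self) -> float:
        if not self.norms:
            return float("inf")           # no history yet: do not clip
        q = torch.quantile(torch.tensor(self.norms),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        # q is the five-number summary printed as "grad-norm quartiles".
        return self.clipping_scale * q[2].item()

    def step(self, model: torch.nn.Module) -> float:
        thr = self.threshold()
        # clip_grad_norm_ returns the total norm measured *before* clipping.
        norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=thr))
        self.seen += 1
        self.clipped += int(norm > thr)
        self.norms.append(norm)
        if len(self.norms) > self.buffer_len:
            self.norms.pop(0)             # keep only recent batches
        return norm
```

percent-clipped would then be 100.0 * clipped / seen over the reporting window; it stays at 0.0 in most entries above because the logged max norm sits below twice the median, and rises (e.g. percent-clipped=1.0 near batch 700 of epoch 6) exactly when a max such as 2.477e+02 exceeds the threshold.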
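The WARNING [train_asr.py:1319] entries scattered through this epoch drop 1-second AudioSet cuts whose placeholder transcript is longer than the subsampled feature sequence: 100 input frames become 23 frames after subsampling, fewer than the 24 BPE tokens, and a transducer loss needs at least one frame per token (T >= U). A sketch of such a filter follows; the (T - 8) // 4 frame arithmetic is an assumed stand-in that merely matches the logged 100 -> 23, and keep_cut is an illustrative name, not the script's actual helper:

```python
import sentencepiece as spm

# BPE model path as given in the run's config.
sp = spm.SentencePieceProcessor()
sp.load("data/lang_bpe_500/bpe.model")

def keep_cut(cut) -> bool:
    # Keep a cut only if it still has at least as many frames as BPE
    # tokens after subsampling; otherwise the transducer loss is undefined.
    T = cut.num_frames                     # frames before subsampling (100)
    T_sub = (T - 8) // 4                   # frames after subsampling (23); assumed formula
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return T_sub >= len(tokens)            # 23 >= 24 is False: cut excluded

# Applied to the lhotse CutSet before batching, e.g.:
# train_cuts = train_cuts.filter(keep_cut)
```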
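The high-volume scaling.py:213 lines report ScheduledFloat values: float-valued hyperparameters (dropout probabilities, skip rates, bypass scale_min, and so on) whose printed ans is a function of the global batch_count. Below is a minimal re-implementation sketch of that idea, assuming piecewise-linear interpolation between (batch_count, value) schedule points; the real class in icefall's scaling.py carries more machinery, and the schedule points shown are illustrative only:

```python
class ScheduledFloat:
    # Sketch: a float hyperparameter scheduled on the global batch count,
    # piecewise-linearly interpolated between (batch_count, value) points.

    def __init__(self, *points, default=0.0):
        self.points = sorted(points)       # (batch_count, value) pairs
        self.default = default
        self.batch_count = None            # set by the training loop each batch

    def __float__(self):
        if self.batch_count is None:
            return float(self.default)
        x = self.batch_count
        x0, y0 = self.points[0]
        if x <= x0:
            return float(y0)
        for x1, y1 in self.points[1:]:
            if x <= x1:
                # linear interpolation between consecutive schedule points
                return float(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
            x0, y0 = x1, y1
        return float(y0)                   # past the last point: hold final value

# e.g. a dropout probability annealed from 0.3 to 0.1 over 20k batches
# (illustrative schedule, not the run's actual one):
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 393626.67          # far past the last point
assert abs(float(dropout_p) - 0.1) < 1e-9  # matches logged "ans=0.1" entries
```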
2023-11-18 20:41:55,538 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 20:41:59,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=400780.0, ans=0.0 2023-11-18 20:42:01,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.54 vs. limit=15.0 2023-11-18 20:42:09,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=400846.6666666667, ans=0.1 2023-11-18 20:42:25,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=400913.3333333333, ans=0.125 2023-11-18 20:42:26,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=400913.3333333333, ans=0.04949747468305833 2023-11-18 20:42:27,001 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.778e+01 9.356e+01 1.020e+02 1.152e+02 1.600e+02, threshold=2.040e+02, percent-clipped=0.0 2023-11-18 20:42:30,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=400980.0, ans=0.035 2023-11-18 20:42:46,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=401046.6666666667, ans=0.1 2023-11-18 20:42:50,311 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 50, loss[loss=0.1006, simple_loss=0.107, pruned_loss=0.02652, audio_tagging_loss=0.02059, over 15858.00 frames. ], tot_loss[loss=0.1084, simple_loss=0.1125, pruned_loss=0.02991, audio_tagging_loss=0.02228, over 689775.61 frames. ], batch size: 58, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:42:50,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=401113.3333333333, ans=0.0 2023-11-18 20:42:53,187 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.36 vs. limit=22.5 2023-11-18 20:42:58,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401113.3333333333, ans=0.125 2023-11-18 20:42:58,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=401113.3333333333, ans=0.09899494936611666 2023-11-18 20:43:01,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.88 vs. limit=22.5 2023-11-18 20:43:04,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=401180.0, ans=0.1 2023-11-18 20:43:06,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.57 vs. limit=10.0 2023-11-18 20:43:12,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=401246.6666666667, ans=0.2 2023-11-18 20:43:47,271 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 100, loss[loss=0.0844, simple_loss=0.08524, pruned_loss=0.01674, audio_tagging_loss=0.02505, over 14127.00 frames. 
], tot_loss[loss=0.1082, simple_loss=0.1132, pruned_loss=0.03014, audio_tagging_loss=0.02146, over 1208735.41 frames. ], batch size: 55, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:43:51,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.95 vs. limit=22.5 2023-11-18 20:43:51,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=401446.6666666667, ans=0.1 2023-11-18 20:43:52,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.64 vs. limit=22.5 2023-11-18 20:44:15,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=401580.0, ans=0.0 2023-11-18 20:44:19,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.750e+01 9.166e+01 9.950e+01 1.092e+02 1.419e+02, threshold=1.990e+02, percent-clipped=0.0 2023-11-18 20:44:22,170 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-11-18 20:44:34,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=401713.3333333333, ans=0.1 2023-11-18 20:44:43,065 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 150, loss[loss=0.07538, simple_loss=0.07909, pruned_loss=0.02053, audio_tagging_loss=0.0153, over 14876.00 frames. ], tot_loss[loss=0.1056, simple_loss=0.1127, pruned_loss=0.03003, audio_tagging_loss=0.01922, over 1608950.17 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:45:01,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.52 vs. limit=22.5 2023-11-18 20:45:23,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=401980.0, ans=0.125 2023-11-18 20:45:36,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=402046.6666666667, ans=0.125 2023-11-18 20:45:39,168 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 200, loss[loss=0.1289, simple_loss=0.1558, pruned_loss=0.04199, audio_tagging_loss=0.009052, over 16216.00 frames. ], tot_loss[loss=0.1031, simple_loss=0.1124, pruned_loss=0.03, audio_tagging_loss=0.01694, over 1921636.54 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:45:39,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.47 vs. limit=15.0 2023-11-18 20:45:41,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.44 vs. limit=22.5 2023-11-18 20:45:42,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=402113.3333333333, ans=0.2 2023-11-18 20:45:51,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402180.0, ans=0.1 2023-11-18 20:46:06,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.57 vs. 
limit=6.0 2023-11-18 20:46:11,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.743e+01 9.007e+01 1.004e+02 1.088e+02 1.464e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 20:46:15,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=402313.3333333333, ans=0.0 2023-11-18 20:46:23,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=402380.0, ans=0.1 2023-11-18 20:46:26,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.min_positive, batch_count=402380.0, ans=0.05 2023-11-18 20:46:32,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=402380.0, ans=0.125 2023-11-18 20:46:35,565 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 250, loss[loss=0.08498, simple_loss=0.09578, pruned_loss=0.0208, audio_tagging_loss=0.01629, over 14853.00 frames. ], tot_loss[loss=0.1016, simple_loss=0.113, pruned_loss=0.02988, audio_tagging_loss=0.01523, over 2172398.60 frames. ], batch size: 56, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:46:49,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-11-18 20:46:52,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=402513.3333333333, ans=0.07 2023-11-18 20:46:58,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:00,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.54 vs. limit=22.5 2023-11-18 20:47:03,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=402580.0, ans=0.0 2023-11-18 20:47:12,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=402646.6666666667, ans=0.125 2023-11-18 20:47:19,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=402713.3333333333, ans=0.125 2023-11-18 20:47:21,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=402713.3333333333, ans=0.0 2023-11-18 20:47:28,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=402713.3333333333, ans=0.125 2023-11-18 20:47:31,867 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 300, loss[loss=0.1382, simple_loss=0.1671, pruned_loss=0.04421, audio_tagging_loss=0.01044, over 14930.00 frames. ], tot_loss[loss=0.1014, simple_loss=0.1144, pruned_loss=0.03014, audio_tagging_loss=0.01402, over 2369751.77 frames. 
], batch size: 54, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:47:34,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=402780.0, ans=0.125 2023-11-18 20:47:51,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=402846.6666666667, ans=0.0 2023-11-18 20:47:58,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=402913.3333333333, ans=0.125 2023-11-18 20:48:03,988 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.203e+01 9.372e+01 1.051e+02 1.173e+02 1.706e+02, threshold=2.102e+02, percent-clipped=0.0 2023-11-18 20:48:06,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=402980.0, ans=0.125 2023-11-18 20:48:16,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=403046.6666666667, ans=0.1 2023-11-18 20:48:27,628 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 350, loss[loss=0.1005, simple_loss=0.1075, pruned_loss=0.03424, audio_tagging_loss=0.01248, over 15283.00 frames. ], tot_loss[loss=0.1003, simple_loss=0.1139, pruned_loss=0.03011, audio_tagging_loss=0.01328, over 2511148.45 frames. ], batch size: 60, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:48:31,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=403113.3333333333, ans=0.125 2023-11-18 20:48:45,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=403180.0, ans=0.125 2023-11-18 20:48:45,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=403180.0, ans=0.0 2023-11-18 20:48:51,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=403246.6666666667, ans=0.04949747468305833 2023-11-18 20:48:51,469 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.70 vs. limit=15.0 2023-11-18 20:49:01,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=403313.3333333333, ans=0.125 2023-11-18 20:49:11,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=403380.0, ans=0.125 2023-11-18 20:49:20,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=403380.0, ans=0.1 2023-11-18 20:49:23,918 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 400, loss[loss=0.07535, simple_loss=0.08601, pruned_loss=0.02151, audio_tagging_loss=0.01083, over 14623.00 frames. ], tot_loss[loss=0.09977, simple_loss=0.1141, pruned_loss=0.03015, audio_tagging_loss=0.01259, over 2624059.55 frames. ], batch size: 57, lr: 1.20e-02, grad_scale: 32.0 2023-11-18 20:49:55,726 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.946e+01 9.366e+01 1.079e+02 1.287e+02 1.849e+02, threshold=2.157e+02, percent-clipped=0.0 2023-11-18 20:50:08,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=23.85 vs. 
limit=22.5 2023-11-18 20:50:19,394 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 450, loss[loss=0.09903, simple_loss=0.1084, pruned_loss=0.03198, audio_tagging_loss=0.01285, over 15306.00 frames. ], tot_loss[loss=0.09989, simple_loss=0.1147, pruned_loss=0.03044, audio_tagging_loss=0.01209, over 2720423.63 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:50:36,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=403846.6666666667, ans=0.125 2023-11-18 20:50:38,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.79 vs. limit=15.0 2023-11-18 20:50:52,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=403980.0, ans=0.0 2023-11-18 20:50:54,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=403980.0, ans=0.2 2023-11-18 20:51:08,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.48 vs. limit=15.0 2023-11-18 20:51:12,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.17 vs. limit=15.0 2023-11-18 20:51:15,205 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 500, loss[loss=0.1004, simple_loss=0.1066, pruned_loss=0.03404, audio_tagging_loss=0.01309, over 15487.00 frames. ], tot_loss[loss=0.09978, simple_loss=0.115, pruned_loss=0.03053, audio_tagging_loss=0.01178, over 2786229.67 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:51:16,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=404113.3333333333, ans=0.2 2023-11-18 20:51:39,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=404246.6666666667, ans=0.1 2023-11-18 20:51:47,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 8.724e+01 9.545e+01 1.075e+02 1.901e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 20:52:10,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0 2023-11-18 20:52:11,361 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 550, loss[loss=0.1052, simple_loss=0.1196, pruned_loss=0.03219, audio_tagging_loss=0.01318, over 15413.00 frames. ], tot_loss[loss=0.09972, simple_loss=0.1147, pruned_loss=0.0306, audio_tagging_loss=0.01177, over 2843120.08 frames. 
], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:52:15,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=404446.6666666667, ans=0.125 2023-11-18 20:52:37,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=404580.0, ans=0.125 2023-11-18 20:52:54,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=404646.6666666667, ans=0.025 2023-11-18 20:53:01,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=404713.3333333333, ans=0.0 2023-11-18 20:53:05,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=404713.3333333333, ans=0.1 2023-11-18 20:53:07,245 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 600, loss[loss=0.09321, simple_loss=0.1031, pruned_loss=0.02885, audio_tagging_loss=0.01282, over 16137.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1153, pruned_loss=0.03074, audio_tagging_loss=0.01164, over 2888339.72 frames. ], batch size: 61, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:53:15,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=404780.0, ans=0.125 2023-11-18 20:53:16,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=404780.0, ans=0.125 2023-11-18 20:53:21,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.13 vs. limit=22.5 2023-11-18 20:53:31,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=404913.3333333333, ans=0.125 2023-11-18 20:53:40,229 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.079e+01 8.597e+01 9.522e+01 1.046e+02 1.696e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 20:54:03,255 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 650, loss[loss=0.09886, simple_loss=0.1119, pruned_loss=0.03051, audio_tagging_loss=0.01241, over 16138.00 frames. ], tot_loss[loss=0.09827, simple_loss=0.1132, pruned_loss=0.02989, audio_tagging_loss=0.01178, over 2925047.96 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:54:07,134 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0 2023-11-18 20:54:20,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=405180.0, ans=0.2 2023-11-18 20:54:42,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=405313.3333333333, ans=0.95 2023-11-18 20:54:59,375 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 700, loss[loss=0.08798, simple_loss=0.0975, pruned_loss=0.02874, audio_tagging_loss=0.0105, over 13760.00 frames. ], tot_loss[loss=0.09831, simple_loss=0.1134, pruned_loss=0.02989, audio_tagging_loss=0.01174, over 2950363.87 frames. 
], batch size: 53, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:55:11,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=405513.3333333333, ans=0.125 2023-11-18 20:55:22,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=405580.0, ans=0.025 2023-11-18 20:55:29,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=405580.0, ans=0.125 2023-11-18 20:55:31,762 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.367e+01 9.330e+01 1.028e+02 1.121e+02 2.477e+02, threshold=2.056e+02, percent-clipped=1.0 2023-11-18 20:55:35,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=405646.6666666667, ans=0.0 2023-11-18 20:55:45,485 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.68 vs. limit=10.0 2023-11-18 20:55:47,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=405713.3333333333, ans=0.0 2023-11-18 20:55:55,651 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 750, loss[loss=0.08996, simple_loss=0.1096, pruned_loss=0.0258, audio_tagging_loss=0.009348, over 15539.00 frames. ], tot_loss[loss=0.09914, simple_loss=0.1144, pruned_loss=0.03032, audio_tagging_loss=0.01161, over 2974290.52 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:56:06,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=405846.6666666667, ans=0.125 2023-11-18 20:56:07,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=405846.6666666667, ans=0.125 2023-11-18 20:56:26,780 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 20:56:27,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-18 20:56:40,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=406046.6666666667, ans=0.0 2023-11-18 20:56:51,384 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 800, loss[loss=0.1144, simple_loss=0.1353, pruned_loss=0.03595, audio_tagging_loss=0.01083, over 15018.00 frames. ], tot_loss[loss=0.09975, simple_loss=0.1154, pruned_loss=0.03054, audio_tagging_loss=0.01153, over 2987212.66 frames. 
], batch size: 57, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:56:56,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=406113.3333333333, ans=0.125 2023-11-18 20:57:05,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=406180.0, ans=0.125 2023-11-18 20:57:24,275 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.244e+01 9.553e+01 1.008e+02 1.085e+02 1.896e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 20:57:39,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=406380.0, ans=0.125 2023-11-18 20:57:46,591 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 850, loss[loss=0.09813, simple_loss=0.1032, pruned_loss=0.03611, audio_tagging_loss=0.01042, over 14109.00 frames. ], tot_loss[loss=0.09949, simple_loss=0.1149, pruned_loss=0.03047, audio_tagging_loss=0.01157, over 3003026.41 frames. ], batch size: 54, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:57:57,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2023-11-18 20:58:14,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=406580.0, ans=0.0 2023-11-18 20:58:14,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=406580.0, ans=0.1 2023-11-18 20:58:24,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=406646.6666666667, ans=0.125 2023-11-18 20:58:43,487 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 900, loss[loss=0.09172, simple_loss=0.1102, pruned_loss=0.02819, audio_tagging_loss=0.008409, over 16432.00 frames. ], tot_loss[loss=0.09979, simple_loss=0.1155, pruned_loss=0.03049, audio_tagging_loss=0.01156, over 3012004.36 frames. ], batch size: 60, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:58:55,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=406846.6666666667, ans=0.0 2023-11-18 20:59:05,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=406913.3333333333, ans=0.0 2023-11-18 20:59:15,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.061e+01 8.915e+01 9.624e+01 1.067e+02 1.384e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 20:59:17,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=406980.0, ans=0.2 2023-11-18 20:59:20,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=406980.0, ans=0.2 2023-11-18 20:59:20,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.98 vs. limit=15.0 2023-11-18 20:59:39,118 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 950, loss[loss=0.09186, simple_loss=0.1021, pruned_loss=0.02724, audio_tagging_loss=0.01357, over 15708.00 frames. ], tot_loss[loss=0.1, simple_loss=0.1157, pruned_loss=0.0307, audio_tagging_loss=0.01145, over 3020139.23 frames. 
], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 20:59:39,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.03 vs. limit=15.0 2023-11-18 20:59:40,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=407113.3333333333, ans=0.125 2023-11-18 20:59:44,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=407113.3333333333, ans=0.0 2023-11-18 20:59:47,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=407113.3333333333, ans=0.125 2023-11-18 20:59:48,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=407180.0, ans=0.125 2023-11-18 20:59:50,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=407180.0, ans=0.0 2023-11-18 21:00:02,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=407246.6666666667, ans=0.1 2023-11-18 21:00:04,725 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=407246.6666666667, ans=0.2 2023-11-18 21:00:05,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.35 vs. limit=10.0 2023-11-18 21:00:11,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.16 vs. limit=22.5 2023-11-18 21:00:14,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=407313.3333333333, ans=0.125 2023-11-18 21:00:15,384 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=407313.3333333333, ans=0.0 2023-11-18 21:00:25,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=407380.0, ans=0.2 2023-11-18 21:00:31,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=407380.0, ans=0.035 2023-11-18 21:00:34,309 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1000, loss[loss=0.09343, simple_loss=0.1005, pruned_loss=0.02906, audio_tagging_loss=0.01413, over 15514.00 frames. ], tot_loss[loss=0.09883, simple_loss=0.1146, pruned_loss=0.03028, audio_tagging_loss=0.01123, over 3028188.00 frames. ], batch size: 58, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:00:37,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=407446.6666666667, ans=0.0 2023-11-18 21:00:49,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=407513.3333333333, ans=0.125 2023-11-18 21:00:55,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.76 vs. 
limit=15.0 2023-11-18 21:00:58,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=407580.0, ans=0.1 2023-11-18 21:00:58,790 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:01:06,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=407580.0, ans=0.125 2023-11-18 21:01:07,167 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.744e+01 1.004e+02 1.144e+02 1.885e+02, threshold=2.008e+02, percent-clipped=0.0 2023-11-18 21:01:07,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=407646.6666666667, ans=0.2 2023-11-18 21:01:11,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=407646.6666666667, ans=0.2 2023-11-18 21:01:15,047 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=407646.6666666667, ans=0.04949747468305833 2023-11-18 21:01:30,881 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1050, loss[loss=0.07752, simple_loss=0.09006, pruned_loss=0.02034, audio_tagging_loss=0.01216, over 15274.00 frames. ], tot_loss[loss=0.0985, simple_loss=0.114, pruned_loss=0.03023, audio_tagging_loss=0.01129, over 3033038.59 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:01:36,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.44 vs. limit=15.0 2023-11-18 21:01:58,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=407913.3333333333, ans=0.125 2023-11-18 21:02:14,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=408046.6666666667, ans=0.0 2023-11-18 21:02:27,552 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1100, loss[loss=0.09936, simple_loss=0.1284, pruned_loss=0.02844, audio_tagging_loss=0.006713, over 14515.00 frames. ], tot_loss[loss=0.09814, simple_loss=0.1137, pruned_loss=0.03014, audio_tagging_loss=0.01117, over 3032926.45 frames. ], batch size: 54, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:02:29,731 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 21:02:29,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=408113.3333333333, ans=0.125 2023-11-18 21:02:36,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=408113.3333333333, ans=0.125 2023-11-18 21:02:50,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.57 vs. limit=15.0 2023-11-18 21:02:50,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=408246.6666666667, ans=0.07 2023-11-18 21:02:50,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=408246.6666666667, ans=0.2 2023-11-18 21:03:00,441 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.573e+01 9.716e+01 1.058e+02 1.424e+02, threshold=1.943e+02, percent-clipped=0.0 2023-11-18 21:03:07,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=408313.3333333333, ans=0.125 2023-11-18 21:03:07,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=408313.3333333333, ans=0.125 2023-11-18 21:03:19,933 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.10 vs. limit=12.0 2023-11-18 21:03:22,695 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1150, loss[loss=0.1248, simple_loss=0.1422, pruned_loss=0.04463, audio_tagging_loss=0.009084, over 15466.00 frames. ], tot_loss[loss=0.09822, simple_loss=0.1139, pruned_loss=0.03011, audio_tagging_loss=0.01115, over 3030143.07 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 16.0 2023-11-18 21:03:23,020 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=2.575e-03 2023-11-18 21:03:23,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=408446.6666666667, ans=0.02 2023-11-18 21:03:41,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.26 vs. limit=6.0 2023-11-18 21:03:54,030 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=7.79 vs. limit=15.0 2023-11-18 21:03:56,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=408646.6666666667, ans=0.125 2023-11-18 21:04:17,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=408713.3333333333, ans=0.07 2023-11-18 21:04:19,240 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1200, loss[loss=0.08111, simple_loss=0.1004, pruned_loss=0.02189, audio_tagging_loss=0.009011, over 14866.00 frames. ], tot_loss[loss=0.09895, simple_loss=0.115, pruned_loss=0.03047, audio_tagging_loss=0.01099, over 3035844.83 frames. 
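Note on the optim.py records: the five quartile values read as min/25%/median/75%/max of recently observed gradient norms, and the reported threshold tracks Clipping_scale times the median. The record in this span shows quartiles 7.339e+01 8.573e+01 9.716e+01 1.058e+02 1.424e+02 with threshold=1.943e+02, i.e. 2.0 * 9.716e+01. A minimal sketch of that arithmetic only (the optimizer's actual clipping has more machinery; recent_grad_norms is a hypothetical input):

    from statistics import median

    def clip_threshold(recent_grad_norms, clipping_scale=2.0):
        # Reproduces the logged relationship: threshold equals the clipping
        # scale times the median of recent gradient norms, e.g.
        # 2.0 * 9.716e+01 = 1.943e+02 in the record above.
        return clipping_scale * median(recent_grad_norms)

    # percent-clipped reports the share of recent batches whose gradient norm
    # exceeded the threshold; 0.0 in this record means none were clipped.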
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:04:20,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=408780.0, ans=0.125 2023-11-18 21:04:20,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.23 vs. limit=12.0 2023-11-18 21:04:37,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=408846.6666666667, ans=0.125 2023-11-18 21:04:52,322 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 9.018e+01 9.709e+01 1.057e+02 1.336e+02, threshold=1.942e+02, percent-clipped=0.0 2023-11-18 21:05:15,226 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1250, loss[loss=0.1075, simple_loss=0.1216, pruned_loss=0.03609, audio_tagging_loss=0.01059, over 14898.00 frames. ], tot_loss[loss=0.09861, simple_loss=0.1148, pruned_loss=0.03031, audio_tagging_loss=0.01088, over 3040269.42 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:05:18,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=409113.3333333333, ans=0.2 2023-11-18 21:05:24,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=409113.3333333333, ans=0.0 2023-11-18 21:05:26,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=409180.0, ans=0.1 2023-11-18 21:05:41,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=409246.6666666667, ans=22.5 2023-11-18 21:05:51,506 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.279e-02 2023-11-18 21:06:10,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=409446.6666666667, ans=0.125 2023-11-18 21:06:11,360 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1300, loss[loss=0.09688, simple_loss=0.1142, pruned_loss=0.02536, audio_tagging_loss=0.01443, over 14897.00 frames. ], tot_loss[loss=0.09893, simple_loss=0.1153, pruned_loss=0.03032, audio_tagging_loss=0.01097, over 3038036.55 frames. ], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:06:13,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=409446.6666666667, ans=0.125 2023-11-18 21:06:22,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=409513.3333333333, ans=0.2 2023-11-18 21:06:45,377 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.998e+01 8.886e+01 9.349e+01 1.016e+02 1.502e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-18 21:06:50,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=409646.6666666667, ans=0.125 2023-11-18 21:06:51,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=409646.6666666667, ans=0.5 2023-11-18 21:06:52,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.11 vs. 
limit=22.5 2023-11-18 21:07:07,806 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1350, loss[loss=0.1062, simple_loss=0.1284, pruned_loss=0.0303, audio_tagging_loss=0.01166, over 17020.00 frames. ], tot_loss[loss=0.09944, simple_loss=0.1158, pruned_loss=0.03053, audio_tagging_loss=0.01101, over 3042377.83 frames. ], batch size: 64, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:07:09,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=409780.0, ans=0.125 2023-11-18 21:07:10,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=409780.0, ans=0.0 2023-11-18 21:07:12,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=409780.0, ans=0.2 2023-11-18 21:07:16,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=409780.0, ans=0.0 2023-11-18 21:07:24,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=409846.6666666667, ans=0.1 2023-11-18 21:07:27,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=409846.6666666667, ans=0.125 2023-11-18 21:07:29,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=409913.3333333333, ans=0.125 2023-11-18 21:07:41,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=409980.0, ans=0.2 2023-11-18 21:07:47,873 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:08:01,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.25 vs. limit=15.0 2023-11-18 21:08:03,667 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1400, loss[loss=0.08301, simple_loss=0.09799, pruned_loss=0.02173, audio_tagging_loss=0.01228, over 15335.00 frames. ], tot_loss[loss=0.09869, simple_loss=0.1148, pruned_loss=0.03023, audio_tagging_loss=0.01108, over 3046000.48 frames. 
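Note on the WARNING records excluding AudioSet cuts: each shows the same shape, 100 feature frames shrinking to 23 after subsampling while the dummy transcript tokenizes to 24 tokens, which suggests cuts are dropped when the subsampled length is shorter than the token sequence. A hypothetical reconstruction of such a filter (the subsampling formula below is an assumption chosen to reproduce the logged 100 -> 23, not taken from the code):

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # Assumed 4x subsampling arithmetic; ((100 - 7) // 2 + 1) // 2 == 23,
        # matching the "after subsampling" count in the warnings.
        t = ((num_frames - 7) // 2 + 1) // 2
        return t >= num_tokens

    # The excluded cuts: 23 subsampled frames for 24 tokens, hence dropped.
    assert keep_cut(100, 24) is False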
], batch size: 56, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:08:11,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=410113.3333333333, ans=0.125 2023-11-18 21:08:24,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=410246.6666666667, ans=0.2 2023-11-18 21:08:24,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410246.6666666667, ans=0.1 2023-11-18 21:08:37,130 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.960e+01 8.879e+01 9.810e+01 1.048e+02 1.417e+02, threshold=1.962e+02, percent-clipped=0.0 2023-11-18 21:08:37,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=410313.3333333333, ans=0.125 2023-11-18 21:08:38,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=410313.3333333333, ans=0.125 2023-11-18 21:08:43,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=11.20 vs. limit=15.0 2023-11-18 21:08:47,730 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.84 vs. limit=15.0 2023-11-18 21:08:59,548 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1450, loss[loss=0.07869, simple_loss=0.0932, pruned_loss=0.02357, audio_tagging_loss=0.008519, over 15488.00 frames. ], tot_loss[loss=0.09789, simple_loss=0.1136, pruned_loss=0.02986, audio_tagging_loss=0.01122, over 3048756.08 frames. ], batch size: 59, lr: 1.19e-02, grad_scale: 32.0 2023-11-18 21:09:14,951 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=410513.3333333333, ans=0.1 2023-11-18 21:09:19,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=410513.3333333333, ans=0.125 2023-11-18 21:09:32,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=410646.6666666667, ans=0.1 2023-11-18 21:09:39,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=410646.6666666667, ans=0.0 2023-11-18 21:09:45,048 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:09:55,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.85 vs. limit=22.5 2023-11-18 21:09:56,062 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1500, loss[loss=0.06418, simple_loss=0.06387, pruned_loss=0.01658, audio_tagging_loss=0.01566, over 14717.00 frames. ], tot_loss[loss=0.09841, simple_loss=0.1138, pruned_loss=0.03024, audio_tagging_loss=0.01129, over 3040338.50 frames. 
], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:10:16,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=410846.6666666667, ans=0.2 2023-11-18 21:10:18,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.36 vs. limit=15.0 2023-11-18 21:10:19,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=410913.3333333333, ans=0.0 2023-11-18 21:10:24,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.44 vs. limit=10.0 2023-11-18 21:10:29,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.050e+01 8.852e+01 9.763e+01 1.053e+02 1.656e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 21:10:47,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=411046.6666666667, ans=0.125 2023-11-18 21:10:51,953 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1550, loss[loss=0.07857, simple_loss=0.08448, pruned_loss=0.02133, audio_tagging_loss=0.015, over 15762.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.1136, pruned_loss=0.03038, audio_tagging_loss=0.01139, over 3045885.13 frames. ], batch size: 61, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:11:01,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-18 21:11:27,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:28,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=411313.3333333333, ans=0.0 2023-11-18 21:11:30,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=411313.3333333333, ans=0.125 2023-11-18 21:11:47,819 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1600, loss[loss=0.1062, simple_loss=0.1131, pruned_loss=0.03749, audio_tagging_loss=0.01211, over 14700.00 frames. ], tot_loss[loss=0.09851, simple_loss=0.114, pruned_loss=0.03021, audio_tagging_loss=0.0113, over 3051428.44 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:12:21,741 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.758e+01 8.910e+01 9.772e+01 1.109e+02 1.512e+02, threshold=1.954e+02, percent-clipped=0.0 2023-11-18 21:12:44,078 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1650, loss[loss=0.1037, simple_loss=0.1197, pruned_loss=0.03188, audio_tagging_loss=0.01198, over 14435.00 frames. ], tot_loss[loss=0.09886, simple_loss=0.1145, pruned_loss=0.0303, audio_tagging_loss=0.01132, over 3049876.72 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:12:44,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.31 vs. limit=6.0 2023-11-18 21:12:57,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.03 vs. 
limit=22.5 2023-11-18 21:13:03,598 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.52 vs. limit=15.0 2023-11-18 21:13:24,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=411980.0, ans=0.125 2023-11-18 21:13:29,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=412046.6666666667, ans=0.2 2023-11-18 21:13:36,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=412046.6666666667, ans=0.125 2023-11-18 21:13:39,903 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1700, loss[loss=0.1129, simple_loss=0.1294, pruned_loss=0.03762, audio_tagging_loss=0.0106, over 15062.00 frames. ], tot_loss[loss=0.09904, simple_loss=0.1149, pruned_loss=0.03016, audio_tagging_loss=0.01143, over 3051443.32 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:13:43,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=412113.3333333333, ans=0.125 2023-11-18 21:13:50,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=412180.0, ans=0.025 2023-11-18 21:13:54,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=412180.0, ans=0.125 2023-11-18 21:14:01,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=412246.6666666667, ans=0.0 2023-11-18 21:14:03,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=412246.6666666667, ans=0.125 2023-11-18 21:14:04,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=412246.6666666667, ans=0.0 2023-11-18 21:14:08,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=412246.6666666667, ans=0.07 2023-11-18 21:14:13,795 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.581e+01 9.349e+01 1.070e+02 1.315e+02 2.031e+02, threshold=2.140e+02, percent-clipped=2.0 2023-11-18 21:14:17,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.44 vs. limit=15.0 2023-11-18 21:14:18,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=412313.3333333333, ans=0.125 2023-11-18 21:14:35,714 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1750, loss[loss=0.1382, simple_loss=0.1611, pruned_loss=0.0472, audio_tagging_loss=0.01043, over 16032.00 frames. ], tot_loss[loss=0.09906, simple_loss=0.1151, pruned_loss=0.03021, audio_tagging_loss=0.01128, over 3046774.30 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:14:39,176 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:14:41,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.41 vs. 
limit=22.5 2023-11-18 21:15:14,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=412646.6666666667, ans=0.125 2023-11-18 21:15:18,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=412646.6666666667, ans=0.0 2023-11-18 21:15:20,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.39 vs. limit=15.0 2023-11-18 21:15:32,225 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1800, loss[loss=0.1416, simple_loss=0.1843, pruned_loss=0.04046, audio_tagging_loss=0.009017, over 16071.00 frames. ], tot_loss[loss=0.09938, simple_loss=0.1157, pruned_loss=0.03036, audio_tagging_loss=0.01118, over 3043090.54 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:15:47,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=412846.6666666667, ans=0.125 2023-11-18 21:16:06,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.676e+01 9.052e+01 1.017e+02 1.096e+02 2.007e+02, threshold=2.033e+02, percent-clipped=0.0 2023-11-18 21:16:10,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=412980.0, ans=0.125 2023-11-18 21:16:14,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=412980.0, ans=0.0 2023-11-18 21:16:27,505 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1850, loss[loss=0.0831, simple_loss=0.1001, pruned_loss=0.02216, audio_tagging_loss=0.01087, over 15597.00 frames. ], tot_loss[loss=0.09965, simple_loss=0.116, pruned_loss=0.03047, audio_tagging_loss=0.01116, over 3043357.32 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:16:29,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=413113.3333333333, ans=0.015 2023-11-18 21:16:34,944 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.57 vs. limit=12.0 2023-11-18 21:16:47,800 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.79 vs. limit=15.0 2023-11-18 21:16:57,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=413246.6666666667, ans=0.0 2023-11-18 21:16:59,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=413246.6666666667, ans=0.2 2023-11-18 21:17:01,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.12 vs. limit=15.0 2023-11-18 21:17:11,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=413380.0, ans=0.0 2023-11-18 21:17:23,593 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1900, loss[loss=0.08251, simple_loss=0.102, pruned_loss=0.02319, audio_tagging_loss=0.008343, over 16661.00 frames. ], tot_loss[loss=0.09848, simple_loss=0.1146, pruned_loss=0.03003, audio_tagging_loss=0.01115, over 3050526.77 frames. 
], batch size: 63, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:17:29,117 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=413446.6666666667, ans=0.125 2023-11-18 21:17:30,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.76 vs. limit=15.0 2023-11-18 21:17:48,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=9.92 vs. limit=15.0 2023-11-18 21:17:55,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=413580.0, ans=0.125 2023-11-18 21:17:57,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.86 vs. limit=15.0 2023-11-18 21:17:58,665 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.936e+01 9.160e+01 9.941e+01 1.091e+02 1.656e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:18:07,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=413713.3333333333, ans=0.125 2023-11-18 21:18:11,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=413713.3333333333, ans=0.125 2023-11-18 21:18:13,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=413713.3333333333, ans=0.125 2023-11-18 21:18:13,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=413713.3333333333, ans=0.2 2023-11-18 21:18:15,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=413713.3333333333, ans=0.1 2023-11-18 21:18:18,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=413780.0, ans=0.125 2023-11-18 21:18:19,658 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 1950, loss[loss=0.1062, simple_loss=0.1313, pruned_loss=0.03022, audio_tagging_loss=0.01036, over 15400.00 frames. ], tot_loss[loss=0.09877, simple_loss=0.1149, pruned_loss=0.03021, audio_tagging_loss=0.01109, over 3055202.63 frames. ], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:18:31,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=22.5 2023-11-18 21:18:49,085 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=413913.3333333333, ans=0.125 2023-11-18 21:18:59,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=413980.0, ans=0.125 2023-11-18 21:19:15,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:15,974 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2000, loss[loss=0.1095, simple_loss=0.1271, pruned_loss=0.03528, audio_tagging_loss=0.01068, over 15920.00 frames. 
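Note on the ScheduledFloat records: they report module hyperparameters (skip rates, balancer probabilities, dropout values) as a function of batch_count, with ans= giving the value currently in effect. A minimal sketch assuming a piecewise-linear schedule between breakpoints (the interpolation scheme and the breakpoints below are assumptions for illustration; only the value-changes-with-batch_count behavior is visible in the log):

    def scheduled_float(batch_count: float, points) -> float:
        # points: ascending (batch_count, value) pairs, e.g.
        # [(0.0, 0.3), (20000.0, 0.1)] -- illustrative numbers only.
        if batch_count <= points[0][0]:
            return points[0][1]
        if batch_count >= points[-1][0]:
            return points[-1][1]
        for (b0, v0), (b1, v1) in zip(points, points[1:]):
            if b0 <= batch_count <= b1:
                w = (batch_count - b0) / (b1 - b0)
                return v0 + w * (v1 - v0)

    # A record showing ans=0.1 would mean the schedule evaluates to 0.1
    # at the reported batch_count.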
], tot_loss[loss=0.09799, simple_loss=0.1138, pruned_loss=0.02984, audio_tagging_loss=0.01125, over 3050947.32 frames. ], batch size: 59, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:19:19,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=414113.3333333333, ans=0.125 2023-11-18 21:19:23,031 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.621e-01 2023-11-18 21:19:50,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.514e+01 8.645e+01 9.576e+01 1.020e+02 1.190e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:19:57,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.13 vs. limit=10.0 2023-11-18 21:20:10,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=414446.6666666667, ans=0.0 2023-11-18 21:20:11,736 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2050, loss[loss=0.0882, simple_loss=0.09202, pruned_loss=0.02737, audio_tagging_loss=0.01482, over 14567.00 frames. ], tot_loss[loss=0.09827, simple_loss=0.1142, pruned_loss=0.02996, audio_tagging_loss=0.0112, over 3049250.48 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:20:27,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.93 vs. limit=22.5 2023-11-18 21:20:51,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=414646.6666666667, ans=0.0 2023-11-18 21:21:07,344 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2100, loss[loss=0.1129, simple_loss=0.1319, pruned_loss=0.03433, audio_tagging_loss=0.01265, over 14900.00 frames. ], tot_loss[loss=0.09766, simple_loss=0.1135, pruned_loss=0.02974, audio_tagging_loss=0.01115, over 3052640.17 frames. ], batch size: 57, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:21:17,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=414780.0, ans=0.125 2023-11-18 21:21:24,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=414846.6666666667, ans=0.125 2023-11-18 21:21:42,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.286e+01 9.113e+01 9.926e+01 1.128e+02 1.703e+02, threshold=1.985e+02, percent-clipped=0.0 2023-11-18 21:22:00,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=415046.6666666667, ans=0.0 2023-11-18 21:22:03,828 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2150, loss[loss=0.07342, simple_loss=0.08968, pruned_loss=0.01817, audio_tagging_loss=0.01041, over 14809.00 frames. ], tot_loss[loss=0.09751, simple_loss=0.1136, pruned_loss=0.02955, audio_tagging_loss=0.01116, over 3052837.95 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 32.0 2023-11-18 21:22:12,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-18 21:22:13,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.80 vs. 
limit=6.0 2023-11-18 21:22:14,675 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff3.min_abs, batch_count=415180.0, ans=0.2 2023-11-18 21:22:25,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=415246.6666666667, ans=0.1 2023-11-18 21:22:35,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=415246.6666666667, ans=0.1 2023-11-18 21:22:36,307 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:22:44,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=415313.3333333333, ans=0.2 2023-11-18 21:22:51,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=415380.0, ans=0.125 2023-11-18 21:22:59,972 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2200, loss[loss=0.1086, simple_loss=0.1317, pruned_loss=0.03557, audio_tagging_loss=0.007141, over 16823.00 frames. ], tot_loss[loss=0.0984, simple_loss=0.1148, pruned_loss=0.02992, audio_tagging_loss=0.01109, over 3052414.24 frames. ], batch size: 63, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:23:26,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=415580.0, ans=0.0 2023-11-18 21:23:35,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.875e+01 9.823e+01 1.124e+02 2.816e+02, threshold=1.965e+02, percent-clipped=1.0 2023-11-18 21:23:38,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=415646.6666666667, ans=0.125 2023-11-18 21:23:41,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=415646.6666666667, ans=0.2 2023-11-18 21:23:52,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=415713.3333333333, ans=0.125 2023-11-18 21:23:55,604 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2250, loss[loss=0.08776, simple_loss=0.09977, pruned_loss=0.02727, audio_tagging_loss=0.0106, over 14694.00 frames. ], tot_loss[loss=0.09845, simple_loss=0.1148, pruned_loss=0.02993, audio_tagging_loss=0.01112, over 3051377.27 frames. ], batch size: 54, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:24:51,862 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2300, loss[loss=0.1034, simple_loss=0.1187, pruned_loss=0.03262, audio_tagging_loss=0.01139, over 14202.00 frames. ], tot_loss[loss=0.09881, simple_loss=0.1153, pruned_loss=0.02994, audio_tagging_loss=0.01123, over 3048069.58 frames. 
], batch size: 53, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:25:00,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=416113.3333333333, ans=0.125 2023-11-18 21:25:08,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=416180.0, ans=0.0 2023-11-18 21:25:19,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=416246.6666666667, ans=0.125 2023-11-18 21:25:23,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=416313.3333333333, ans=0.125 2023-11-18 21:25:26,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=416313.3333333333, ans=0.125 2023-11-18 21:25:27,504 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.781e+01 9.557e+01 1.037e+02 1.979e+02, threshold=1.911e+02, percent-clipped=1.0 2023-11-18 21:25:30,242 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.93 vs. limit=15.0 2023-11-18 21:25:34,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=416313.3333333333, ans=0.125 2023-11-18 21:25:38,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=416380.0, ans=0.1 2023-11-18 21:25:39,385 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:25:47,864 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2350, loss[loss=0.09211, simple_loss=0.1158, pruned_loss=0.02296, audio_tagging_loss=0.01123, over 16133.00 frames. ], tot_loss[loss=0.0993, simple_loss=0.1158, pruned_loss=0.03011, audio_tagging_loss=0.01128, over 3044380.68 frames. ], batch size: 58, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:12,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=416580.0, ans=0.125 2023-11-18 21:26:29,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-11-18 21:26:38,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=416713.3333333333, ans=0.125 2023-11-18 21:26:43,290 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2400, loss[loss=0.07801, simple_loss=0.09339, pruned_loss=0.01899, audio_tagging_loss=0.01232, over 14623.00 frames. ], tot_loss[loss=0.09856, simple_loss=0.1152, pruned_loss=0.02972, audio_tagging_loss=0.01126, over 3046684.28 frames. 
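Note on the grad_scale field: it oscillates between 16.0 and 32.0 through this section, the signature of dynamic loss scaling in fp16 training. A sketch of one training step under torch.cuda.amp (not the actual train_asr.py loop; model(batch) stands in for the real loss computation and the starting scale is a guess):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)  # starting scale is a guess

    def fp16_step(model, batch, optimizer):
        # One AMP step: by default the scaler halves its scale when a step
        # produces inf/nan gradients and doubles it after a long run of clean
        # steps, consistent with the 16.0 <-> 32.0 movement in these records.
        with torch.cuda.amp.autocast():
            loss = model(batch)  # stand-in for the actual loss computation
        optimizer.zero_grad()
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the update if gradients overflowed
        scaler.update()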
], batch size: 55, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:26:44,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=416780.0, ans=0.125 2023-11-18 21:26:47,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=416780.0, ans=0.0 2023-11-18 21:27:09,778 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.69 vs. limit=15.0 2023-11-18 21:27:10,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=416913.3333333333, ans=0.125 2023-11-18 21:27:10,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=416913.3333333333, ans=0.2 2023-11-18 21:27:20,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.067e+01 8.680e+01 9.662e+01 1.129e+02 1.566e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 21:27:22,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=416980.0, ans=0.1 2023-11-18 21:27:39,627 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2450, loss[loss=0.09251, simple_loss=0.09997, pruned_loss=0.02827, audio_tagging_loss=0.01426, over 14631.00 frames. ], tot_loss[loss=0.09727, simple_loss=0.1132, pruned_loss=0.02921, audio_tagging_loss=0.01147, over 3048289.92 frames. ], batch size: 56, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:27:59,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.28 vs. limit=10.0 2023-11-18 21:27:59,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=417180.0, ans=0.125 2023-11-18 21:28:04,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=417246.6666666667, ans=0.125 2023-11-18 21:28:11,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.32 vs. limit=15.0 2023-11-18 21:28:15,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=417313.3333333333, ans=0.125 2023-11-18 21:28:18,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=417313.3333333333, ans=0.125 2023-11-18 21:28:22,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=417313.3333333333, ans=0.125 2023-11-18 21:28:31,495 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.91 vs. limit=12.0 2023-11-18 21:28:34,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.71 vs. limit=6.0 2023-11-18 21:28:35,690 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2500, loss[loss=0.1199, simple_loss=0.1432, pruned_loss=0.0385, audio_tagging_loss=0.009779, over 15485.00 frames. ], tot_loss[loss=0.09714, simple_loss=0.1128, pruned_loss=0.02924, audio_tagging_loss=0.01152, over 3047429.84 frames. 
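Note on the scaling.py Whitening records: each compares a per-module statistic against a scheduled limit (metric=9.69 vs. limit=15.0 above). A metric of this kind can be taken as the mean diagonal of the squared feature covariance over the squared mean diagonal of the covariance itself: 1.0 for perfectly white features, growing as the covariance departs from a multiple of the identity. A single-group sketch of that idea (the records also show num_groups=4 variants, omitted here; this is a paraphrase, not the exact scaling.py code):

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        # x: (..., num_channels) activations for one module.
        x = x.reshape(-1, x.shape[-1])
        x = x - x.mean(dim=0, keepdim=True)
        cov = x.t() @ x / x.shape[0]                    # (C, C) feature covariance
        mean_diag = torch.diagonal(cov).mean()          # trace(cov) / C
        mean_sq_diag = (cov ** 2).sum() / cov.shape[0]  # trace(cov @ cov) / C
        return mean_sq_diag / (mean_diag ** 2 + 1e-20)  # >= 1.0; 1.0 == white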
], batch size: 57, lr: 1.18e-02, grad_scale: 16.0 2023-11-18 21:28:37,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.80 vs. limit=8.0 2023-11-18 21:28:55,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-18 21:29:02,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=417580.0, ans=0.1 2023-11-18 21:29:07,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.73 vs. limit=22.5 2023-11-18 21:29:12,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.822e+01 8.909e+01 1.012e+02 1.108e+02 1.409e+02, threshold=2.024e+02, percent-clipped=0.0 2023-11-18 21:29:17,815 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.25 vs. limit=15.0 2023-11-18 21:29:31,819 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2550, loss[loss=0.1108, simple_loss=0.1341, pruned_loss=0.03285, audio_tagging_loss=0.01088, over 14493.00 frames. ], tot_loss[loss=0.09749, simple_loss=0.1132, pruned_loss=0.02946, audio_tagging_loss=0.01144, over 3045811.28 frames. ], batch size: 53, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:28,031 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2600, loss[loss=0.101, simple_loss=0.115, pruned_loss=0.03559, audio_tagging_loss=0.007955, over 15289.00 frames. ], tot_loss[loss=0.09695, simple_loss=0.1124, pruned_loss=0.02951, audio_tagging_loss=0.01126, over 3048451.23 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:30:41,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=418180.0, ans=0.0 2023-11-18 21:31:04,866 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.504e+01 8.916e+01 1.001e+02 1.139e+02 1.588e+02, threshold=2.001e+02, percent-clipped=0.0 2023-11-18 21:31:07,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=418313.3333333333, ans=0.1 2023-11-18 21:31:24,014 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2650, loss[loss=0.1042, simple_loss=0.1314, pruned_loss=0.02989, audio_tagging_loss=0.008571, over 14509.00 frames. ], tot_loss[loss=0.09606, simple_loss=0.1115, pruned_loss=0.02912, audio_tagging_loss=0.01119, over 3049587.86 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:31:32,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=418446.6666666667, ans=0.1 2023-11-18 21:32:19,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.90 vs. limit=15.0 2023-11-18 21:32:19,687 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2700, loss[loss=0.1156, simple_loss=0.1329, pruned_loss=0.04041, audio_tagging_loss=0.008708, over 15001.00 frames. ], tot_loss[loss=0.09564, simple_loss=0.111, pruned_loss=0.02899, audio_tagging_loss=0.01116, over 3049919.54 frames. 
], batch size: 56, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:32:22,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=418780.0, ans=0.125 2023-11-18 21:32:39,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=418846.6666666667, ans=0.2 2023-11-18 21:32:49,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=418913.3333333333, ans=0.0 2023-11-18 21:32:56,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.943e+01 9.076e+01 9.942e+01 1.068e+02 1.459e+02, threshold=1.988e+02, percent-clipped=0.0 2023-11-18 21:33:01,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=418980.0, ans=0.125 2023-11-18 21:33:16,782 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2750, loss[loss=0.08998, simple_loss=0.1025, pruned_loss=0.02761, audio_tagging_loss=0.01112, over 14345.00 frames. ], tot_loss[loss=0.09547, simple_loss=0.111, pruned_loss=0.02889, audio_tagging_loss=0.0111, over 3045473.45 frames. ], batch size: 55, lr: 1.17e-02, grad_scale: 16.0 2023-11-18 21:33:34,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=419180.0, ans=0.125 2023-11-18 21:33:36,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=419180.0, ans=0.1 2023-11-18 21:34:00,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.65 vs. limit=15.0 2023-11-18 21:34:02,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=419380.0, ans=0.125 2023-11-18 21:34:03,601 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:34:12,572 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2800, loss[loss=0.1154, simple_loss=0.1429, pruned_loss=0.03752, audio_tagging_loss=0.006411, over 16204.00 frames. ], tot_loss[loss=0.0964, simple_loss=0.1121, pruned_loss=0.02931, audio_tagging_loss=0.01104, over 3048386.00 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:34:16,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=419446.6666666667, ans=0.0 2023-11-18 21:34:23,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.31 vs. 
limit=15.0 2023-11-18 21:34:29,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=419513.3333333333, ans=0.2 2023-11-18 21:34:46,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=419646.6666666667, ans=0.125 2023-11-18 21:34:46,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=419646.6666666667, ans=0.0 2023-11-18 21:34:49,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.663e+01 8.954e+01 9.859e+01 1.088e+02 1.629e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 21:35:06,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=419780.0, ans=0.125 2023-11-18 21:35:06,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=419780.0, ans=0.0 2023-11-18 21:35:07,675 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2850, loss[loss=0.07032, simple_loss=0.08549, pruned_loss=0.01757, audio_tagging_loss=0.01001, over 15654.00 frames. ], tot_loss[loss=0.09697, simple_loss=0.1129, pruned_loss=0.02941, audio_tagging_loss=0.0111, over 3042090.20 frames. ], batch size: 60, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:35:20,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=419846.6666666667, ans=0.125 2023-11-18 21:35:20,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=419846.6666666667, ans=0.0 2023-11-18 21:35:33,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=419913.3333333333, ans=0.125 2023-11-18 21:35:38,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=419913.3333333333, ans=0.1 2023-11-18 21:35:52,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=420046.6666666667, ans=0.2 2023-11-18 21:35:54,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=420046.6666666667, ans=0.125 2023-11-18 21:36:04,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.69 vs. limit=6.0 2023-11-18 21:36:05,318 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2900, loss[loss=0.1338, simple_loss=0.1673, pruned_loss=0.0409, audio_tagging_loss=0.009316, over 15994.00 frames. ], tot_loss[loss=0.09709, simple_loss=0.113, pruned_loss=0.02948, audio_tagging_loss=0.0111, over 3048623.58 frames. 
], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:36:06,527 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:36:28,339 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=420246.6666666667, ans=0.2 2023-11-18 21:36:40,903 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.758e+01 9.574e+01 1.055e+02 1.297e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-18 21:37:00,111 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 2950, loss[loss=0.1285, simple_loss=0.1443, pruned_loss=0.04474, audio_tagging_loss=0.01156, over 15320.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1142, pruned_loss=0.02993, audio_tagging_loss=0.01104, over 3048477.04 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:05,353 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.51 vs. limit=15.0 2023-11-18 21:37:08,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=420446.6666666667, ans=0.0 2023-11-18 21:37:15,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=420513.3333333333, ans=0.125 2023-11-18 21:37:24,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420580.0, ans=0.0 2023-11-18 21:37:24,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.29 vs. limit=15.0 2023-11-18 21:37:29,797 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=420580.0, ans=0.0 2023-11-18 21:37:43,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=420713.3333333333, ans=0.125 2023-11-18 21:37:45,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.46 vs. limit=22.5 2023-11-18 21:37:50,265 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:37:55,429 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3000, loss[loss=0.08191, simple_loss=0.09311, pruned_loss=0.01979, audio_tagging_loss=0.01556, over 15245.00 frames. ], tot_loss[loss=0.09831, simple_loss=0.1144, pruned_loss=0.03004, audio_tagging_loss=0.01104, over 3052559.60 frames. ], batch size: 58, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:37:55,432 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 21:38:18,428 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8811, 3.0882, 4.8394, 4.4197], device='cuda:0') 2023-11-18 21:38:28,444 INFO [train_asr.py:1147] (0/4) Epoch 6, validation: loss=0.07003, simple_loss=0.05914, pruned_loss=0.008279, audio_tagging_loss=0.03218, over 4681554.00 frames. 
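Note on the loss fields: they are not independent; across this section the total matches a fixed weighted sum of the parts to display precision. For the validation record just above, 0.5 * 0.05914 + 0.008279 + 1.0 * 0.03218 = 0.07003, and the training records satisfy the same identity. A minimal check, with the 0.5 and 1.0 weights inferred from the logged numbers rather than read from the training code:

    def total_loss(simple_loss, pruned_loss, audio_tagging_loss,
                   simple_scale=0.5, tagging_scale=1.0):
        # Weights are inferred from the logged values in this section.
        return (simple_scale * simple_loss + pruned_loss
                + tagging_scale * audio_tagging_loss)

    # Validation record above: loss=0.07003
    assert abs(total_loss(0.05914, 0.008279, 0.03218) - 0.07003) < 1e-4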
2023-11-18 21:38:28,444 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 21:38:51,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=420913.3333333333, ans=0.0 2023-11-18 21:38:58,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=420913.3333333333, ans=0.0 2023-11-18 21:39:03,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.758e+01 9.190e+01 1.009e+02 1.131e+02 1.432e+02, threshold=2.017e+02, percent-clipped=0.0 2023-11-18 21:39:11,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=421046.6666666667, ans=0.0 2023-11-18 21:39:15,077 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.58 vs. limit=15.0 2023-11-18 21:39:23,535 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3050, loss[loss=0.08071, simple_loss=0.08906, pruned_loss=0.02377, audio_tagging_loss=0.01241, over 15009.00 frames. ], tot_loss[loss=0.09866, simple_loss=0.1149, pruned_loss=0.0301, audio_tagging_loss=0.01113, over 3046352.48 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:39:34,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421180.0, ans=0.125 2023-11-18 21:39:35,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.17 vs. limit=22.5 2023-11-18 21:39:53,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=421246.6666666667, ans=0.125 2023-11-18 21:39:54,509 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.59 vs. limit=15.0 2023-11-18 21:39:55,065 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:40:04,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=421313.3333333333, ans=0.0 2023-11-18 21:40:08,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=421380.0, ans=0.125 2023-11-18 21:40:19,081 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3100, loss[loss=0.09494, simple_loss=0.1005, pruned_loss=0.02969, audio_tagging_loss=0.01498, over 16597.00 frames. ], tot_loss[loss=0.09945, simple_loss=0.1153, pruned_loss=0.0306, audio_tagging_loss=0.01121, over 3046330.04 frames. 
], batch size: 64, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:40:35,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=421513.3333333333, ans=0.1 2023-11-18 21:40:40,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=421580.0, ans=0.125 2023-11-18 21:40:50,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=421580.0, ans=0.0 2023-11-18 21:40:55,634 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.206e+01 8.959e+01 9.848e+01 1.091e+02 1.372e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 21:40:58,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=421646.6666666667, ans=0.125 2023-11-18 21:41:03,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=421713.3333333333, ans=0.125 2023-11-18 21:41:06,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=421713.3333333333, ans=0.0 2023-11-18 21:41:06,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=421713.3333333333, ans=0.125 2023-11-18 21:41:14,378 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3150, loss[loss=0.08174, simple_loss=0.08912, pruned_loss=0.02301, audio_tagging_loss=0.01417, over 14733.00 frames. ], tot_loss[loss=0.09961, simple_loss=0.1156, pruned_loss=0.03051, audio_tagging_loss=0.0113, over 3051339.20 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:41:22,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=421780.0, ans=0.2 2023-11-18 21:41:22,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=421780.0, ans=0.0 2023-11-18 21:41:42,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=421913.3333333333, ans=0.1 2023-11-18 21:41:50,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=421980.0, ans=0.0 2023-11-18 21:41:56,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=421980.0, ans=0.125 2023-11-18 21:42:10,394 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3200, loss[loss=0.09502, simple_loss=0.1155, pruned_loss=0.02394, audio_tagging_loss=0.01334, over 14716.00 frames. ], tot_loss[loss=0.09907, simple_loss=0.115, pruned_loss=0.0301, audio_tagging_loss=0.01148, over 3054811.42 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:42:46,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.629e+01 9.130e+01 9.753e+01 1.111e+02 1.645e+02, threshold=1.951e+02, percent-clipped=0.0 2023-11-18 21:43:03,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.26 vs. limit=15.0 2023-11-18 21:43:05,606 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3250, loss[loss=0.1006, simple_loss=0.1189, pruned_loss=0.03058, audio_tagging_loss=0.01056, over 16313.00 frames. 
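Note for post-hoc analysis: the per-batch records have a stable enough format to chart training progress directly from this log. A small extractor, assuming one record per input line (the regex is written against the exact lines in this section, not any icefall utility):

    import re

    RECORD = re.compile(
        r"Epoch (?P<epoch>\d+), batch (?P<batch>\d+), .*?"
        r"tot_loss\[loss=(?P<loss>[0-9.]+)"
    )

    def parse_tot_loss(line: str):
        # Returns (epoch, batch, tot_loss) for a train_asr.py batch record,
        # or None for any other kind of line (validation, scaling, optim...).
        m = RECORD.search(line)
        if m is None:
            return None
        return int(m.group("epoch")), int(m.group("batch")), float(m.group("loss"))

    # e.g. the batch 3200 record above parses to (6, 3200, 0.09907).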
], tot_loss[loss=0.09879, simple_loss=0.1145, pruned_loss=0.02997, audio_tagging_loss=0.01157, over 3051034.82 frames. ], batch size: 62, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:43:06,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=422446.6666666667, ans=0.125 2023-11-18 21:43:35,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=422580.0, ans=0.2 2023-11-18 21:43:43,274 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=422646.6666666667, ans=0.0 2023-11-18 21:43:49,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=422713.3333333333, ans=0.07 2023-11-18 21:44:01,576 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3300, loss[loss=0.09754, simple_loss=0.1192, pruned_loss=0.02961, audio_tagging_loss=0.00832, over 15687.00 frames. ], tot_loss[loss=0.09797, simple_loss=0.1137, pruned_loss=0.02952, audio_tagging_loss=0.01161, over 3050868.82 frames. ], batch size: 59, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:44:11,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=422846.6666666667, ans=0.125 2023-11-18 21:44:21,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=422846.6666666667, ans=0.125 2023-11-18 21:44:38,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 9.126e+01 1.034e+02 1.155e+02 1.977e+02, threshold=2.069e+02, percent-clipped=1.0 2023-11-18 21:44:54,252 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:44:57,142 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3350, loss[loss=0.1051, simple_loss=0.1235, pruned_loss=0.03452, audio_tagging_loss=0.008815, over 14506.00 frames. ], tot_loss[loss=0.09832, simple_loss=0.1144, pruned_loss=0.0297, audio_tagging_loss=0.01144, over 3050757.91 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:45:10,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=423180.0, ans=0.2 2023-11-18 21:45:10,689 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=423180.0, ans=0.125 2023-11-18 21:45:27,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=423246.6666666667, ans=0.2 2023-11-18 21:45:30,157 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.08 vs. limit=22.5 2023-11-18 21:45:52,878 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3400, loss[loss=0.1207, simple_loss=0.1451, pruned_loss=0.04035, audio_tagging_loss=0.007854, over 15317.00 frames. ], tot_loss[loss=0.09873, simple_loss=0.1153, pruned_loss=0.02991, audio_tagging_loss=0.01115, over 3046725.36 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:45:53,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.59 vs. 
limit=22.5 2023-11-18 21:45:57,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=423446.6666666667, ans=0.125 2023-11-18 21:46:12,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=423513.3333333333, ans=0.0 2023-11-18 21:46:27,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=423646.6666666667, ans=0.0 2023-11-18 21:46:28,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=423646.6666666667, ans=0.125 2023-11-18 21:46:29,908 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.844e+01 9.699e+01 1.055e+02 1.387e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-18 21:46:47,942 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3450, loss[loss=0.1014, simple_loss=0.1347, pruned_loss=0.02536, audio_tagging_loss=0.00869, over 14497.00 frames. ], tot_loss[loss=0.09806, simple_loss=0.1145, pruned_loss=0.02976, audio_tagging_loss=0.01103, over 3041276.98 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:46:48,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=423780.0, ans=0.125 2023-11-18 21:47:39,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424046.6666666667, ans=0.1 2023-11-18 21:47:44,728 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3500, loss[loss=0.07185, simple_loss=0.08017, pruned_loss=0.01587, audio_tagging_loss=0.01589, over 14995.00 frames. ], tot_loss[loss=0.09761, simple_loss=0.1138, pruned_loss=0.02968, audio_tagging_loss=0.01101, over 3047697.65 frames. ], batch size: 56, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:48:03,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=424180.0, ans=0.0 2023-11-18 21:48:11,278 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:48:21,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.688e+01 9.190e+01 1.062e+02 1.231e+02 1.599e+02, threshold=2.123e+02, percent-clipped=0.0 2023-11-18 21:48:22,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=424313.3333333333, ans=0.1 2023-11-18 21:48:41,130 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3550, loss[loss=0.08384, simple_loss=0.09616, pruned_loss=0.02454, audio_tagging_loss=0.01122, over 15807.00 frames. ], tot_loss[loss=0.09732, simple_loss=0.1136, pruned_loss=0.02956, audio_tagging_loss=0.01097, over 3046045.09 frames. 
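The WARNING records ("Exclude cut with ID unbalanced/...") all describe 1-second AudioSet clips whose dummy transcript tokenizes to 24 tokens while only 23 frames survive subsampling; a transducer cannot emit more tokens than it has encoder frames, so the cut is dropped. A sketch of such a filter, where the subsampling arithmetic is an assumption chosen to match the logged 100 -> 23:

```python
# Sketch of the filter behind the "Exclude cut" warnings: a transducer
# needs at least as many (subsampled) encoder frames as target tokens.
# The front-end formula is an assumption; (T - 7) // 4 matches 100 -> 23.

def keep_cut(num_frames: int, num_tokens: int, subsampling_factor: int = 4) -> bool:
    frames_after = (num_frames - 7) // subsampling_factor
    return frames_after >= num_tokens

# The excluded cut above: 100 input frames -> 23 frames vs. 24 tokens.
assert keep_cut(100, 24) is False
```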
], batch size: 61, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:48:42,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=424446.6666666667, ans=0.125 2023-11-18 21:48:45,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=424446.6666666667, ans=0.0 2023-11-18 21:48:56,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=424513.3333333333, ans=0.5 2023-11-18 21:49:02,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=424580.0, ans=0.05 2023-11-18 21:49:06,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=424580.0, ans=0.125 2023-11-18 21:49:07,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=424580.0, ans=0.1 2023-11-18 21:49:12,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=424580.0, ans=0.125 2023-11-18 21:49:14,054 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:49:17,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=424646.6666666667, ans=0.2 2023-11-18 21:49:24,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=424646.6666666667, ans=0.0 2023-11-18 21:49:35,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=424780.0, ans=0.125 2023-11-18 21:49:36,596 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3600, loss[loss=0.1035, simple_loss=0.1319, pruned_loss=0.02896, audio_tagging_loss=0.008566, over 15330.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1126, pruned_loss=0.0293, audio_tagging_loss=0.01104, over 3045928.57 frames. ], batch size: 57, lr: 1.17e-02, grad_scale: 32.0 2023-11-18 21:49:38,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=424780.0, ans=0.125 2023-11-18 21:49:51,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=424846.6666666667, ans=0.125 2023-11-18 21:49:52,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=14.58 vs. limit=22.5 2023-11-18 21:50:13,743 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.290e+01 1.027e+02 1.176e+02 1.572e+02, threshold=2.055e+02, percent-clipped=0.0 2023-11-18 21:50:24,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=425046.6666666667, ans=0.125 2023-11-18 21:50:33,112 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3650, loss[loss=0.0936, simple_loss=0.1069, pruned_loss=0.03098, audio_tagging_loss=0.009149, over 15347.00 frames. ], tot_loss[loss=0.09732, simple_loss=0.1135, pruned_loss=0.02955, audio_tagging_loss=0.01101, over 3046601.99 frames. 
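The scaling.py:213 records (ScheduledFloat: name=..., batch_count=..., ans=...) report scalar hyper-parameters, dropout_p, skip rates, balancer probabilities, evaluated at the current global batch count. A plausible reading is a piecewise-linear schedule over (batch_count, value) breakpoints; the breakpoints in the sketch below are invented for illustration.

```python
# Sketch of a ScheduledFloat-style value: piecewise-linear interpolation
# over (batch_count, value) breakpoints, clamped at both ends. The
# breakpoints are illustrative, not taken from this run.

def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    pts = sorted(schedule)
    if batch_count <= pts[0][0]:
        return pts[0][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return pts[-1][1]

# A skip rate decaying 0.5 -> 0.0 over the first 20k batches would read
# ans=0.0 at these batch counts, as the attention_skip_rate records do.
print(scheduled_float(424646.67, [(0.0, 0.5), (20000.0, 0.0)]))  # 0.0
```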
], batch size: 61, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:50:50,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-18 21:50:55,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=425246.6666666667, ans=0.125 2023-11-18 21:51:15,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=425313.3333333333, ans=0.125 2023-11-18 21:51:17,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=425380.0, ans=0.125 2023-11-18 21:51:20,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.82 vs. limit=15.0 2023-11-18 21:51:27,843 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.32 vs. limit=15.0 2023-11-18 21:51:29,211 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3700, loss[loss=0.1015, simple_loss=0.1266, pruned_loss=0.02774, audio_tagging_loss=0.01042, over 15050.00 frames. ], tot_loss[loss=0.09699, simple_loss=0.1133, pruned_loss=0.0294, audio_tagging_loss=0.01095, over 3050451.21 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:51:39,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=425513.3333333333, ans=0.125 2023-11-18 21:51:40,618 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 21:51:43,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=425513.3333333333, ans=0.125 2023-11-18 21:51:54,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=425580.0, ans=0.1 2023-11-18 21:52:06,435 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.585e+01 8.991e+01 9.791e+01 1.095e+02 1.443e+02, threshold=1.958e+02, percent-clipped=0.0 2023-11-18 21:52:07,053 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.23 vs. limit=15.0 2023-11-18 21:52:25,160 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3750, loss[loss=0.08374, simple_loss=0.09303, pruned_loss=0.02293, audio_tagging_loss=0.01429, over 15143.00 frames. ], tot_loss[loss=0.09719, simple_loss=0.1131, pruned_loss=0.02955, audio_tagging_loss=0.01108, over 3048539.77 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:52:25,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=425780.0, ans=0.125 2023-11-18 21:52:37,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=425846.6666666667, ans=0.0 2023-11-18 21:52:38,490 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.20 vs. 
limit=15.0 2023-11-18 21:52:48,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=425913.3333333333, ans=0.0 2023-11-18 21:52:52,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.81 vs. limit=6.0 2023-11-18 21:53:02,241 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:53:05,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=425980.0, ans=0.1 2023-11-18 21:53:14,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426046.6666666667, ans=0.125 2023-11-18 21:53:18,396 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=426046.6666666667, ans=0.0 2023-11-18 21:53:21,311 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3800, loss[loss=0.09645, simple_loss=0.1102, pruned_loss=0.02887, audio_tagging_loss=0.01246, over 14788.00 frames. ], tot_loss[loss=0.09715, simple_loss=0.1127, pruned_loss=0.02953, audio_tagging_loss=0.01126, over 3046007.12 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:53:23,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=426113.3333333333, ans=0.125 2023-11-18 21:53:29,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=426113.3333333333, ans=0.0 2023-11-18 21:53:49,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=426246.6666666667, ans=0.95 2023-11-18 21:53:50,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=426246.6666666667, ans=0.125 2023-11-18 21:53:57,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.04 vs. limit=15.0 2023-11-18 21:53:57,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.638e+01 8.759e+01 9.502e+01 1.058e+02 1.503e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 21:54:10,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=426380.0, ans=0.07 2023-11-18 21:54:16,856 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3850, loss[loss=0.1224, simple_loss=0.1635, pruned_loss=0.03385, audio_tagging_loss=0.006797, over 16060.00 frames. ], tot_loss[loss=0.09622, simple_loss=0.1117, pruned_loss=0.02907, audio_tagging_loss=0.01132, over 3049957.24 frames. 
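The scaling.py:1022 records compare a whitening metric against a limit (e.g. metric=4.81 vs. limit=6.0 above). A natural reading is a statistic that equals 1.0 when the feature covariance within a group is a multiple of the identity and grows as channels become correlated or unevenly scaled; when it exceeds the limit, the module reportedly applies a gradient penalty nudging activations back toward whiteness. The sketch below is one plausible form of such a metric, not necessarily the exact statistic behind the records.

```python
# Sketch of a whitening metric that is ~1.0 for decorrelated,
# equal-variance channels and larger otherwise. One plausible form only.
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels) activations for one group."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]             # (C, C) covariance
    mean_sq = cov.pow(2).sum() / cov.shape[0]  # ~ ||cov||_F^2 per channel
    mean_diag = cov.diagonal().mean()          # average channel variance
    return (mean_sq / (mean_diag ** 2 + 1e-20)).item()

white = torch.randn(10000, 256)
print(whitening_metric(white))  # ~1.0, comfortably under a limit of 15.0
```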
], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:54:19,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=426446.6666666667, ans=0.2 2023-11-18 21:54:20,306 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=426446.6666666667, ans=0.0 2023-11-18 21:54:23,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=426446.6666666667, ans=0.1 2023-11-18 21:54:25,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=426446.6666666667, ans=0.035 2023-11-18 21:54:29,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=426513.3333333333, ans=0.125 2023-11-18 21:54:36,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.14 vs. limit=15.0 2023-11-18 21:54:39,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.10 vs. limit=15.0 2023-11-18 21:54:51,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.93 vs. limit=12.0 2023-11-18 21:54:51,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.49 vs. limit=12.0 2023-11-18 21:54:52,946 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-64000.pt 2023-11-18 21:55:14,863 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3900, loss[loss=0.1016, simple_loss=0.1126, pruned_loss=0.0337, audio_tagging_loss=0.01159, over 14662.00 frames. ], tot_loss[loss=0.09548, simple_loss=0.1106, pruned_loss=0.02879, audio_tagging_loss=0.0114, over 3042347.52 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:55:31,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.34 vs. 
limit=15.0 2023-11-18 21:55:38,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=426913.3333333333, ans=0.0 2023-11-18 21:55:47,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=426980.0, ans=10.0 2023-11-18 21:55:48,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=426980.0, ans=0.125 2023-11-18 21:55:51,456 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.999e+01 9.434e+01 1.040e+02 1.132e+02 1.500e+02, threshold=2.079e+02, percent-clipped=0.0 2023-11-18 21:56:01,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=427046.6666666667, ans=0.125 2023-11-18 21:56:08,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=427046.6666666667, ans=0.1 2023-11-18 21:56:10,954 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 3950, loss[loss=0.08525, simple_loss=0.09375, pruned_loss=0.02416, audio_tagging_loss=0.01422, over 15095.00 frames. ], tot_loss[loss=0.09655, simple_loss=0.1118, pruned_loss=0.0292, audio_tagging_loss=0.01145, over 3045691.97 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:56:22,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.14 vs. limit=15.0 2023-11-18 21:56:45,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=427313.3333333333, ans=0.04949747468305833 2023-11-18 21:56:55,513 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.44 vs. limit=6.0 2023-11-18 21:56:58,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=427380.0, ans=0.125 2023-11-18 21:57:07,272 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4000, loss[loss=0.07206, simple_loss=0.07156, pruned_loss=0.02112, audio_tagging_loss=0.01516, over 14766.00 frames. ], tot_loss[loss=0.09648, simple_loss=0.1116, pruned_loss=0.02919, audio_tagging_loss=0.01147, over 3048850.37 frames. 
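A few records back the trainer wrote checkpoint-64000.pt into the experiment directory; the round batch number suggests checkpoints keyed to the global batch index at a fixed interval. A minimal sketch, with the interval and the saved fields as assumptions:

```python
# Sketch of interval-based checkpointing named by global batch index
# (cf. checkpoint-64000.pt above). Interval and fields are assumptions.
from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int = 4000) -> None:
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        exp_dir / f"checkpoint-{batch_idx_train}.pt",
    )
```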
], batch size: 56, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:57:09,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=427446.6666666667, ans=0.2 2023-11-18 21:57:16,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=427446.6666666667, ans=0.0 2023-11-18 21:57:19,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=427513.3333333333, ans=0.0 2023-11-18 21:57:19,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=427513.3333333333, ans=0.2 2023-11-18 21:57:39,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=427646.6666666667, ans=0.125 2023-11-18 21:57:43,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.662e+01 9.312e+01 1.008e+02 1.147e+02 1.511e+02, threshold=2.016e+02, percent-clipped=0.0 2023-11-18 21:58:02,416 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4050, loss[loss=0.09225, simple_loss=0.1087, pruned_loss=0.0271, audio_tagging_loss=0.01079, over 15538.00 frames. ], tot_loss[loss=0.09792, simple_loss=0.1135, pruned_loss=0.02974, audio_tagging_loss=0.01146, over 3051447.96 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:58:03,497 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 21:58:14,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. limit=10.0 2023-11-18 21:58:47,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=428046.6666666667, ans=0.0 2023-11-18 21:58:59,667 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4100, loss[loss=0.09538, simple_loss=0.1138, pruned_loss=0.02734, audio_tagging_loss=0.01113, over 16799.00 frames. ], tot_loss[loss=0.09783, simple_loss=0.1137, pruned_loss=0.0296, audio_tagging_loss=0.0114, over 3061056.85 frames. ], batch size: 62, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:59:35,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.382e+01 8.951e+01 9.681e+01 1.090e+02 3.452e+02, threshold=1.936e+02, percent-clipped=1.0 2023-11-18 21:59:38,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=428313.3333333333, ans=0.0 2023-11-18 21:59:38,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=428313.3333333333, ans=0.125 2023-11-18 21:59:49,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428380.0, ans=0.1 2023-11-18 21:59:55,350 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4150, loss[loss=0.1109, simple_loss=0.1305, pruned_loss=0.03623, audio_tagging_loss=0.009462, over 15506.00 frames. 
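The learning rate drifts down very slowly (1.17e-02 earlier in the epoch, 1.16e-02 here, 1.15e-02 a few hundred batches later), which fits a schedule that decays smoothly in both the global batch index and the fractional epoch, in the style of icefall's Eden scheduler. The sketch below uses illustrative constants; they happen to reproduce ~1.16e-02 near batch 64k, but they are assumptions, not values read from this run.

```python
# Sketch of an Eden-style schedule consistent with the slow lr drift in
# the records. All constants and the epoch convention are assumptions.

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~1.16e-02 near global batch 64000 with ~5 epochs completed, matching
# the lr column around the checkpoint-64000.pt save above.
print(eden_lr(0.045, 64000, 5.0))  # ~0.0116
```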
], tot_loss[loss=0.09815, simple_loss=0.1143, pruned_loss=0.02987, audio_tagging_loss=0.01114, over 3062375.79 frames. ], batch size: 58, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 21:59:55,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=428446.6666666667, ans=0.0 2023-11-18 21:59:58,968 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0 2023-11-18 22:00:07,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=428513.3333333333, ans=0.2 2023-11-18 22:00:28,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.84 vs. limit=12.0 2023-11-18 22:00:29,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.28 vs. limit=6.0 2023-11-18 22:00:33,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=428646.6666666667, ans=0.0 2023-11-18 22:00:34,714 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:00:50,541 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4200, loss[loss=0.08505, simple_loss=0.1044, pruned_loss=0.02309, audio_tagging_loss=0.009771, over 15780.00 frames. ], tot_loss[loss=0.09835, simple_loss=0.1148, pruned_loss=0.03, audio_tagging_loss=0.01097, over 3065360.66 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:00:51,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=428780.0, ans=0.1 2023-11-18 22:01:00,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=428846.6666666667, ans=0.125 2023-11-18 22:01:21,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=428913.3333333333, ans=15.0 2023-11-18 22:01:27,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.609e+01 9.394e+01 1.081e+02 1.374e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-18 22:01:28,325 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.93 vs. limit=15.0 2023-11-18 22:01:33,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=428980.0, ans=0.05 2023-11-18 22:01:35,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=429046.6666666667, ans=0.125 2023-11-18 22:01:46,216 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4250, loss[loss=0.1182, simple_loss=0.1518, pruned_loss=0.03324, audio_tagging_loss=0.009081, over 15034.00 frames. 
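The balancer records describe per-channel box constraints: min_positive=0.05 and max_positive=0.95 bound the fraction of positive activations in a channel, min_abs/max_abs bound the mean absolute value, and prob is the probability the constraint is enforced on a given batch. The sketch below is a conceptual reading only; the real module reportedly corrects gradients directly rather than adding a differentiable penalty.

```python
# Conceptual sketch of the balancer box constraints in the records
# (min_positive=0.05, max_positive=0.95, min_abs, max_abs, applied with
# probability `prob`). Penalty form is only to illustrate the constraints;
# it is not how the actual module enforces them.
import torch

def balancer_violation(x: torch.Tensor,
                       min_positive: float = 0.05, max_positive: float = 0.95,
                       min_abs: float = 0.5, max_abs: float = 10.0) -> torch.Tensor:
    """x: (num_frames, num_channels). Zero when every channel's statistics
    sit inside the configured box."""
    frac_pos = (x > 0).float().mean(dim=0)  # fraction of positive values
    mean_abs = x.abs().mean(dim=0)          # mean |activation| per channel
    violation = (
        (min_positive - frac_pos).clamp(min=0.0)
        + (frac_pos - max_positive).clamp(min=0.0)
        + (min_abs - mean_abs).clamp(min=0.0)
        + (mean_abs - max_abs).clamp(min=0.0)
    )
    return violation.sum()
```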
], tot_loss[loss=0.0985, simple_loss=0.1149, pruned_loss=0.03008, audio_tagging_loss=0.01096, over 3055992.93 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:01:55,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.77 vs. limit=15.0 2023-11-18 22:02:13,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=429246.6666666667, ans=0.125 2023-11-18 22:02:37,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=429380.0, ans=0.125 2023-11-18 22:02:43,271 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4300, loss[loss=0.135, simple_loss=0.1655, pruned_loss=0.04423, audio_tagging_loss=0.008008, over 14735.00 frames. ], tot_loss[loss=0.09847, simple_loss=0.1147, pruned_loss=0.03025, audio_tagging_loss=0.01086, over 3053877.32 frames. ], batch size: 54, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:02:46,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=429446.6666666667, ans=0.125 2023-11-18 22:02:58,671 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.47 vs. limit=15.0 2023-11-18 22:03:20,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 9.239e+01 1.003e+02 1.122e+02 1.597e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-18 22:03:38,794 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4350, loss[loss=0.1074, simple_loss=0.1385, pruned_loss=0.03082, audio_tagging_loss=0.007318, over 16434.00 frames. ], tot_loss[loss=0.09865, simple_loss=0.1152, pruned_loss=0.03016, audio_tagging_loss=0.01089, over 3047886.02 frames. ], batch size: 59, lr: 1.16e-02, grad_scale: 32.0 2023-11-18 22:03:44,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.97 vs. limit=15.0 2023-11-18 22:03:45,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=429780.0, ans=0.125 2023-11-18 22:03:52,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.min_positive, batch_count=429846.6666666667, ans=0.05 2023-11-18 22:04:01,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=429913.3333333333, ans=0.0 2023-11-18 22:04:33,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=430113.3333333333, ans=0.0 2023-11-18 22:04:34,585 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4400, loss[loss=0.07841, simple_loss=0.08743, pruned_loss=0.02027, audio_tagging_loss=0.01443, over 14597.00 frames. ], tot_loss[loss=0.09859, simple_loss=0.1151, pruned_loss=0.03013, audio_tagging_loss=0.01091, over 3056049.75 frames. ], batch size: 57, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:04:42,118 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.70 vs. 
limit=15.0 2023-11-18 22:05:01,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=430246.6666666667, ans=0.125 2023-11-18 22:05:08,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=430313.3333333333, ans=0.0 2023-11-18 22:05:11,341 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.875e+01 9.886e+01 1.073e+02 1.418e+02, threshold=1.977e+02, percent-clipped=0.0 2023-11-18 22:05:31,694 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4450, loss[loss=0.06156, simple_loss=0.06855, pruned_loss=0.01392, audio_tagging_loss=0.01337, over 15826.00 frames. ], tot_loss[loss=0.09868, simple_loss=0.115, pruned_loss=0.0303, audio_tagging_loss=0.01086, over 3060495.99 frames. ], batch size: 63, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:05:34,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.99 vs. limit=15.0 2023-11-18 22:05:37,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=430446.6666666667, ans=0.0 2023-11-18 22:05:45,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.52 vs. limit=15.0 2023-11-18 22:05:50,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=430513.3333333333, ans=0.125 2023-11-18 22:06:02,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=430580.0, ans=0.035 2023-11-18 22:06:08,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=430646.6666666667, ans=0.2 2023-11-18 22:06:14,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0 2023-11-18 22:06:15,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=430713.3333333333, ans=0.035 2023-11-18 22:06:26,829 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4500, loss[loss=0.06311, simple_loss=0.07529, pruned_loss=0.01566, audio_tagging_loss=0.009805, over 14119.00 frames. ], tot_loss[loss=0.09821, simple_loss=0.1147, pruned_loss=0.03004, audio_tagging_loss=0.0108, over 3058607.57 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:06:27,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=430780.0, ans=0.2 2023-11-18 22:06:39,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.08 vs. 
limit=22.5 2023-11-18 22:06:56,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=430913.3333333333, ans=0.0 2023-11-18 22:07:03,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.610e+01 9.076e+01 9.936e+01 1.124e+02 1.630e+02, threshold=1.987e+02, percent-clipped=0.0 2023-11-18 22:07:10,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=431046.6666666667, ans=0.2 2023-11-18 22:07:21,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=431113.3333333333, ans=0.0 2023-11-18 22:07:22,445 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4550, loss[loss=0.0814, simple_loss=0.1003, pruned_loss=0.02067, audio_tagging_loss=0.01057, over 14668.00 frames. ], tot_loss[loss=0.09767, simple_loss=0.1141, pruned_loss=0.02976, audio_tagging_loss=0.01086, over 3056465.06 frames. ], batch size: 55, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:07:27,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.78 vs. limit=15.0 2023-11-18 22:07:30,882 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.25 vs. limit=22.5 2023-11-18 22:07:40,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.11 vs. limit=15.0 2023-11-18 22:08:02,173 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:08:18,467 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4600, loss[loss=0.1019, simple_loss=0.12, pruned_loss=0.03251, audio_tagging_loss=0.009395, over 14854.00 frames. ], tot_loss[loss=0.09817, simple_loss=0.1148, pruned_loss=0.02989, audio_tagging_loss=0.0109, over 3049681.09 frames. 
], batch size: 55, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:08:21,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=431446.6666666667, ans=0.0 2023-11-18 22:08:24,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=431446.6666666667, ans=0.125 2023-11-18 22:08:44,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431580.0, ans=0.1 2023-11-18 22:08:50,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=431646.6666666667, ans=0.1 2023-11-18 22:08:55,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.959e+01 9.865e+01 1.112e+02 1.512e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:08:57,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=431646.6666666667, ans=0.125 2023-11-18 22:09:11,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=431713.3333333333, ans=0.125 2023-11-18 22:09:11,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=431713.3333333333, ans=0.125 2023-11-18 22:09:14,277 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4650, loss[loss=0.1135, simple_loss=0.1437, pruned_loss=0.03418, audio_tagging_loss=0.007459, over 14746.00 frames. ], tot_loss[loss=0.09807, simple_loss=0.1143, pruned_loss=0.02988, audio_tagging_loss=0.01104, over 3046918.06 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:09:28,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=431846.6666666667, ans=0.0 2023-11-18 22:09:32,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=431846.6666666667, ans=0.125 2023-11-18 22:09:36,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=431913.3333333333, ans=0.0 2023-11-18 22:09:36,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. 
limit=15.0 2023-11-18 22:09:40,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=431913.3333333333, ans=0.1 2023-11-18 22:09:52,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=431980.0, ans=0.0 2023-11-18 22:09:56,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=431980.0, ans=0.04949747468305833 2023-11-18 22:10:02,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=432046.6666666667, ans=0.125 2023-11-18 22:10:06,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=432046.6666666667, ans=0.125 2023-11-18 22:10:09,903 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4700, loss[loss=0.1041, simple_loss=0.1221, pruned_loss=0.03124, audio_tagging_loss=0.01185, over 14794.00 frames. ], tot_loss[loss=0.09768, simple_loss=0.1137, pruned_loss=0.02973, audio_tagging_loss=0.01112, over 3049441.41 frames. ], batch size: 56, lr: 1.16e-02, grad_scale: 64.0 2023-11-18 22:10:10,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=432113.3333333333, ans=0.0 2023-11-18 22:10:13,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=432113.3333333333, ans=0.2 2023-11-18 22:10:18,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=432113.3333333333, ans=0.0 2023-11-18 22:10:25,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=432180.0, ans=0.125 2023-11-18 22:10:33,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.53 vs. limit=15.0 2023-11-18 22:10:35,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=432246.6666666667, ans=0.125 2023-11-18 22:10:47,009 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.309e+01 8.805e+01 9.825e+01 1.121e+02 1.529e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-18 22:11:06,127 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4750, loss[loss=0.07659, simple_loss=0.07549, pruned_loss=0.02051, audio_tagging_loss=0.01833, over 14482.00 frames. ], tot_loss[loss=0.09797, simple_loss=0.1135, pruned_loss=0.02984, audio_tagging_loss=0.01135, over 3039569.15 frames. 
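grad_scale doubles from 32.0 to 64.0 around batch 4400, falls back to 32.0 around batch 4750 just below, and later drops to 16.0: the signature of dynamic fp16 loss scaling, which grows the scale after a run of overflow-free steps and backs off when gradients overflow. A sketch using PyTorch's stock GradScaler; the init/growth settings are illustrative.

```python
# Sketch of dynamic loss scaling consistent with the grad_scale column
# (32 -> 64 -> 32 -> 16). The GradScaler settings are illustrative.
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0, growth_factor=2.0, backoff_factor=0.5,
    growth_interval=2000,
)

def training_step(model, optimizer, compute_loss) -> float:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model)
    scaler.scale(loss).backward()
    scaler.step(optimizer)     # skipped internally if grads hit inf/nan
    scaler.update()            # doubles or halves the scale as needed
    return scaler.get_scale()  # the value logged as grad_scale
```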
], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:11:06,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=432446.6666666667, ans=0.0 2023-11-18 22:11:11,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=432446.6666666667, ans=0.125 2023-11-18 22:11:16,431 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:11:23,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=432513.3333333333, ans=0.2 2023-11-18 22:11:53,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=432713.3333333333, ans=0.1 2023-11-18 22:12:02,256 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4800, loss[loss=0.09036, simple_loss=0.09266, pruned_loss=0.0316, audio_tagging_loss=0.01242, over 14491.00 frames. ], tot_loss[loss=0.09796, simple_loss=0.1133, pruned_loss=0.02984, audio_tagging_loss=0.01145, over 3043394.96 frames. ], batch size: 58, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:12:09,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=432780.0, ans=0.0 2023-11-18 22:12:13,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=432846.6666666667, ans=0.5 2023-11-18 22:12:22,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=14.67 vs. limit=15.0 2023-11-18 22:12:26,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=432913.3333333333, ans=0.2 2023-11-18 22:12:28,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=10.61 vs. limit=15.0 2023-11-18 22:12:28,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=432913.3333333333, ans=0.125 2023-11-18 22:12:39,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.105e+01 8.787e+01 9.594e+01 1.036e+02 1.388e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-18 22:12:57,422 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4850, loss[loss=0.1159, simple_loss=0.1375, pruned_loss=0.03588, audio_tagging_loss=0.01126, over 14960.00 frames. ], tot_loss[loss=0.0991, simple_loss=0.1149, pruned_loss=0.03025, audio_tagging_loss=0.01138, over 3044448.56 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:12:59,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.94 vs. limit=15.0 2023-11-18 22:13:09,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.23 vs. limit=15.0 2023-11-18 22:13:17,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=433180.0, ans=0.1 2023-11-18 22:13:51,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.15 vs. 
limit=15.0 2023-11-18 22:13:53,999 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4900, loss[loss=0.08301, simple_loss=0.09111, pruned_loss=0.02246, audio_tagging_loss=0.01499, over 14533.00 frames. ], tot_loss[loss=0.09908, simple_loss=0.1153, pruned_loss=0.03014, audio_tagging_loss=0.01129, over 3043858.97 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:14:01,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=433446.6666666667, ans=0.125 2023-11-18 22:14:15,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=433580.0, ans=0.125 2023-11-18 22:14:32,279 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.047e+01 8.759e+01 9.245e+01 1.025e+02 1.316e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-18 22:14:32,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=433646.6666666667, ans=0.04949747468305833 2023-11-18 22:14:34,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=433646.6666666667, ans=0.125 2023-11-18 22:14:49,985 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 4950, loss[loss=0.09451, simple_loss=0.1152, pruned_loss=0.02529, audio_tagging_loss=0.01161, over 14537.00 frames. ], tot_loss[loss=0.09835, simple_loss=0.1147, pruned_loss=0.0298, audio_tagging_loss=0.01122, over 3044591.50 frames. ], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:14:53,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=433780.0, ans=0.0 2023-11-18 22:14:58,604 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.28 vs. limit=15.0 2023-11-18 22:15:18,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=433913.3333333333, ans=0.125 2023-11-18 22:15:43,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=434046.6666666667, ans=0.0 2023-11-18 22:15:44,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.whiten.whitening_limit, batch_count=434046.6666666667, ans=15.0 2023-11-18 22:15:45,214 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.25 vs. limit=15.0 2023-11-18 22:15:45,753 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5000, loss[loss=0.1086, simple_loss=0.1315, pruned_loss=0.03367, audio_tagging_loss=0.009197, over 14491.00 frames. ], tot_loss[loss=0.09831, simple_loss=0.1149, pruned_loss=0.02985, audio_tagging_loss=0.01103, over 3040205.84 frames. 
], batch size: 53, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:16:01,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=434180.0, ans=0.07 2023-11-18 22:16:05,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=434180.0, ans=0.0 2023-11-18 22:16:06,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=434180.0, ans=0.125 2023-11-18 22:16:23,431 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.790e+01 9.696e+01 1.074e+02 1.675e+02, threshold=1.939e+02, percent-clipped=0.0 2023-11-18 22:16:24,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=434313.3333333333, ans=0.1 2023-11-18 22:16:41,998 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5050, loss[loss=0.1066, simple_loss=0.1126, pruned_loss=0.036, audio_tagging_loss=0.01431, over 14895.00 frames. ], tot_loss[loss=0.09867, simple_loss=0.1152, pruned_loss=0.03008, audio_tagging_loss=0.01097, over 3039439.35 frames. ], batch size: 55, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:16:55,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=434513.3333333333, ans=0.125 2023-11-18 22:16:58,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=434513.3333333333, ans=0.0 2023-11-18 22:17:06,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=434580.0, ans=0.125 2023-11-18 22:17:09,333 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.66 vs. limit=6.0 2023-11-18 22:17:12,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=434580.0, ans=0.125 2023-11-18 22:17:27,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=434713.3333333333, ans=0.0 2023-11-18 22:17:38,163 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5100, loss[loss=0.1004, simple_loss=0.1164, pruned_loss=0.03032, audio_tagging_loss=0.01188, over 15173.00 frames. ], tot_loss[loss=0.09828, simple_loss=0.1148, pruned_loss=0.02994, audio_tagging_loss=0.01094, over 3034550.54 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:18:16,622 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.735e+01 9.607e+01 1.051e+02 1.879e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-18 22:18:18,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=434980.0, ans=0.0 2023-11-18 22:18:19,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.67 vs. limit=15.0 2023-11-18 22:18:33,465 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5150, loss[loss=0.07239, simple_loss=0.07796, pruned_loss=0.02087, audio_tagging_loss=0.01254, over 14781.00 frames. ], tot_loss[loss=0.09735, simple_loss=0.1136, pruned_loss=0.02955, audio_tagging_loss=0.01102, over 3031958.73 frames. 
], batch size: 57, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:18:38,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=435113.3333333333, ans=0.125 2023-11-18 22:18:40,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=435113.3333333333, ans=0.015 2023-11-18 22:18:43,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.72 vs. limit=15.0 2023-11-18 22:18:49,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=435180.0, ans=0.2 2023-11-18 22:18:53,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=435180.0, ans=0.1 2023-11-18 22:19:11,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.19 vs. limit=15.0 2023-11-18 22:19:30,433 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5200, loss[loss=0.08334, simple_loss=0.09727, pruned_loss=0.02179, audio_tagging_loss=0.01291, over 15512.00 frames. ], tot_loss[loss=0.09743, simple_loss=0.1136, pruned_loss=0.02967, audio_tagging_loss=0.01097, over 3025696.55 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:19:32,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435446.6666666667, ans=0.1 2023-11-18 22:19:32,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=435446.6666666667, ans=0.2 2023-11-18 22:19:41,039 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.44 vs. limit=12.0 2023-11-18 22:19:48,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=435513.3333333333, ans=0.1 2023-11-18 22:19:57,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=435580.0, ans=0.0 2023-11-18 22:20:00,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=435580.0, ans=0.1 2023-11-18 22:20:08,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.80 vs. limit=15.0 2023-11-18 22:20:09,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.828e+01 8.948e+01 9.784e+01 1.083e+02 1.629e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:20:22,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=435713.3333333333, ans=0.125 2023-11-18 22:20:25,525 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5250, loss[loss=0.1032, simple_loss=0.1148, pruned_loss=0.03676, audio_tagging_loss=0.009058, over 15247.00 frames. ], tot_loss[loss=0.09717, simple_loss=0.1131, pruned_loss=0.02966, audio_tagging_loss=0.01098, over 3025073.11 frames. 
], batch size: 61, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:20:55,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=435913.3333333333, ans=0.125 2023-11-18 22:20:58,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=435980.0, ans=0.125 2023-11-18 22:21:04,787 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.84 vs. limit=15.0 2023-11-18 22:21:10,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=436046.6666666667, ans=0.0 2023-11-18 22:21:21,108 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5300, loss[loss=0.119, simple_loss=0.1376, pruned_loss=0.03767, audio_tagging_loss=0.01255, over 16361.00 frames. ], tot_loss[loss=0.09784, simple_loss=0.114, pruned_loss=0.02979, audio_tagging_loss=0.01105, over 3039437.18 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:21:23,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=436113.3333333333, ans=0.0 2023-11-18 22:22:01,695 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.866e+01 8.659e+01 9.451e+01 1.050e+02 1.358e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:22:12,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=436380.0, ans=0.125 2023-11-18 22:22:17,480 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5350, loss[loss=0.07512, simple_loss=0.07836, pruned_loss=0.02256, audio_tagging_loss=0.01338, over 16409.00 frames. ], tot_loss[loss=0.09688, simple_loss=0.1129, pruned_loss=0.0294, audio_tagging_loss=0.01104, over 3037796.84 frames. ], batch size: 63, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:22:28,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=436513.3333333333, ans=0.125 2023-11-18 22:22:37,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=436513.3333333333, ans=10.0 2023-11-18 22:22:37,253 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. limit=15.0 2023-11-18 22:22:44,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=436580.0, ans=0.125 2023-11-18 22:22:50,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=436646.6666666667, ans=0.1 2023-11-18 22:22:52,318 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=22.5 2023-11-18 22:22:58,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=436646.6666666667, ans=0.125 2023-11-18 22:23:00,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.05 vs. 
limit=6.0 2023-11-18 22:23:08,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=10.81 vs. limit=15.0 2023-11-18 22:23:13,378 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5400, loss[loss=0.1097, simple_loss=0.1296, pruned_loss=0.03283, audio_tagging_loss=0.01208, over 16471.00 frames. ], tot_loss[loss=0.09687, simple_loss=0.1128, pruned_loss=0.02935, audio_tagging_loss=0.01111, over 3040699.52 frames. ], batch size: 59, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:23:38,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=436913.3333333333, ans=0.1 2023-11-18 22:23:40,868 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. limit=6.0 2023-11-18 22:23:53,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.845e+01 9.112e+01 1.017e+02 1.141e+02 1.585e+02, threshold=2.034e+02, percent-clipped=0.0 2023-11-18 22:23:59,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=437046.6666666667, ans=0.125 2023-11-18 22:24:01,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=437046.6666666667, ans=0.0 2023-11-18 22:24:08,353 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5450, loss[loss=0.09197, simple_loss=0.1069, pruned_loss=0.02833, audio_tagging_loss=0.01018, over 15256.00 frames. ], tot_loss[loss=0.09774, simple_loss=0.1137, pruned_loss=0.02975, audio_tagging_loss=0.01116, over 3043294.35 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:24:23,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=437180.0, ans=0.2 2023-11-18 22:24:37,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.68 vs. limit=10.0 2023-11-18 22:25:04,305 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5500, loss[loss=0.09811, simple_loss=0.1178, pruned_loss=0.02882, audio_tagging_loss=0.01041, over 15597.00 frames. ], tot_loss[loss=0.09646, simple_loss=0.112, pruned_loss=0.02913, audio_tagging_loss=0.01133, over 3043388.73 frames. 
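The scaling.py:213 entries print the current value (ans=...) of a ScheduledFloat for a named hyperparameter at the given batch_count: dropout probabilities, skip rates, and balancer settings are scheduled over training rather than fixed. A rough sketch of a piecewise-linear schedule keyed on batch count; the interpolation rule is an assumption, and the real ScheduledFloat in scaling.py carries more machinery:

    import bisect

    class ScheduledFloat:
        """Piecewise-linear float schedule over batch_count (sketch)."""
        def __init__(self, *points):
            # points: (batch_count, value) pairs in increasing batch_count
            self.xs = [p[0] for p in points]
            self.ys = [p[1] for p in points]

        def value(self, batch_count: float) -> float:
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout_p decaying from 0.3 to 0.1 over the first 20k batches,
    # then flat -- consistent with the constant ans=0.1 values logged here
    dropout_p = ScheduledFloat((0, 0.3), (20000, 0.1))
    print(dropout_p.value(436513.33))   # -> 0.1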
], batch size: 56, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:25:04,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437446.6666666667, ans=0.125 2023-11-18 22:25:06,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=437446.6666666667, ans=0.0 2023-11-18 22:25:09,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=437446.6666666667, ans=0.125 2023-11-18 22:25:10,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=437446.6666666667, ans=0.125 2023-11-18 22:25:14,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437446.6666666667, ans=0.1 2023-11-18 22:25:16,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=437513.3333333333, ans=0.125 2023-11-18 22:25:25,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=437513.3333333333, ans=0.0 2023-11-18 22:25:38,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=437646.6666666667, ans=0.125 2023-11-18 22:25:44,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.355e+01 8.778e+01 9.490e+01 1.043e+02 1.354e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 22:25:44,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=437646.6666666667, ans=0.125 2023-11-18 22:25:48,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=437713.3333333333, ans=0.1 2023-11-18 22:26:00,859 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5550, loss[loss=0.08399, simple_loss=0.08817, pruned_loss=0.02718, audio_tagging_loss=0.01273, over 14834.00 frames. ], tot_loss[loss=0.0976, simple_loss=0.1137, pruned_loss=0.0295, audio_tagging_loss=0.01128, over 3049890.49 frames. ], batch size: 57, lr: 1.15e-02, grad_scale: 16.0 2023-11-18 22:26:06,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=437780.0, ans=0.125 2023-11-18 22:26:07,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=437780.0, ans=0.125 2023-11-18 22:26:29,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.33 vs. limit=22.5 2023-11-18 22:26:30,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. 
limit=15.0 2023-11-18 22:26:39,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=437980.0, ans=0.0 2023-11-18 22:26:45,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=438046.6666666667, ans=0.0 2023-11-18 22:26:55,685 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5600, loss[loss=0.1162, simple_loss=0.1398, pruned_loss=0.03422, audio_tagging_loss=0.01212, over 16494.00 frames. ], tot_loss[loss=0.09844, simple_loss=0.1147, pruned_loss=0.02981, audio_tagging_loss=0.0113, over 3059298.00 frames. ], batch size: 60, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:21,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=438246.6666666667, ans=0.125 2023-11-18 22:27:34,865 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:27:35,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.410e+01 9.319e+01 9.867e+01 1.109e+02 1.388e+02, threshold=1.973e+02, percent-clipped=0.0 2023-11-18 22:27:37,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=438313.3333333333, ans=0.125 2023-11-18 22:27:38,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=438313.3333333333, ans=0.2 2023-11-18 22:27:51,225 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5650, loss[loss=0.1197, simple_loss=0.1394, pruned_loss=0.04131, audio_tagging_loss=0.008682, over 14849.00 frames. ], tot_loss[loss=0.09845, simple_loss=0.1144, pruned_loss=0.02985, audio_tagging_loss=0.0114, over 3054712.31 frames. 
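The optim.py:476 lines are self-consistent: in each one the threshold equals Clipping_scale (2.0) times the middle quartile, e.g. 2.0 x 9.867e+01 = 1.973e+02 just above. In other words the gradient norm is clipped against a multiple of the median of recently observed norms, and the five numbers read as min/Q1/median/Q3/max of that history. A sketch of that mechanism (the history length and exact bookkeeping are assumptions):

    from collections import deque
    import numpy as np
    import torch

    def clip_by_median(params, history: deque, clipping_scale: float = 2.0):
        """Clip the global grad norm to clipping_scale * median(history)."""
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([g.norm() for g in grads])).item()
        history.append(norm)
        q = np.percentile(list(history), [0, 25, 50, 75, 100])
        threshold = clipping_scale * q[2]
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        print(f"grad-norm quartiles {q[0]:.3e} {q[1]:.3e} {q[2]:.3e} "
              f"{q[3]:.3e} {q[4]:.3e}, threshold={threshold:.3e}")

    # history = deque(maxlen=128); call clip_by_median(model.parameters(),
    # history) once per step, before optimizer.step()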
], batch size: 54, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:27:53,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=438446.6666666667, ans=0.2 2023-11-18 22:27:57,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=438446.6666666667, ans=0.2 2023-11-18 22:27:57,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=438446.6666666667, ans=0.0 2023-11-18 22:27:59,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=438446.6666666667, ans=0.125 2023-11-18 22:28:22,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_ff3.min_abs, batch_count=438580.0, ans=0.2 2023-11-18 22:28:24,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=438646.6666666667, ans=0.0 2023-11-18 22:28:38,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=438713.3333333333, ans=0.1 2023-11-18 22:28:40,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=438713.3333333333, ans=0.0 2023-11-18 22:28:41,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=438713.3333333333, ans=0.2 2023-11-18 22:28:46,901 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5700, loss[loss=0.1198, simple_loss=0.1378, pruned_loss=0.04096, audio_tagging_loss=0.009959, over 16586.00 frames. ], tot_loss[loss=0.09806, simple_loss=0.1138, pruned_loss=0.02982, audio_tagging_loss=0.01133, over 3046984.40 frames. ], batch size: 62, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:29:27,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.478e+01 8.890e+01 9.546e+01 1.053e+02 1.340e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-18 22:29:43,002 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5750, loss[loss=0.09597, simple_loss=0.1118, pruned_loss=0.02855, audio_tagging_loss=0.01154, over 15173.00 frames. ], tot_loss[loss=0.09849, simple_loss=0.1146, pruned_loss=0.03004, audio_tagging_loss=0.01114, over 3047682.55 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:29:51,821 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.22 vs. limit=15.0 2023-11-18 22:29:51,920 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.46 vs. limit=15.0 2023-11-18 22:30:08,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=439246.6666666667, ans=0.125 2023-11-18 22:30:10,690 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.01 vs. limit=15.0 2023-11-18 22:30:10,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. 
limit=6.0 2023-11-18 22:30:35,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=439380.0, ans=0.125 2023-11-18 22:30:37,464 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5800, loss[loss=0.1108, simple_loss=0.1438, pruned_loss=0.0291, audio_tagging_loss=0.009837, over 16432.00 frames. ], tot_loss[loss=0.09884, simple_loss=0.1154, pruned_loss=0.03017, audio_tagging_loss=0.01095, over 3043416.82 frames. ], batch size: 56, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:30:37,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=439446.6666666667, ans=0.0 2023-11-18 22:30:43,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.31 vs. limit=22.5 2023-11-18 22:30:49,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=439513.3333333333, ans=0.1 2023-11-18 22:30:54,129 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=439513.3333333333, ans=0.95 2023-11-18 22:30:55,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=439513.3333333333, ans=0.1 2023-11-18 22:31:10,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=439646.6666666667, ans=0.125 2023-11-18 22:31:17,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.421e+01 9.751e+01 1.061e+02 1.528e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-18 22:31:33,826 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5850, loss[loss=0.1039, simple_loss=0.1314, pruned_loss=0.03031, audio_tagging_loss=0.007933, over 15087.00 frames. ], tot_loss[loss=0.09814, simple_loss=0.1148, pruned_loss=0.0298, audio_tagging_loss=0.01092, over 3034421.01 frames. ], batch size: 54, lr: 1.15e-02, grad_scale: 32.0 2023-11-18 22:31:41,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=439780.0, ans=0.0 2023-11-18 22:31:42,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=439780.0, ans=0.09899494936611666 2023-11-18 22:31:45,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=439846.6666666667, ans=0.125 2023-11-18 22:31:56,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=439913.3333333333, ans=0.125 2023-11-18 22:31:58,253 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=7.898e-01 2023-11-18 22:32:07,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.82 vs. 
limit=15.0 2023-11-18 22:32:10,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=439980.0, ans=0.125 2023-11-18 22:32:17,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=440046.6666666667, ans=0.125 2023-11-18 22:32:29,692 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5900, loss[loss=0.07743, simple_loss=0.08488, pruned_loss=0.02028, audio_tagging_loss=0.01471, over 16053.00 frames. ], tot_loss[loss=0.09775, simple_loss=0.1144, pruned_loss=0.02959, audio_tagging_loss=0.01098, over 3037982.65 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:32:35,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.63 vs. limit=15.0 2023-11-18 22:32:36,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440113.3333333333, ans=0.1 2023-11-18 22:32:40,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.82 vs. limit=15.0 2023-11-18 22:32:48,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=440180.0, ans=0.0 2023-11-18 22:32:49,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=440180.0, ans=0.125 2023-11-18 22:33:04,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.15 vs. limit=22.5 2023-11-18 22:33:09,415 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.910e+01 8.634e+01 9.404e+01 1.031e+02 1.635e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-18 22:33:22,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=440380.0, ans=0.125 2023-11-18 22:33:22,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=440380.0, ans=0.0 2023-11-18 22:33:25,042 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 5950, loss[loss=0.1149, simple_loss=0.1389, pruned_loss=0.03501, audio_tagging_loss=0.01045, over 14643.00 frames. ], tot_loss[loss=0.09712, simple_loss=0.1138, pruned_loss=0.02929, audio_tagging_loss=0.01095, over 3034859.80 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:33:29,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=440446.6666666667, ans=0.1 2023-11-18 22:33:43,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=440513.3333333333, ans=0.125 2023-11-18 22:34:02,780 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.96 vs. 
limit=10.0 2023-11-18 22:34:04,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=440646.6666666667, ans=0.125 2023-11-18 22:34:21,295 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6000, loss[loss=0.0899, simple_loss=0.1019, pruned_loss=0.02645, audio_tagging_loss=0.01251, over 14370.00 frames. ], tot_loss[loss=0.09634, simple_loss=0.1127, pruned_loss=0.02899, audio_tagging_loss=0.01099, over 3033337.22 frames. ], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:34:21,297 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 22:34:54,500 INFO [train_asr.py:1147] (0/4) Epoch 6, validation: loss=0.07034, simple_loss=0.0589, pruned_loss=0.008199, audio_tagging_loss=0.03269, over 4681554.00 frames. 2023-11-18 22:34:54,501 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 22:35:33,142 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:35:34,146 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.702e+01 9.372e+01 1.021e+02 1.628e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-18 22:35:39,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=441046.6666666667, ans=0.125 2023-11-18 22:35:50,042 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6050, loss[loss=0.07515, simple_loss=0.09282, pruned_loss=0.01722, audio_tagging_loss=0.01151, over 16160.00 frames. ], tot_loss[loss=0.09699, simple_loss=0.1134, pruned_loss=0.02928, audio_tagging_loss=0.011, over 3037327.84 frames. ], batch size: 62, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:36:29,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=441313.3333333333, ans=0.1 2023-11-18 22:36:34,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=441380.0, ans=0.2 2023-11-18 22:36:44,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-18 22:36:46,621 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6100, loss[loss=0.08215, simple_loss=0.09903, pruned_loss=0.0247, audio_tagging_loss=0.007934, over 14967.00 frames. ], tot_loss[loss=0.09747, simple_loss=0.1144, pruned_loss=0.02943, audio_tagging_loss=0.01085, over 3049540.27 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:37:12,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.67 vs. 
limit=15.0 2023-11-18 22:37:14,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=441580.0, ans=0.125 2023-11-18 22:37:23,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=441646.6666666667, ans=0.1 2023-11-18 22:37:26,318 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.360e+01 8.924e+01 9.623e+01 1.090e+02 1.421e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-18 22:37:39,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=441713.3333333333, ans=0.125 2023-11-18 22:37:39,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=441713.3333333333, ans=0.125 2023-11-18 22:37:41,750 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6150, loss[loss=0.1397, simple_loss=0.1643, pruned_loss=0.04951, audio_tagging_loss=0.008087, over 14731.00 frames. ], tot_loss[loss=0.09837, simple_loss=0.1154, pruned_loss=0.02978, audio_tagging_loss=0.01091, over 3044350.78 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:37:41,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=441780.0, ans=0.09899494936611666 2023-11-18 22:38:04,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=441913.3333333333, ans=0.0 2023-11-18 22:38:32,117 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:38:37,172 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6200, loss[loss=0.09545, simple_loss=0.09303, pruned_loss=0.03382, audio_tagging_loss=0.01511, over 14598.00 frames. ], tot_loss[loss=0.09829, simple_loss=0.1147, pruned_loss=0.0299, audio_tagging_loss=0.01105, over 3048105.87 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:38:39,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=442113.3333333333, ans=0.0 2023-11-18 22:39:01,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=442246.6666666667, ans=0.125 2023-11-18 22:39:17,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.454e+01 8.898e+01 9.637e+01 1.062e+02 1.709e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-18 22:39:19,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=442313.3333333333, ans=0.125 2023-11-18 22:39:27,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.min_positive, batch_count=442380.0, ans=0.05 2023-11-18 22:39:28,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.53 vs. limit=15.0 2023-11-18 22:39:33,418 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6250, loss[loss=0.08338, simple_loss=0.08986, pruned_loss=0.0237, audio_tagging_loss=0.01475, over 15875.00 frames. ], tot_loss[loss=0.09757, simple_loss=0.1136, pruned_loss=0.02959, audio_tagging_loss=0.0112, over 3055430.17 frames. 
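The recurring Whitening lines compare a per-module metric against a limit (e.g. metric=11.67 vs. limit=15.0), which reads as a regularizer that keeps feature covariance close to isotropic and only intervenes when the metric crosses the limit. One plausible form of the metric, assumed here, is the mean squared eigenvalue of the covariance divided by the squared mean eigenvalue: exactly 1.0 for perfectly "white" features, larger when variance concentrates in a few directions:

    import torch

    def whitening_metric(x: torch.Tensor) -> torch.Tensor:
        """x: (num_frames, num_channels). >= 1.0, with equality iff the
        covariance is a multiple of the identity (assumed metric form)."""
        x = x - x.mean(dim=0)
        cov = (x.t() @ x) / x.shape[0]          # (C, C) feature covariance
        eigs = torch.linalg.eigvalsh(cov)       # real eigenvalues, ascending
        return (eigs ** 2).mean() / (eigs.mean() ** 2 + 1e-20)

    torch.manual_seed(0)
    white = torch.randn(1000, 64)
    skewed = white * torch.linspace(0.1, 3.0, 64)   # anisotropic channels
    print(whitening_metric(white))    # close to 1.0
    print(whitening_metric(skewed))   # substantially larger

Under this reading, a correction is applied only on steps where the metric exceeds its (scheduled) limit, which would explain why these lines appear only sporadically.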
], batch size: 61, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:39:36,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=442446.6666666667, ans=0.125 2023-11-18 22:39:37,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=442446.6666666667, ans=0.1 2023-11-18 22:39:49,420 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.59 vs. limit=6.0 2023-11-18 22:40:06,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=442646.6666666667, ans=0.0 2023-11-18 22:40:09,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=442646.6666666667, ans=0.95 2023-11-18 22:40:15,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=442646.6666666667, ans=0.0 2023-11-18 22:40:18,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=442713.3333333333, ans=0.2 2023-11-18 22:40:26,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=442713.3333333333, ans=0.0 2023-11-18 22:40:28,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.86 vs. limit=15.0 2023-11-18 22:40:29,547 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6300, loss[loss=0.1065, simple_loss=0.1265, pruned_loss=0.03252, audio_tagging_loss=0.01075, over 15284.00 frames. ], tot_loss[loss=0.09808, simple_loss=0.1145, pruned_loss=0.02968, audio_tagging_loss=0.01116, over 3054269.08 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:40:34,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=442780.0, ans=0.125 2023-11-18 22:40:35,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=442780.0, ans=0.0 2023-11-18 22:40:44,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=442846.6666666667, ans=10.0 2023-11-18 22:40:46,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=442846.6666666667, ans=0.125 2023-11-18 22:40:51,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=442913.3333333333, ans=0.1 2023-11-18 22:41:09,551 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.517e+01 8.985e+01 9.817e+01 1.039e+02 1.348e+02, threshold=1.963e+02, percent-clipped=0.0 2023-11-18 22:41:14,343 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.91 vs. 
limit=15.0 2023-11-18 22:41:22,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=443046.6666666667, ans=0.125 2023-11-18 22:41:24,958 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6350, loss[loss=0.1375, simple_loss=0.1594, pruned_loss=0.05014, audio_tagging_loss=0.007687, over 14968.00 frames. ], tot_loss[loss=0.09802, simple_loss=0.1145, pruned_loss=0.02955, audio_tagging_loss=0.01123, over 3058272.03 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:41:33,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=443113.3333333333, ans=0.09899494936611666 2023-11-18 22:41:37,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=22.5 2023-11-18 22:41:53,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=443246.6666666667, ans=0.0 2023-11-18 22:41:58,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=443313.3333333333, ans=0.125 2023-11-18 22:42:03,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.08 vs. limit=15.0 2023-11-18 22:42:21,096 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6400, loss[loss=0.08749, simple_loss=0.09554, pruned_loss=0.02447, audio_tagging_loss=0.01525, over 14644.00 frames. ], tot_loss[loss=0.09814, simple_loss=0.1147, pruned_loss=0.02954, audio_tagging_loss=0.01127, over 3042684.41 frames. ], batch size: 53, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:42:21,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=443446.6666666667, ans=0.2 2023-11-18 22:42:26,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=443446.6666666667, ans=0.0 2023-11-18 22:43:01,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.672e+01 9.519e+01 1.064e+02 1.432e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-18 22:43:06,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=443713.3333333333, ans=0.0 2023-11-18 22:43:09,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=443713.3333333333, ans=0.125 2023-11-18 22:43:16,888 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6450, loss[loss=0.08184, simple_loss=0.0914, pruned_loss=0.02341, audio_tagging_loss=0.01272, over 13160.00 frames. ], tot_loss[loss=0.09785, simple_loss=0.1141, pruned_loss=0.02947, audio_tagging_loss=0.01133, over 3032890.34 frames. ], batch size: 53, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:43:45,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=443913.3333333333, ans=0.1 2023-11-18 22:43:51,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=443980.0, ans=0.1 2023-11-18 22:43:54,981 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.43 vs. 
limit=12.0 2023-11-18 22:44:10,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=444046.6666666667, ans=0.1 2023-11-18 22:44:12,487 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6500, loss[loss=0.1002, simple_loss=0.1165, pruned_loss=0.02965, audio_tagging_loss=0.01229, over 15011.00 frames. ], tot_loss[loss=0.09728, simple_loss=0.1133, pruned_loss=0.02928, audio_tagging_loss=0.01133, over 3036238.28 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:44:12,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=444113.3333333333, ans=0.1 2023-11-18 22:44:13,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=444113.3333333333, ans=0.2 2023-11-18 22:44:21,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=444113.3333333333, ans=0.125 2023-11-18 22:44:52,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.35 vs. limit=10.0 2023-11-18 22:44:52,543 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.212e+01 8.587e+01 9.452e+01 1.044e+02 1.613e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-18 22:44:56,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=444380.0, ans=0.125 2023-11-18 22:45:09,097 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6550, loss[loss=0.1058, simple_loss=0.1226, pruned_loss=0.03277, audio_tagging_loss=0.01169, over 16931.00 frames. ], tot_loss[loss=0.09741, simple_loss=0.1137, pruned_loss=0.02933, audio_tagging_loss=0.01124, over 3041262.97 frames. ], batch size: 65, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:45:09,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=444446.6666666667, ans=0.125 2023-11-18 22:45:43,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=444646.6666666667, ans=0.125 2023-11-18 22:45:43,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.40 vs. limit=15.0 2023-11-18 22:45:44,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=444646.6666666667, ans=0.04949747468305833 2023-11-18 22:45:57,834 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.46 vs. limit=22.5 2023-11-18 22:46:04,536 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6600, loss[loss=0.09994, simple_loss=0.113, pruned_loss=0.03035, audio_tagging_loss=0.01307, over 14127.00 frames. ], tot_loss[loss=0.09731, simple_loss=0.1137, pruned_loss=0.02936, audio_tagging_loss=0.01112, over 3042919.35 frames. 
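Within this stretch the learning rate ticks down from 1.15e-02 to 1.14e-02 (and to 1.13e-02 further below) over thousands of batches: a smooth schedule driven by both batch count and epoch. That is consistent with icefall's Eden scheduler, commonly written as below; treating Eden as the scheduler actually in use is an inference from the lr_batches/lr_epochs-style configuration, and the calls at the bottom are purely illustrative:

    def eden_lr(base_lr: float, batch: int, epoch: float,
                lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
        """Eden-style LR: smooth power-law decay in both batch and epoch."""
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor

    # Early in training the batch term dominates the decay; later the
    # epoch term takes over, giving the very slow drift seen in this log.
    print(eden_lr(0.045, batch=7500, epoch=1.0))
    print(eden_lr(0.045, batch=60000, epoch=6.0))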
], batch size: 54, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:46:24,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=444846.6666666667, ans=0.125 2023-11-18 22:46:42,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.74 vs. limit=15.0 2023-11-18 22:46:44,722 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.299e+01 9.179e+01 9.898e+01 1.109e+02 1.412e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-18 22:46:45,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=444980.0, ans=0.125 2023-11-18 22:46:50,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=445046.6666666667, ans=0.0 2023-11-18 22:46:59,490 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6650, loss[loss=0.08745, simple_loss=0.09904, pruned_loss=0.02396, audio_tagging_loss=0.01397, over 14436.00 frames. ], tot_loss[loss=0.0969, simple_loss=0.1131, pruned_loss=0.02929, audio_tagging_loss=0.01106, over 3031769.66 frames. ], batch size: 55, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:47:02,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.53 vs. limit=12.0 2023-11-18 22:47:31,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=445246.6666666667, ans=0.125 2023-11-18 22:47:34,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=445313.3333333333, ans=0.125 2023-11-18 22:47:35,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=445313.3333333333, ans=0.1 2023-11-18 22:47:35,747 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.39 vs. limit=15.0 2023-11-18 22:47:43,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=445380.0, ans=0.125 2023-11-18 22:47:54,898 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6700, loss[loss=0.087, simple_loss=0.1037, pruned_loss=0.0255, audio_tagging_loss=0.00966, over 14950.00 frames. ], tot_loss[loss=0.09675, simple_loss=0.1126, pruned_loss=0.02934, audio_tagging_loss=0.01109, over 3033149.46 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:48:10,919 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.28 vs. limit=15.0 2023-11-18 22:48:32,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=445646.6666666667, ans=0.125 2023-11-18 22:48:36,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.560e+01 9.188e+01 9.958e+01 1.118e+02 1.458e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 22:48:51,582 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6750, loss[loss=0.1122, simple_loss=0.1383, pruned_loss=0.03369, audio_tagging_loss=0.009356, over 15386.00 frames. 
], tot_loss[loss=0.0962, simple_loss=0.1119, pruned_loss=0.02903, audio_tagging_loss=0.01121, over 3029286.17 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 16.0 2023-11-18 22:48:55,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=445780.0, ans=0.0 2023-11-18 22:49:00,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.30 vs. limit=22.5 2023-11-18 22:49:03,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=445846.6666666667, ans=0.125 2023-11-18 22:49:27,820 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.66 vs. limit=12.0 2023-11-18 22:49:33,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2023-11-18 22:49:39,838 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2023-11-18 22:49:40,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=446046.6666666667, ans=0.0 2023-11-18 22:49:46,737 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6800, loss[loss=0.1095, simple_loss=0.1339, pruned_loss=0.0351, audio_tagging_loss=0.007413, over 15350.00 frames. ], tot_loss[loss=0.09597, simple_loss=0.1116, pruned_loss=0.02893, audio_tagging_loss=0.01125, over 3025490.75 frames. ], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:49:53,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=446113.3333333333, ans=0.2 2023-11-18 22:50:04,817 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0 2023-11-18 22:50:21,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.01 vs. limit=22.5 2023-11-18 22:50:27,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.487e+01 8.920e+01 9.995e+01 1.137e+02 1.788e+02, threshold=1.999e+02, percent-clipped=0.0 2023-11-18 22:50:36,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=446380.0, ans=0.125 2023-11-18 22:50:42,349 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6850, loss[loss=0.09886, simple_loss=0.112, pruned_loss=0.03262, audio_tagging_loss=0.01024, over 14970.00 frames. ], tot_loss[loss=0.0965, simple_loss=0.1126, pruned_loss=0.02918, audio_tagging_loss=0.01103, over 3025204.63 frames. ], batch size: 58, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:50:45,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=446446.6666666667, ans=0.0 2023-11-18 22:51:11,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.42 vs. 
limit=10.0 2023-11-18 22:51:21,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=446646.6666666667, ans=0.0 2023-11-18 22:51:38,583 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.33 vs. limit=12.0 2023-11-18 22:51:39,183 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6900, loss[loss=0.0864, simple_loss=0.1042, pruned_loss=0.02321, audio_tagging_loss=0.0111, over 15493.00 frames. ], tot_loss[loss=0.09724, simple_loss=0.1136, pruned_loss=0.02942, audio_tagging_loss=0.01103, over 3026433.89 frames. ], batch size: 59, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:51:41,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=446780.0, ans=0.1 2023-11-18 22:51:47,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=446780.0, ans=0.125 2023-11-18 22:51:49,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=446846.6666666667, ans=0.1 2023-11-18 22:51:49,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=446846.6666666667, ans=0.0 2023-11-18 22:51:59,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=446913.3333333333, ans=10.0 2023-11-18 22:52:19,850 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.499e+01 9.182e+01 9.955e+01 1.058e+02 1.430e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-18 22:52:20,941 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 22:52:34,142 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 6950, loss[loss=0.0716, simple_loss=0.08974, pruned_loss=0.01576, audio_tagging_loss=0.01096, over 14798.00 frames. ], tot_loss[loss=0.09699, simple_loss=0.1134, pruned_loss=0.02931, audio_tagging_loss=0.011, over 3035191.38 frames. 
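The WARNING just above shows the data sanity filter working as intended: a 1-second AudioSet clip yields 100 feature frames, only 23 after subsampling, which is fewer than its 24 BPE tokens, so no valid transducer alignment exists and the cut is excluded. (The "Dummy text added as a place holder" transcript is the filler attached to audio-tagging-only cuts.) The decision rule implied by the logged numbers:

    def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        # Transducer training needs at least one encoder frame per token;
        # rule inferred from the logged 23 frames vs. 24 tokens.
        return frames_after_subsampling >= num_tokens

    print(keep_cut(23, 24))   # False -> excluded, matching the WARNING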
], batch size: 56, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:52:39,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=447113.3333333333, ans=0.125 2023-11-18 22:52:44,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=447180.0, ans=0.125 2023-11-18 22:53:11,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=447313.3333333333, ans=10.0 2023-11-18 22:53:13,614 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 22:53:17,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=447380.0, ans=0.125 2023-11-18 22:53:27,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=447380.0, ans=0.0 2023-11-18 22:53:29,847 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7000, loss[loss=0.09576, simple_loss=0.108, pruned_loss=0.02942, audio_tagging_loss=0.01236, over 15306.00 frames. ], tot_loss[loss=0.09738, simple_loss=0.1138, pruned_loss=0.02952, audio_tagging_loss=0.01096, over 3034321.57 frames. ], batch size: 57, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:53:37,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447446.6666666667, ans=0.1 2023-11-18 22:54:05,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447646.6666666667, ans=0.1 2023-11-18 22:54:05,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=447646.6666666667, ans=0.2 2023-11-18 22:54:10,526 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.357e+01 8.734e+01 9.498e+01 1.045e+02 1.881e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-18 22:54:21,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=447713.3333333333, ans=0.125 2023-11-18 22:54:24,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=447713.3333333333, ans=0.0 2023-11-18 22:54:25,865 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7050, loss[loss=0.1073, simple_loss=0.1246, pruned_loss=0.03371, audio_tagging_loss=0.01127, over 14761.00 frames. ], tot_loss[loss=0.09734, simple_loss=0.1132, pruned_loss=0.02959, audio_tagging_loss=0.01114, over 3035828.67 frames. ], batch size: 53, lr: 1.14e-02, grad_scale: 32.0 2023-11-18 22:54:39,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=447846.6666666667, ans=0.0 2023-11-18 22:54:59,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=447980.0, ans=0.2 2023-11-18 22:55:05,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=447980.0, ans=0.1 2023-11-18 22:55:16,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=6.50 vs. 
limit=15.0 2023-11-18 22:55:21,694 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7100, loss[loss=0.1008, simple_loss=0.1077, pruned_loss=0.03521, audio_tagging_loss=0.01172, over 15010.00 frames. ], tot_loss[loss=0.09753, simple_loss=0.1131, pruned_loss=0.02973, audio_tagging_loss=0.01124, over 3033003.97 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:55:45,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=448246.6666666667, ans=0.0 2023-11-18 22:55:53,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.49 vs. limit=15.0 2023-11-18 22:56:02,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 9.011e+01 9.786e+01 1.101e+02 1.464e+02, threshold=1.957e+02, percent-clipped=0.0 2023-11-18 22:56:11,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=448380.0, ans=0.2 2023-11-18 22:56:13,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=448380.0, ans=0.125 2023-11-18 22:56:16,827 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7150, loss[loss=0.09794, simple_loss=0.1136, pruned_loss=0.0294, audio_tagging_loss=0.01172, over 15252.00 frames. ], tot_loss[loss=0.09813, simple_loss=0.1142, pruned_loss=0.02986, audio_tagging_loss=0.01117, over 3036049.37 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:56:27,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=448513.3333333333, ans=0.125 2023-11-18 22:56:28,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.32 vs. limit=15.0 2023-11-18 22:56:28,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=448513.3333333333, ans=0.04949747468305833 2023-11-18 22:56:43,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=448580.0, ans=0.125 2023-11-18 22:57:03,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.10 vs. limit=22.5 2023-11-18 22:57:11,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=448713.3333333333, ans=0.1 2023-11-18 22:57:13,670 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7200, loss[loss=0.04581, simple_loss=0.04675, pruned_loss=0.009607, audio_tagging_loss=0.01282, over 15009.00 frames. ], tot_loss[loss=0.0969, simple_loss=0.1127, pruned_loss=0.02927, audio_tagging_loss=0.01127, over 3041052.39 frames. 
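grad_scale in the batch lines moves between 32.0 and 16.0, the signature of dynamic loss scaling under fp16: the scale is cut when a step produces inf/nan gradients and grows back after a run of clean steps. With PyTorch's stock GradScaler the loop looks like the sketch below (whether this run uses the stock scaler or a custom one is not visible in the log):

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=32.0,   # matches logged values
                                       growth_factor=2.0,
                                       backoff_factor=0.5,
                                       growth_interval=2000)

    def train_step(model, batch, optimizer):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()   # scaling keeps fp16 grads representable
        scaler.step(optimizer)          # skipped internally on inf/nan grads
        scaler.update()                 # halves scale on overflow, else grows
        return scaler.get_scale()       # e.g. 32.0 -> 16.0 after an overflow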
], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:57:43,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=448913.3333333333, ans=0.125 2023-11-18 22:57:54,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.744e+01 9.483e+01 1.054e+02 1.266e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-18 22:58:05,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449046.6666666667, ans=0.1 2023-11-18 22:58:08,835 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7250, loss[loss=0.07485, simple_loss=0.08778, pruned_loss=0.01979, audio_tagging_loss=0.01117, over 15543.00 frames. ], tot_loss[loss=0.09702, simple_loss=0.1126, pruned_loss=0.02929, audio_tagging_loss=0.01145, over 3048070.04 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:58:18,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=449113.3333333333, ans=0.1 2023-11-18 22:58:40,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=449246.6666666667, ans=0.125 2023-11-18 22:58:42,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=449313.3333333333, ans=0.125 2023-11-18 22:58:43,196 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.20 vs. limit=15.0 2023-11-18 22:59:04,606 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7300, loss[loss=0.1128, simple_loss=0.135, pruned_loss=0.03448, audio_tagging_loss=0.01086, over 14329.00 frames. ], tot_loss[loss=0.09808, simple_loss=0.1141, pruned_loss=0.02972, audio_tagging_loss=0.01132, over 3044613.12 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 22:59:13,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=449446.6666666667, ans=0.2 2023-11-18 22:59:43,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=449646.6666666667, ans=0.0 2023-11-18 22:59:45,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.627e+01 8.690e+01 9.830e+01 1.104e+02 1.354e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 22:59:59,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=449780.0, ans=0.1 2023-11-18 23:00:00,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=15.90 vs. limit=15.0 2023-11-18 23:00:00,636 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7350, loss[loss=0.1152, simple_loss=0.1371, pruned_loss=0.03951, audio_tagging_loss=0.007109, over 14476.00 frames. ], tot_loss[loss=0.09738, simple_loss=0.1135, pruned_loss=0.02952, audio_tagging_loss=0.01111, over 3035059.46 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:00:02,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.44 vs. 
limit=22.5 2023-11-18 23:00:07,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=449780.0, ans=0.0 2023-11-18 23:00:08,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=449780.0, ans=0.125 2023-11-18 23:00:16,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=449846.6666666667, ans=0.035 2023-11-18 23:00:28,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=449913.3333333333, ans=0.0 2023-11-18 23:00:35,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=449980.0, ans=0.0 2023-11-18 23:00:37,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=449980.0, ans=0.1 2023-11-18 23:00:46,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=450046.6666666667, ans=0.0 2023-11-18 23:00:55,969 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7400, loss[loss=0.1089, simple_loss=0.1283, pruned_loss=0.03416, audio_tagging_loss=0.01057, over 15450.00 frames. ], tot_loss[loss=0.09743, simple_loss=0.1136, pruned_loss=0.02957, audio_tagging_loss=0.01104, over 3034489.79 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:01:10,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.54 vs. limit=12.0 2023-11-18 23:01:11,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=450180.0, ans=0.2 2023-11-18 23:01:12,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=450180.0, ans=0.025 2023-11-18 23:01:14,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=450180.0, ans=0.0 2023-11-18 23:01:16,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=450180.0, ans=0.0 2023-11-18 23:01:28,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450246.6666666667, ans=0.1 2023-11-18 23:01:33,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=450313.3333333333, ans=0.125 2023-11-18 23:01:35,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450313.3333333333, ans=0.1 2023-11-18 23:01:37,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.674e+01 9.086e+01 1.005e+02 1.143e+02 1.555e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 23:01:49,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.max_positive, batch_count=450380.0, ans=0.95 2023-11-18 23:01:51,875 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7450, loss[loss=0.1051, simple_loss=0.1218, pruned_loss=0.03244, audio_tagging_loss=0.01177, over 15822.00 frames. ], tot_loss[loss=0.09764, simple_loss=0.1136, pruned_loss=0.02976, audio_tagging_loss=0.01109, over 3031487.56 frames. 
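Many of the scheduled names end in skip_rate (conv_skip_rate, attention_skip_rate, ff2_skip_rate, bypass.skip_rate), suggesting each Zipformer sublayer can be stochastically skipped during training, with the skip probability itself scheduled toward zero. A minimal sketch of that reading of the names (an interpretation, not the actual scaling.py implementation):

    import torch
    import torch.nn as nn

    class SkippableSublayer(nn.Module):
        """Wraps a residual sublayer; skips it with prob. skip_rate in training."""
        def __init__(self, sublayer: nn.Module, skip_rate: float = 0.0):
            super().__init__()
            self.sublayer = sublayer
            self.skip_rate = skip_rate   # presumably a ScheduledFloat in the real code

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            if self.training and torch.rand(()) < self.skip_rate:
                return x                   # identity: sublayer skipped this step
            return x + self.sublayer(x)    # residual application otherwise

    layer = SkippableSublayer(nn.Linear(256, 256), skip_rate=0.05)
    layer.train()
    y = layer(torch.randn(10, 256))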
], batch size: 60, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:02:18,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=450580.0, ans=0.125 2023-11-18 23:02:36,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=450713.3333333333, ans=0.0 2023-11-18 23:02:38,074 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-18 23:02:43,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=450713.3333333333, ans=0.1 2023-11-18 23:02:45,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=24.65 vs. limit=22.5 2023-11-18 23:02:47,573 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7500, loss[loss=0.1581, simple_loss=0.1865, pruned_loss=0.05812, audio_tagging_loss=0.006782, over 15473.00 frames. ], tot_loss[loss=0.09775, simple_loss=0.1137, pruned_loss=0.02978, audio_tagging_loss=0.01112, over 3039590.09 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:02:52,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=450780.0, ans=0.125 2023-11-18 23:02:57,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=450780.0, ans=0.125 2023-11-18 23:03:30,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.900e+01 8.742e+01 9.569e+01 1.067e+02 1.631e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-18 23:03:35,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=451046.6666666667, ans=0.125 2023-11-18 23:03:38,990 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.39 vs. limit=22.5 2023-11-18 23:03:43,630 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7550, loss[loss=0.08608, simple_loss=0.09625, pruned_loss=0.02533, audio_tagging_loss=0.01263, over 15534.00 frames. ], tot_loss[loss=0.09839, simple_loss=0.1145, pruned_loss=0.03008, audio_tagging_loss=0.01108, over 3044827.19 frames. ], batch size: 59, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:03:52,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=451113.3333333333, ans=0.1 2023-11-18 23:04:14,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.77 vs. limit=15.0 2023-11-18 23:04:26,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=451380.0, ans=0.125 2023-11-18 23:04:29,042 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.70 vs. limit=10.0 2023-11-18 23:04:38,066 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7600, loss[loss=0.12, simple_loss=0.1465, pruned_loss=0.03787, audio_tagging_loss=0.008927, over 15473.00 frames. ], tot_loss[loss=0.09813, simple_loss=0.1141, pruned_loss=0.03012, audio_tagging_loss=0.01095, over 3048053.88 frames. 
], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:05:01,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=451580.0, ans=0.125 2023-11-18 23:05:19,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=451646.6666666667, ans=0.125 2023-11-18 23:05:20,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.440e+01 8.942e+01 9.830e+01 1.127e+02 1.912e+02, threshold=1.966e+02, percent-clipped=0.0 2023-11-18 23:05:24,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=451713.3333333333, ans=0.125 2023-11-18 23:05:30,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.91 vs. limit=15.0 2023-11-18 23:05:33,290 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7650, loss[loss=0.09501, simple_loss=0.1141, pruned_loss=0.02821, audio_tagging_loss=0.009734, over 16098.00 frames. ], tot_loss[loss=0.09852, simple_loss=0.115, pruned_loss=0.03019, audio_tagging_loss=0.01082, over 3053071.16 frames. ], batch size: 61, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:06:26,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=452046.6666666667, ans=0.1 2023-11-18 23:06:29,819 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7700, loss[loss=0.1062, simple_loss=0.1231, pruned_loss=0.03432, audio_tagging_loss=0.01029, over 15371.00 frames. ], tot_loss[loss=0.09819, simple_loss=0.1148, pruned_loss=0.02995, audio_tagging_loss=0.01082, over 3051214.49 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:06:38,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=452113.3333333333, ans=0.2 2023-11-18 23:06:58,864 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.51 vs. limit=22.5 2023-11-18 23:07:06,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.00 vs. limit=22.5 2023-11-18 23:07:07,488 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.13 vs. limit=15.0 2023-11-18 23:07:13,150 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.646e+01 9.561e+01 1.050e+02 1.437e+02, threshold=1.912e+02, percent-clipped=0.0 2023-11-18 23:07:21,280 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.12 vs. limit=15.0 2023-11-18 23:07:24,967 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7750, loss[loss=0.1416, simple_loss=0.169, pruned_loss=0.05042, audio_tagging_loss=0.006689, over 15612.00 frames. ], tot_loss[loss=0.09816, simple_loss=0.1147, pruned_loss=0.03, audio_tagging_loss=0.01083, over 3048147.33 frames. 
], batch size: 56, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:07:40,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=452513.3333333333, ans=0.1 2023-11-18 23:07:45,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=452513.3333333333, ans=0.125 2023-11-18 23:08:04,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=452646.6666666667, ans=0.125 2023-11-18 23:08:04,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=452646.6666666667, ans=0.0 2023-11-18 23:08:20,995 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7800, loss[loss=0.1138, simple_loss=0.1414, pruned_loss=0.03275, audio_tagging_loss=0.01035, over 15886.00 frames. ], tot_loss[loss=0.09811, simple_loss=0.1147, pruned_loss=0.02986, audio_tagging_loss=0.0109, over 3044472.25 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:08:24,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=452780.0, ans=0.0 2023-11-18 23:08:40,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=452846.6666666667, ans=0.07 2023-11-18 23:09:04,314 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.221e+01 8.766e+01 9.659e+01 1.074e+02 1.731e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-18 23:09:07,766 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.35 vs. limit=15.0 2023-11-18 23:09:17,036 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7850, loss[loss=0.1164, simple_loss=0.1415, pruned_loss=0.0357, audio_tagging_loss=0.009957, over 15840.00 frames. ], tot_loss[loss=0.09805, simple_loss=0.1144, pruned_loss=0.02983, audio_tagging_loss=0.01104, over 3047592.09 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:09:18,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=453113.3333333333, ans=0.015 2023-11-18 23:09:23,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=453113.3333333333, ans=0.1 2023-11-18 23:09:40,703 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.91 vs. limit=15.0 2023-11-18 23:09:53,074 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-68000.pt 2023-11-18 23:09:57,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=453313.3333333333, ans=0.125 2023-11-18 23:09:58,641 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.699e-02 2023-11-18 23:10:10,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.17 vs. limit=15.0 2023-11-18 23:10:14,375 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7900, loss[loss=0.08007, simple_loss=0.09717, pruned_loss=0.02338, audio_tagging_loss=0.008104, over 15246.00 frames. 
], tot_loss[loss=0.09847, simple_loss=0.115, pruned_loss=0.02997, audio_tagging_loss=0.01099, over 3048462.75 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:10:14,861 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-18 23:10:30,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.10 vs. limit=10.0 2023-11-18 23:10:48,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=453646.6666666667, ans=0.125 2023-11-18 23:10:55,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=453646.6666666667, ans=0.0 2023-11-18 23:10:57,621 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.507e+01 8.743e+01 9.470e+01 1.071e+02 1.653e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-18 23:11:07,040 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. limit=6.0 2023-11-18 23:11:09,782 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 7950, loss[loss=0.1332, simple_loss=0.156, pruned_loss=0.04678, audio_tagging_loss=0.008401, over 14563.00 frames. ], tot_loss[loss=0.09839, simple_loss=0.1146, pruned_loss=0.02996, audio_tagging_loss=0.01113, over 3048214.52 frames. ], batch size: 55, lr: 1.13e-02, grad_scale: 16.0 2023-11-18 23:11:09,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=453780.0, ans=0.125 2023-11-18 23:11:11,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=453780.0, ans=0.125 2023-11-18 23:11:14,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=453780.0, ans=0.1 2023-11-18 23:11:22,628 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:11:27,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=453846.6666666667, ans=0.125 2023-11-18 23:11:41,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.19 vs. limit=15.0 2023-11-18 23:11:54,081 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=27.68 vs. limit=22.5 2023-11-18 23:12:05,813 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8000, loss[loss=0.09741, simple_loss=0.1138, pruned_loss=0.02979, audio_tagging_loss=0.01074, over 15694.00 frames. ], tot_loss[loss=0.09772, simple_loss=0.114, pruned_loss=0.02956, audio_tagging_loss=0.01119, over 3045577.89 frames. 
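], batch size: 58, lr: 1.13e-02, grad_scale: 32.0

The WARNING above (train_asr.py:1319), which recurs for several AudioSet cuts in this log, drops utterances whose label sequence cannot fit into the subsampled feature sequence: the dummy transcript tokenizes to 24 BPE tokens, but the 100 input frames shrink to 23 frames after subsampling, and a transducer alignment needs at least as many frames as tokens. The logged 100 -> 23 is consistent with T = ((N - 7) // 2 + 1) // 2. A sketch of such a filter (the formula and function names are inferences from the logged numbers, not quotes of the training script):

    def frames_after_subsampling(num_frames: int) -> int:
        # ((100 - 7) // 2 + 1) // 2 == 23, matching the WARNING above.
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        # A transducer alignment needs at least one frame per output token.
        return frames_after_subsampling(num_frames) >= num_tokens

    assert frames_after_subsampling(100) == 23
    assert not keep_cut(100, 24)  # the cut excluded in the WARNING above

Excluding such cuts up front avoids an ill-defined pruned-transducer loss on them.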
2023-11-18 23:12:12,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=454113.3333333333, ans=0.125 2023-11-18 23:12:22,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=454180.0, ans=0.125 2023-11-18 23:12:35,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer_ff2.min_abs, batch_count=454246.6666666667, ans=0.1 2023-11-18 23:12:45,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.34 vs. limit=15.0 2023-11-18 23:12:48,893 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.235e+01 8.670e+01 9.475e+01 1.059e+02 1.539e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-18 23:12:50,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=454380.0, ans=0.1 2023-11-18 23:12:52,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=454380.0, ans=0.0 2023-11-18 23:13:00,536 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8050, loss[loss=0.1267, simple_loss=0.1502, pruned_loss=0.04429, audio_tagging_loss=0.007258, over 15422.00 frames. ], tot_loss[loss=0.09713, simple_loss=0.1131, pruned_loss=0.02932, audio_tagging_loss=0.01124, over 3042712.05 frames. ], batch size: 57, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:13:06,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454446.6666666667, ans=0.125 2023-11-18 23:13:10,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=454513.3333333333, ans=0.0 2023-11-18 23:13:31,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=454580.0, ans=0.125 2023-11-18 23:13:33,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.53 vs. limit=22.5 2023-11-18 23:13:36,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=454646.6666666667, ans=0.125 2023-11-18 23:13:53,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0 2023-11-18 23:13:56,073 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8100, loss[loss=0.08258, simple_loss=0.09463, pruned_loss=0.02495, audio_tagging_loss=0.01032, over 15081.00 frames. ], tot_loss[loss=0.09755, simple_loss=0.1137, pruned_loss=0.02956, audio_tagging_loss=0.01113, over 3046525.64 frames.
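], batch size: 59, lr: 1.13e-02, grad_scale: 32.0

The checkpoint.py:75 line a little earlier saved checkpoint-68000.pt into the experiment directory, keyed by the global batch index rather than the epoch, so a run can resume mid-epoch. A minimal sketch of batch-indexed checkpointing (the save_every_n counter and the saved fields are illustrative assumptions; icefall's own checkpoint helpers persist more state, such as sampler and grad-scaler state):

    import torch

    def maybe_save(model, optimizer, batch_idx_train, save_every_n, exp_dir):
        # Save every save_every_n global batches, named by batch index so
        # that files like checkpoint-68000.pt sort by training progress.
        if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
            return
        torch.save(
            {
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict(),
                "batch_idx_train": batch_idx_train,
            },
            f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
        )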
2023-11-18 23:14:25,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=454913.3333333333, ans=0.125 2023-11-18 23:14:26,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=454913.3333333333, ans=0.1 2023-11-18 23:14:27,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=454913.3333333333, ans=0.125 2023-11-18 23:14:30,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=454980.0, ans=0.0 2023-11-18 23:14:32,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=454980.0, ans=0.2 2023-11-18 23:14:36,161 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0 2023-11-18 23:14:39,583 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 9.117e+01 9.858e+01 1.091e+02 1.353e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-18 23:14:52,356 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8150, loss[loss=0.1059, simple_loss=0.1227, pruned_loss=0.03642, audio_tagging_loss=0.008158, over 14955.00 frames. ], tot_loss[loss=0.0974, simple_loss=0.1137, pruned_loss=0.02955, audio_tagging_loss=0.01101, over 3044055.60 frames. ], batch size: 56, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:15:00,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=455113.3333333333, ans=0.0 2023-11-18 23:15:00,994 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.48 vs. limit=22.5 2023-11-18 23:15:10,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=455180.0, ans=0.1 2023-11-18 23:15:18,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=455246.6666666667, ans=0.125 2023-11-18 23:15:21,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=455246.6666666667, ans=0.1 2023-11-18 23:15:24,531 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:15:30,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=455313.3333333333, ans=0.125 2023-11-18 23:15:32,337 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=455313.3333333333, ans=15.0 2023-11-18 23:15:36,243 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:15:37,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=455380.0, ans=0.04949747468305833 2023-11-18 23:15:47,088 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible.
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:15:48,098 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8200, loss[loss=0.09943, simple_loss=0.1224, pruned_loss=0.0299, audio_tagging_loss=0.00834, over 16261.00 frames. ], tot_loss[loss=0.09669, simple_loss=0.113, pruned_loss=0.02913, audio_tagging_loss=0.01108, over 3043918.21 frames. ], batch size: 58, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:15:52,422 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:15:55,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.28 vs. limit=15.0 2023-11-18 23:16:13,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=455580.0, ans=0.1 2023-11-18 23:16:23,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=455646.6666666667, ans=0.07 2023-11-18 23:16:30,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.615e+01 8.905e+01 9.848e+01 1.096e+02 1.904e+02, threshold=1.970e+02, percent-clipped=0.0 2023-11-18 23:16:37,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=455713.3333333333, ans=0.0 2023-11-18 23:16:42,990 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8250, loss[loss=0.1116, simple_loss=0.1399, pruned_loss=0.03381, audio_tagging_loss=0.007814, over 15116.00 frames. ], tot_loss[loss=0.09677, simple_loss=0.1129, pruned_loss=0.02926, audio_tagging_loss=0.01106, over 3046026.08 frames. ], batch size: 54, lr: 1.13e-02, grad_scale: 32.0 2023-11-18 23:16:49,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=455780.0, ans=0.125 2023-11-18 23:16:55,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455846.6666666667, ans=0.125 2023-11-18 23:16:57,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=455846.6666666667, ans=0.125 2023-11-18 23:17:32,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=456046.6666666667, ans=0.0 2023-11-18 23:17:38,211 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8300, loss[loss=0.116, simple_loss=0.1336, pruned_loss=0.03868, audio_tagging_loss=0.01057, over 15007.00 frames. ], tot_loss[loss=0.09596, simple_loss=0.1119, pruned_loss=0.02892, audio_tagging_loss=0.01108, over 3045417.81 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:17:43,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=456113.3333333333, ans=0.0 2023-11-18 23:17:44,099 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.52 vs. limit=22.5 2023-11-18 23:17:58,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.78 vs. 
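limit=15.0

The optim.py:476 Clipping_scale records that punctuate this log summarize recent gradient norms as five quantiles (min, 25%, 50%, 75%, max) plus a clipping threshold. In every record here the threshold is 2.0 times the logged median, e.g. threshold=1.968e+02 = 2.0 * 9.840e+01 in the record that follows, and percent-clipped reports how often a norm exceeded the threshold (in the 23:20:12 record below, a max of 3.626e+02 against threshold=1.973e+02 gives percent-clipped=1.0). A sketch of median-based clipping under those assumptions (the window length and exact bookkeeping are guesses):

    import collections
    import torch

    recent_norms = collections.deque(maxlen=200)  # window length is a guess

    def clip_by_median(parameters, clipping_scale: float = 2.0) -> float:
        # Measure the total gradient norm without clipping yet.
        params = [p for p in parameters if p.grad is not None]
        norm = torch.nn.utils.clip_grad_norm_(params, max_norm=float("inf"))
        recent_norms.append(float(norm))
        # threshold = clipping_scale x running median, matching
        # threshold = 2.0 x the logged 50% quantile.
        threshold = clipping_scale * sorted(recent_norms)[len(recent_norms) // 2]
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return float(norm)

Scaling by a running median rather than a fixed constant lets the threshold track the natural drift of gradient magnitudes over training.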
2023-11-18 23:18:21,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.810e+01 9.840e+01 1.082e+02 1.589e+02, threshold=1.968e+02, percent-clipped=0.0 2023-11-18 23:18:33,342 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8350, loss[loss=0.04024, simple_loss=0.03648, pruned_loss=0.008296, audio_tagging_loss=0.0137, over 15319.00 frames. ], tot_loss[loss=0.09459, simple_loss=0.1104, pruned_loss=0.02831, audio_tagging_loss=0.01107, over 3041534.42 frames. ], batch size: 61, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:18:37,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=456446.6666666667, ans=0.0 2023-11-18 23:18:56,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=456580.0, ans=0.2 2023-11-18 23:19:28,802 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8400, loss[loss=0.109, simple_loss=0.1324, pruned_loss=0.03361, audio_tagging_loss=0.009196, over 15853.00 frames. ], tot_loss[loss=0.09407, simple_loss=0.1099, pruned_loss=0.02806, audio_tagging_loss=0.01103, over 3047971.07 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:19:37,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.71 vs. limit=15.0 2023-11-18 23:19:50,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=456913.3333333333, ans=0.125 2023-11-18 23:20:12,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.818e+01 8.924e+01 9.865e+01 1.104e+02 3.626e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-18 23:20:24,661 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8450, loss[loss=0.09443, simple_loss=0.1145, pruned_loss=0.02678, audio_tagging_loss=0.01038, over 14922.00 frames. ], tot_loss[loss=0.09568, simple_loss=0.1118, pruned_loss=0.02878, audio_tagging_loss=0.01101, over 3050109.68 frames. ], batch size: 56, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:20:43,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=457180.0, ans=0.0 2023-11-18 23:20:44,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=5.74 vs. limit=15.0 2023-11-18 23:20:58,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=457313.3333333333, ans=0.125 2023-11-18 23:20:59,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.61 vs. limit=22.5 2023-11-18 23:21:19,999 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8500, loss[loss=0.09475, simple_loss=0.1051, pruned_loss=0.03128, audio_tagging_loss=0.01091, over 14913.00 frames. ], tot_loss[loss=0.09604, simple_loss=0.1121, pruned_loss=0.02891, audio_tagging_loss=0.01107, over 3053323.89 frames.
], batch size: 59, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:21:37,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=457513.3333333333, ans=0.125 2023-11-18 23:21:43,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=457580.0, ans=0.09899494936611666 2023-11-18 23:21:44,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=457580.0, ans=0.0 2023-11-18 23:22:00,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.73 vs. limit=10.0 2023-11-18 23:22:04,503 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.708e+01 8.613e+01 9.508e+01 1.037e+02 1.516e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-18 23:22:15,588 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8550, loss[loss=0.09757, simple_loss=0.1265, pruned_loss=0.02649, audio_tagging_loss=0.00782, over 14531.00 frames. ], tot_loss[loss=0.09743, simple_loss=0.1137, pruned_loss=0.02945, audio_tagging_loss=0.01113, over 3050722.74 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:22:33,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=457846.6666666667, ans=0.09899494936611666 2023-11-18 23:22:36,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=457846.6666666667, ans=0.0 2023-11-18 23:22:46,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.25 vs. limit=15.0 2023-11-18 23:23:11,596 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8600, loss[loss=0.1219, simple_loss=0.1468, pruned_loss=0.0394, audio_tagging_loss=0.009054, over 15926.00 frames. ], tot_loss[loss=0.09737, simple_loss=0.1135, pruned_loss=0.02934, audio_tagging_loss=0.01127, over 3051545.97 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:23:14,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=458113.3333333333, ans=0.2 2023-11-18 23:23:21,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=458180.0, ans=0.125 2023-11-18 23:23:39,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=458246.6666666667, ans=0.125 2023-11-18 23:23:46,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=458313.3333333333, ans=0.2 2023-11-18 23:23:49,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. limit=15.0 2023-11-18 23:23:56,308 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.985e+01 8.729e+01 9.610e+01 1.061e+02 1.458e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-18 23:23:56,900 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.75 vs. 
limit=22.5 2023-11-18 23:24:06,904 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8650, loss[loss=0.08183, simple_loss=0.09515, pruned_loss=0.02412, audio_tagging_loss=0.01013, over 15286.00 frames. ], tot_loss[loss=0.09717, simple_loss=0.1137, pruned_loss=0.02916, audio_tagging_loss=0.01115, over 3056103.05 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:24:53,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=458713.3333333333, ans=0.2 2023-11-18 23:24:54,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=458713.3333333333, ans=0.125 2023-11-18 23:25:02,856 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8700, loss[loss=0.1228, simple_loss=0.1484, pruned_loss=0.03891, audio_tagging_loss=0.009658, over 15384.00 frames. ], tot_loss[loss=0.09754, simple_loss=0.114, pruned_loss=0.02931, audio_tagging_loss=0.01124, over 3055671.95 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:25:04,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=458780.0, ans=0.0 2023-11-18 23:25:05,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=458780.0, ans=0.1 2023-11-18 23:25:09,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=458780.0, ans=0.125 2023-11-18 23:25:24,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=458913.3333333333, ans=0.2 2023-11-18 23:25:47,298 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.282e+01 9.228e+01 9.960e+01 1.094e+02 1.937e+02, threshold=1.992e+02, percent-clipped=1.0 2023-11-18 23:25:58,428 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8750, loss[loss=0.09809, simple_loss=0.1158, pruned_loss=0.02934, audio_tagging_loss=0.01084, over 15440.00 frames. ], tot_loss[loss=0.09793, simple_loss=0.1147, pruned_loss=0.02944, audio_tagging_loss=0.01117, over 3056657.65 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:26:03,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=459113.3333333333, ans=0.125 2023-11-18 23:26:10,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=459180.0, ans=0.2 2023-11-18 23:26:19,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=459246.6666666667, ans=0.2 2023-11-18 23:26:54,461 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8800, loss[loss=0.1008, simple_loss=0.1135, pruned_loss=0.03194, audio_tagging_loss=0.01216, over 14772.00 frames. ], tot_loss[loss=0.0978, simple_loss=0.1143, pruned_loss=0.02938, audio_tagging_loss=0.01127, over 3050316.85 frames. 
], batch size: 56, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:27:10,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=459513.3333333333, ans=0.125 2023-11-18 23:27:38,832 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.027e+01 8.792e+01 9.740e+01 1.071e+02 1.410e+02, threshold=1.948e+02, percent-clipped=0.0 2023-11-18 23:27:49,276 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8850, loss[loss=0.1189, simple_loss=0.1404, pruned_loss=0.03722, audio_tagging_loss=0.01145, over 15810.00 frames. ], tot_loss[loss=0.09809, simple_loss=0.1149, pruned_loss=0.02949, audio_tagging_loss=0.01114, over 3051214.05 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:27:49,503 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:27:58,829 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:28:07,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=459846.6666666667, ans=0.125 2023-11-18 23:28:08,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.43 vs. limit=6.0 2023-11-18 23:28:09,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=459846.6666666667, ans=0.125 2023-11-18 23:28:13,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.42 vs. limit=22.5 2023-11-18 23:28:23,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=459980.0, ans=22.5 2023-11-18 23:28:37,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=460046.6666666667, ans=0.0 2023-11-18 23:28:45,807 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8900, loss[loss=0.1014, simple_loss=0.1156, pruned_loss=0.03381, audio_tagging_loss=0.00979, over 14811.00 frames. ], tot_loss[loss=0.09734, simple_loss=0.1144, pruned_loss=0.0292, audio_tagging_loss=0.01096, over 3048134.33 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:28:55,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=460113.3333333333, ans=0.125 2023-11-18 23:28:58,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=460180.0, ans=0.125 2023-11-18 23:29:00,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=460180.0, ans=0.0 2023-11-18 23:29:26,405 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.85 vs. 
limit=12.0 2023-11-18 23:29:30,091 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.564e+01 9.040e+01 1.017e+02 1.156e+02 1.581e+02, threshold=2.035e+02, percent-clipped=0.0 2023-11-18 23:29:41,810 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 8950, loss[loss=0.1212, simple_loss=0.1354, pruned_loss=0.04645, audio_tagging_loss=0.007065, over 14575.00 frames. ], tot_loss[loss=0.09819, simple_loss=0.1152, pruned_loss=0.02974, audio_tagging_loss=0.01084, over 3046905.35 frames. ], batch size: 55, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:29:44,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.35 vs. limit=6.0 2023-11-18 23:29:49,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=460446.6666666667, ans=0.125 2023-11-18 23:30:01,997 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:30:17,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=460646.6666666667, ans=0.0 2023-11-18 23:30:21,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=460646.6666666667, ans=0.0 2023-11-18 23:30:36,560 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9000, loss[loss=0.09108, simple_loss=0.1038, pruned_loss=0.02244, audio_tagging_loss=0.01674, over 14289.00 frames. ], tot_loss[loss=0.0979, simple_loss=0.1149, pruned_loss=0.02957, audio_tagging_loss=0.01088, over 3048473.77 frames. ], batch size: 54, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:30:36,562 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-18 23:31:09,104 INFO [train_asr.py:1147] (0/4) Epoch 6, validation: loss=0.07051, simple_loss=0.05865, pruned_loss=0.008039, audio_tagging_loss=0.03315, over 4681554.00 frames. 2023-11-18 23:31:09,105 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-18 23:31:09,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=460780.0, ans=0.1 2023-11-18 23:31:09,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=460780.0, ans=0.0 2023-11-18 23:31:37,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=460913.3333333333, ans=0.125 2023-11-18 23:31:38,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=460913.3333333333, ans=0.125 2023-11-18 23:31:53,345 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.759e+01 8.670e+01 9.700e+01 1.069e+02 1.408e+02, threshold=1.940e+02, percent-clipped=0.0 2023-11-18 23:32:04,120 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.90 vs. limit=22.5 2023-11-18 23:32:04,555 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9050, loss[loss=0.1091, simple_loss=0.1317, pruned_loss=0.0363, audio_tagging_loss=0.006998, over 15821.00 frames. ], tot_loss[loss=0.0975, simple_loss=0.1147, pruned_loss=0.02929, audio_tagging_loss=0.01089, over 3052625.08 frames. 
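], batch size: 57, lr: 1.12e-02, grad_scale: 32.0

The batch 9000 block above interleaves a full validation pass with training: the trainer pauses at a fixed batch interval, evaluates the whole dev set (the same 4681554.00 frames each time), reports a frame-weighted loss, and prints the peak GPU memory. The 0.5/1.0 combination inferred at batch 7400 reproduces this record too: 0.5 * 0.05865 + 0.008039 + 0.03315 = 0.070514 against the logged 0.07051, with the audio-tagging term now the largest component. A sketch of such a periodic validation hook (the loss_fn callable and the interval are hypothetical; only the frame-weighted averaging and the memory query mirror the log's behaviour):

    import torch

    def maybe_validate(model, valid_loader, loss_fn, batch_idx_train,
                       valid_interval):
        # loss_fn is a hypothetical callable returning (loss, num_frames);
        # the interval is configuration-dependent (a pass fires at batch
        # 9000 in this section).
        if batch_idx_train == 0 or batch_idx_train % valid_interval != 0:
            return
        model.eval()
        tot_loss = 0.0
        tot_frames = 0.0
        with torch.no_grad():
            for batch in valid_loader:
                loss, num_frames = loss_fn(model, batch)
                tot_loss += float(loss) * num_frames
                tot_frames += num_frames
        print(f"validation: loss={tot_loss / tot_frames:.4g}, "
              f"over {tot_frames:.2f} frames")
        print(f"Maximum memory allocated so far is "
              f"{torch.cuda.max_memory_allocated() // 2**20}MB")
        model.train()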
2023-11-18 23:32:05,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=461113.3333333333, ans=0.0 2023-11-18 23:32:06,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=461113.3333333333, ans=0.1 2023-11-18 23:32:33,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=461246.6666666667, ans=0.0 2023-11-18 23:32:54,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=461380.0, ans=0.125 2023-11-18 23:32:55,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=15.0 2023-11-18 23:32:55,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-18 23:32:59,358 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9100, loss[loss=0.1135, simple_loss=0.1385, pruned_loss=0.03559, audio_tagging_loss=0.008645, over 15858.00 frames. ], tot_loss[loss=0.09666, simple_loss=0.1137, pruned_loss=0.02896, audio_tagging_loss=0.01083, over 3048577.65 frames. ], batch size: 58, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:33:28,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=461580.0, ans=0.0 2023-11-18 23:33:32,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=461646.6666666667, ans=0.125 2023-11-18 23:33:33,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=461646.6666666667, ans=0.125 2023-11-18 23:33:35,954 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.83 vs. limit=22.5 2023-11-18 23:33:43,694 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.366e+01 8.817e+01 9.477e+01 1.033e+02 1.344e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-18 23:33:49,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=461713.3333333333, ans=0.125 2023-11-18 23:33:54,711 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9150, loss[loss=0.121, simple_loss=0.1467, pruned_loss=0.03948, audio_tagging_loss=0.00812, over 15621.00 frames. ], tot_loss[loss=0.0966, simple_loss=0.1138, pruned_loss=0.02893, audio_tagging_loss=0.01074, over 3051968.88 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:33:59,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=461780.0, ans=0.125 2023-11-18 23:33:59,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.63 vs.
limit=15.0 2023-11-18 23:34:01,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=461780.0, ans=0.125 2023-11-18 23:34:09,589 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.23 vs. limit=6.0 2023-11-18 23:34:12,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.12 vs. limit=15.0 2023-11-18 23:34:16,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=461913.3333333333, ans=0.1 2023-11-18 23:34:50,579 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9200, loss[loss=0.1159, simple_loss=0.139, pruned_loss=0.03652, audio_tagging_loss=0.009888, over 14235.00 frames. ], tot_loss[loss=0.09657, simple_loss=0.1137, pruned_loss=0.02889, audio_tagging_loss=0.01083, over 3058846.86 frames. ], batch size: 53, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:35:00,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=462180.0, ans=0.0 2023-11-18 23:35:33,914 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.421e+01 8.725e+01 9.517e+01 1.069e+02 1.464e+02, threshold=1.903e+02, percent-clipped=0.0 2023-11-18 23:35:37,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=462380.0, ans=0.5 2023-11-18 23:35:44,341 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9250, loss[loss=0.05537, simple_loss=0.05881, pruned_loss=0.01262, audio_tagging_loss=0.01334, over 15528.00 frames. ], tot_loss[loss=0.09578, simple_loss=0.1127, pruned_loss=0.02859, audio_tagging_loss=0.01084, over 3064081.80 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:35:45,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=462446.6666666667, ans=0.125 2023-11-18 23:35:46,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.23 vs. limit=15.0 2023-11-18 23:35:48,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=462446.6666666667, ans=0.125 2023-11-18 23:35:49,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0 2023-11-18 23:36:06,696 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:36:17,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=462646.6666666667, ans=0.125 2023-11-18 23:36:28,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=462713.3333333333, ans=0.1 2023-11-18 23:36:39,674 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9300, loss[loss=0.08241, simple_loss=0.09844, pruned_loss=0.02391, audio_tagging_loss=0.00928, over 14670.00 frames. ], tot_loss[loss=0.09609, simple_loss=0.113, pruned_loss=0.02873, audio_tagging_loss=0.01087, over 3063150.55 frames. 
], batch size: 56, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:37:01,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.06 vs. limit=15.0 2023-11-18 23:37:23,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=463046.6666666667, ans=0.1 2023-11-18 23:37:23,841 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.607e+01 9.427e+01 1.041e+02 1.346e+02, threshold=1.885e+02, percent-clipped=0.0 2023-11-18 23:37:35,919 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9350, loss[loss=0.126, simple_loss=0.1577, pruned_loss=0.03966, audio_tagging_loss=0.007526, over 17011.00 frames. ], tot_loss[loss=0.0968, simple_loss=0.1139, pruned_loss=0.02902, audio_tagging_loss=0.01084, over 3068925.71 frames. ], batch size: 61, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:37:37,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463113.3333333333, ans=0.125 2023-11-18 23:37:41,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=463113.3333333333, ans=0.1 2023-11-18 23:37:48,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.06 vs. limit=15.0 2023-11-18 23:37:57,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=463246.6666666667, ans=0.125 2023-11-18 23:38:12,897 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=8.60 vs. limit=15.0 2023-11-18 23:38:24,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=463380.0, ans=0.0 2023-11-18 23:38:25,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=463380.0, ans=0.125 2023-11-18 23:38:30,381 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9400, loss[loss=0.09522, simple_loss=0.0987, pruned_loss=0.02796, audio_tagging_loss=0.01791, over 15785.00 frames. ], tot_loss[loss=0.0964, simple_loss=0.1131, pruned_loss=0.02882, audio_tagging_loss=0.01103, over 3071990.45 frames. ], batch size: 60, lr: 1.12e-02, grad_scale: 32.0 2023-11-18 23:38:30,945 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.67 vs. limit=15.0 2023-11-18 23:38:32,006 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. 
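limit=15.0

The scaling.py:1022 Whitening lines each compare a per-module metric against a limit; the limits themselves are scheduled values (several ScheduledFloat entries in this log name a whitening_limit, e.g. ans=15.0). One plausible reading of the metric, an assumption rather than a quote of scaling.py, is a measure of how far the activation covariance is from white: the ratio mean(eig^2) / mean(eig)^2 over the covariance eigenvalues, which equals 1.0 for a covariance proportional to the identity and grows with eigenvalue spread. A sketch under that assumption:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels). Returns 1.0 when the covariance is
        # already "white" (proportional to the identity), larger otherwise.
        x = x - x.mean(dim=0, keepdim=True)
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return float(eigs.pow(2).mean() / eigs.mean().pow(2))

    white = torch.randn(10000, 384)                  # near-white activations
    skewed = white * torch.linspace(0.1, 3.0, 384)   # uneven channel scales
    assert whitening_metric(white) < whitening_metric(skewed)

Under this reading, a record such as metric=20.76 vs. limit=22.5 means the module's activations are still within the allowed spread, while a metric above the limit would trigger a corrective penalty.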
2023-11-18 23:38:44,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=463513.3333333333, ans=0.125 2023-11-18 23:39:04,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=463646.6666666667, ans=0.125 2023-11-18 23:39:15,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.264e+01 8.967e+01 9.961e+01 1.049e+02 1.500e+02, threshold=1.992e+02, percent-clipped=0.0 2023-11-18 23:39:21,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-18 23:39:21,686 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:39:25,411 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9450, loss[loss=0.06841, simple_loss=0.0773, pruned_loss=0.01574, audio_tagging_loss=0.01402, over 14424.00 frames. ], tot_loss[loss=0.09622, simple_loss=0.1127, pruned_loss=0.02871, audio_tagging_loss=0.01117, over 3062781.49 frames. ], batch size: 57, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:39:45,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=463846.6666666667, ans=0.1 2023-11-18 23:39:48,414 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=463913.3333333333, ans=0.125 2023-11-18 23:40:03,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=12.0 2023-11-18 23:40:04,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-11-18 23:40:20,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=464113.3333333333, ans=0.0 2023-11-18 23:40:20,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=464113.3333333333, ans=0.125 2023-11-18 23:40:21,508 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9500, loss[loss=0.09024, simple_loss=0.1096, pruned_loss=0.02578, audio_tagging_loss=0.009655, over 15390.00 frames. ], tot_loss[loss=0.09608, simple_loss=0.1126, pruned_loss=0.02861, audio_tagging_loss=0.01119, over 3054649.51 frames. ], batch size: 59, lr: 1.12e-02, grad_scale: 16.0 2023-11-18 23:40:21,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=464113.3333333333, ans=0.125 2023-11-18 23:40:26,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=464113.3333333333, ans=0.125 2023-11-18 23:40:39,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.47 vs.
limit=15.0 2023-11-18 23:40:46,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=464246.6666666667, ans=0.125 2023-11-18 23:40:52,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=464246.6666666667, ans=0.2 2023-11-18 23:41:04,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=464313.3333333333, ans=0.1 2023-11-18 23:41:07,216 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 9.036e+01 9.803e+01 1.077e+02 1.985e+02, threshold=1.961e+02, percent-clipped=0.0 2023-11-18 23:41:14,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=464380.0, ans=0.09899494936611666 2023-11-18 23:41:17,269 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9550, loss[loss=0.0909, simple_loss=0.1027, pruned_loss=0.02477, audio_tagging_loss=0.01476, over 16076.00 frames. ], tot_loss[loss=0.0966, simple_loss=0.1133, pruned_loss=0.02876, audio_tagging_loss=0.01122, over 3052823.28 frames. ], batch size: 62, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:41:19,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=464446.6666666667, ans=0.0 2023-11-18 23:41:29,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=464513.3333333333, ans=0.125 2023-11-18 23:41:31,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=464513.3333333333, ans=0.125 2023-11-18 23:41:49,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=464646.6666666667, ans=0.125 2023-11-18 23:41:49,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=464646.6666666667, ans=0.2 2023-11-18 23:42:06,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=464713.3333333333, ans=0.0 2023-11-18 23:42:06,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=464713.3333333333, ans=0.015 2023-11-18 23:42:08,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=464713.3333333333, ans=0.1 2023-11-18 23:42:12,595 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9600, loss[loss=0.122, simple_loss=0.152, pruned_loss=0.03399, audio_tagging_loss=0.01204, over 16208.00 frames. ], tot_loss[loss=0.09629, simple_loss=0.1126, pruned_loss=0.02868, audio_tagging_loss=0.01129, over 3050360.21 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:42:31,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=464846.6666666667, ans=0.1 2023-11-18 23:42:32,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=464846.6666666667, ans=0.125 2023-11-18 23:42:36,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.64 vs. 
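limit=15.0

Most of this log's volume is scaling.py:213 ScheduledFloat lines: each names a floating-point hyperparameter (dropout rates, skip rates, balancer probabilities, even the whitening limits above) together with the global batch_count and the current value ans. They read like schedules evaluated at batch_count; values such as ans=0.09899494936611666 and ans=0.04949747468305833 equal 0.07*sqrt(2) and 0.035*sqrt(2), suggesting schedule endpoints that carry scale factors rather than round numbers. A piecewise-linear sketch of the idea (the class name matches the log, but this implementation is a guess; the real class is more elaborate):

    class ScheduledFloat:
        # A float hyperparameter interpolated piecewise-linearly over the
        # global batch count, clamped at the first/last breakpoints.
        def __init__(self, *points):
            self.points = sorted(points)  # (batch_count, value) pairs

        def value(self, batch_count: float) -> float:
            pts = self.points
            if batch_count <= pts[0][0]:
                return pts[0][1]
            for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
                if batch_count <= x1:
                    return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
            return pts[-1][1]

    # e.g. a dropout rate annealed from 0.3 to 0.1 over the first 20k batches:
    dropout = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout.value(449780.0) == 0.1  # far past the schedule's end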
2023-11-18 23:42:39,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=464913.3333333333, ans=0.0 2023-11-18 23:42:41,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=464913.3333333333, ans=0.0 2023-11-18 23:42:50,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=464980.0, ans=0.2 2023-11-18 23:42:50,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=464980.0, ans=15.0 2023-11-18 23:42:53,186 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.76 vs. limit=15.0 2023-11-18 23:42:56,032 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.30 vs. limit=15.0 2023-11-18 23:42:58,488 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.512e+01 8.742e+01 9.765e+01 1.051e+02 1.389e+02, threshold=1.953e+02, percent-clipped=0.0 2023-11-18 23:43:07,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=465046.6666666667, ans=0.125 2023-11-18 23:43:09,322 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9650, loss[loss=0.06414, simple_loss=0.07593, pruned_loss=0.0148, audio_tagging_loss=0.01138, over 14696.00 frames. ], tot_loss[loss=0.09618, simple_loss=0.1125, pruned_loss=0.02876, audio_tagging_loss=0.01117, over 3046220.13 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:43:19,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=465180.0, ans=0.5 2023-11-18 23:43:19,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-18 23:43:21,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.41 vs. limit=15.0 2023-11-18 23:43:38,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=465246.6666666667, ans=0.0 2023-11-18 23:44:00,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465380.0, ans=0.1 2023-11-18 23:44:04,667 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9700, loss[loss=0.08808, simple_loss=0.1027, pruned_loss=0.02597, audio_tagging_loss=0.01075, over 15651.00 frames. ], tot_loss[loss=0.09538, simple_loss=0.1119, pruned_loss=0.02836, audio_tagging_loss=0.01109, over 3046963.02 frames.
], batch size: 57, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:44:17,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=465513.3333333333, ans=0.125 2023-11-18 23:44:24,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465513.3333333333, ans=0.125 2023-11-18 23:44:25,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=465580.0, ans=0.2 2023-11-18 23:44:26,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=465580.0, ans=0.125 2023-11-18 23:44:44,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.12 vs. limit=6.0 2023-11-18 23:44:45,384 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.01 vs. limit=22.5 2023-11-18 23:44:50,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.321e+01 8.590e+01 9.606e+01 1.115e+02 1.456e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-18 23:44:53,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=465713.3333333333, ans=15.0 2023-11-18 23:45:00,259 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9750, loss[loss=0.1181, simple_loss=0.1456, pruned_loss=0.03814, audio_tagging_loss=0.007195, over 15792.00 frames. ], tot_loss[loss=0.09529, simple_loss=0.1117, pruned_loss=0.02856, audio_tagging_loss=0.0109, over 3049128.80 frames. ], batch size: 58, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:45:13,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=465846.6666666667, ans=0.1 2023-11-18 23:45:48,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=466046.6666666667, ans=0.125 2023-11-18 23:45:49,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=466046.6666666667, ans=0.125 2023-11-18 23:45:49,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=466046.6666666667, ans=0.1 2023-11-18 23:45:57,113 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9800, loss[loss=0.09057, simple_loss=0.11, pruned_loss=0.02593, audio_tagging_loss=0.009656, over 15532.00 frames. ], tot_loss[loss=0.09515, simple_loss=0.1114, pruned_loss=0.02848, audio_tagging_loss=0.01095, over 3046542.18 frames. 
], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:46:04,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=466113.3333333333, ans=0.125 2023-11-18 23:46:04,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=466113.3333333333, ans=0.125 2023-11-18 23:46:13,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=466180.0, ans=0.0 2023-11-18 23:46:13,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=466180.0, ans=10.0 2023-11-18 23:46:30,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=466313.3333333333, ans=0.2 2023-11-18 23:46:39,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=466313.3333333333, ans=0.1 2023-11-18 23:46:42,482 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.868e+01 8.589e+01 9.720e+01 1.056e+02 1.437e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-18 23:46:44,614 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:46:52,054 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9850, loss[loss=0.0987, simple_loss=0.1224, pruned_loss=0.02742, audio_tagging_loss=0.01008, over 14181.00 frames. ], tot_loss[loss=0.09556, simple_loss=0.1121, pruned_loss=0.02863, audio_tagging_loss=0.01087, over 3046091.34 frames. ], batch size: 53, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:46:57,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=466446.6666666667, ans=0.0 2023-11-18 23:47:02,649 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.68 vs. limit=15.0 2023-11-18 23:47:13,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=466580.0, ans=0.1 2023-11-18 23:47:25,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.94 vs. limit=15.0 2023-11-18 23:47:34,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=466646.6666666667, ans=0.2 2023-11-18 23:47:47,561 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9900, loss[loss=0.1104, simple_loss=0.1379, pruned_loss=0.03274, audio_tagging_loss=0.008742, over 15695.00 frames. ], tot_loss[loss=0.09544, simple_loss=0.1119, pruned_loss=0.02859, audio_tagging_loss=0.0109, over 3048234.13 frames. 
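Note: the WARNING above (and the matching ones later in this log) excludes 1-second AudioSet cuts whose placeholder transcript carries more BPE tokens (24) than the cut has frames after subsampling (23): a transducer cannot align more output symbols than it has input frames. A sketch of such a filter; the subsampling arithmetic below is an assumption chosen only to reproduce the logged 100 -> 23 mapping:

    def keep_cut(num_frames_before_subsampling, num_tokens):
        # hypothetical ~4x subsampling: maps 100 input frames to 23
        num_frames_after = (num_frames_before_subsampling - 7) // 2 // 2
        return num_frames_after >= num_tokens

    assert not keep_cut(100, 24)  # the excluded placeholder cuts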
], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:47:58,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=466846.6666666667, ans=0.2 2023-11-18 23:48:01,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=466846.6666666667, ans=0.2 2023-11-18 23:48:16,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.37 vs. limit=15.0 2023-11-18 23:48:32,816 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.537e+01 8.839e+01 9.469e+01 1.066e+02 1.468e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-18 23:48:43,396 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 9950, loss[loss=0.08454, simple_loss=0.1022, pruned_loss=0.02338, audio_tagging_loss=0.01003, over 15139.00 frames. ], tot_loss[loss=0.09554, simple_loss=0.1122, pruned_loss=0.02846, audio_tagging_loss=0.01098, over 3050596.56 frames. ], batch size: 61, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:48:47,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=467113.3333333333, ans=0.125 2023-11-18 23:48:59,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=467180.0, ans=0.125 2023-11-18 23:49:05,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=467246.6666666667, ans=0.0 2023-11-18 23:49:11,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=467246.6666666667, ans=0.125 2023-11-18 23:49:29,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=467380.0, ans=0.125 2023-11-18 23:49:30,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=467380.0, ans=0.125 2023-11-18 23:49:38,713 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10000, loss[loss=0.09543, simple_loss=0.1089, pruned_loss=0.02961, audio_tagging_loss=0.01138, over 14798.00 frames. ], tot_loss[loss=0.09477, simple_loss=0.111, pruned_loss=0.02827, audio_tagging_loss=0.01102, over 3055717.84 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:49:51,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=467513.3333333333, ans=0.2 2023-11-18 23:49:56,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=467513.3333333333, ans=6.0 2023-11-18 23:50:00,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.96 vs. 
limit=15.0 2023-11-18 23:50:02,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=467580.0, ans=0.0 2023-11-18 23:50:06,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=467580.0, ans=0.125 2023-11-18 23:50:16,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=467646.6666666667, ans=0.0 2023-11-18 23:50:23,829 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.767e+01 8.943e+01 9.834e+01 1.077e+02 1.357e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-18 23:50:33,379 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10050, loss[loss=0.09741, simple_loss=0.1137, pruned_loss=0.03191, audio_tagging_loss=0.008629, over 15146.00 frames. ], tot_loss[loss=0.09481, simple_loss=0.111, pruned_loss=0.02825, audio_tagging_loss=0.01108, over 3057505.45 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:50:54,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=467913.3333333333, ans=0.125 2023-11-18 23:51:26,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=468046.6666666667, ans=0.0 2023-11-18 23:51:28,609 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10100, loss[loss=0.09421, simple_loss=0.1144, pruned_loss=0.02676, audio_tagging_loss=0.01026, over 15122.00 frames. ], tot_loss[loss=0.09559, simple_loss=0.1119, pruned_loss=0.02854, audio_tagging_loss=0.01108, over 3053886.27 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:51:34,696 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.040e-02 2023-11-18 23:51:37,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=468113.3333333333, ans=0.125 2023-11-18 23:51:48,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=468180.0, ans=0.125 2023-11-18 23:52:00,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=468313.3333333333, ans=0.2 2023-11-18 23:52:10,335 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
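Note: in every optim.py "Clipping_scale=2.0, grad-norm quartiles ..." entry, the reported threshold equals the clipping scale times the middle quartile, i.e. the median of recently observed gradient norms (just above: 2.0 * 9.834e+01 = 1.967e+02), and percent-clipped reports how often that threshold was exceeded. A sketch of that rule, assuming the median-based formulation the numbers suggest:

    import torch

    def grad_clip_threshold(recent_grad_norms: torch.Tensor,
                            clipping_scale: float = 2.0) -> float:
        # threshold = clipping_scale * median of recent gradient norms
        return clipping_scale * recent_grad_norms.median().item()

    def percent_clipped(grad_norms: torch.Tensor, threshold: float) -> float:
        return 100.0 * (grad_norms > threshold).float().mean().item()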
Number of tokens: 24 2023-11-18 23:52:15,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.783e+01 8.875e+01 9.595e+01 1.083e+02 1.455e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-18 23:52:15,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=468380.0, ans=0.1 2023-11-18 23:52:16,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=468380.0, ans=0.025 2023-11-18 23:52:24,029 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10150, loss[loss=0.09419, simple_loss=0.1004, pruned_loss=0.03014, audio_tagging_loss=0.01384, over 16008.00 frames. ], tot_loss[loss=0.0958, simple_loss=0.1118, pruned_loss=0.02866, audio_tagging_loss=0.01123, over 3055416.40 frames. ], batch size: 64, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:52:28,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.98 vs. limit=15.0 2023-11-18 23:52:34,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=468513.3333333333, ans=0.125 2023-11-18 23:52:37,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=468513.3333333333, ans=0.0 2023-11-18 23:52:46,790 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-18 23:53:11,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=468713.3333333333, ans=0.09899494936611666 2023-11-18 23:53:14,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=468713.3333333333, ans=0.125 2023-11-18 23:53:18,612 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10200, loss[loss=0.1099, simple_loss=0.1295, pruned_loss=0.03504, audio_tagging_loss=0.01014, over 14310.00 frames. ], tot_loss[loss=0.0955, simple_loss=0.1116, pruned_loss=0.02852, audio_tagging_loss=0.01118, over 3048951.14 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:53:37,361 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-18 23:53:37,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=468846.6666666667, ans=0.0 2023-11-18 23:53:55,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=468980.0, ans=0.125 2023-11-18 23:53:56,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=468980.0, ans=15.0 2023-11-18 23:53:59,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=468980.0, ans=0.2 2023-11-18 23:54:01,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=469046.6666666667, ans=0.1 2023-11-18 23:54:04,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.557e+01 8.998e+01 1.003e+02 1.100e+02 1.354e+02, threshold=2.006e+02, percent-clipped=0.0 2023-11-18 23:54:04,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=469046.6666666667, ans=0.125 2023-11-18 23:54:05,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=469046.6666666667, ans=0.125 2023-11-18 23:54:13,823 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10250, loss[loss=0.1356, simple_loss=0.1605, pruned_loss=0.04647, audio_tagging_loss=0.008863, over 15801.00 frames. ], tot_loss[loss=0.09514, simple_loss=0.111, pruned_loss=0.02835, audio_tagging_loss=0.01132, over 3050017.13 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:54:20,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=469113.3333333333, ans=0.125 2023-11-18 23:54:32,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=469180.0, ans=0.2 2023-11-18 23:54:36,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=469246.6666666667, ans=0.125 2023-11-18 23:55:06,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=469380.0, ans=0.125 2023-11-18 23:55:10,097 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10300, loss[loss=0.1362, simple_loss=0.1465, pruned_loss=0.05262, audio_tagging_loss=0.01034, over 14679.00 frames. ], tot_loss[loss=0.09534, simple_loss=0.1109, pruned_loss=0.02853, audio_tagging_loss=0.01135, over 3051970.26 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:55:33,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.13 vs. 
limit=15.0 2023-11-18 23:55:55,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=469713.3333333333, ans=0.2 2023-11-18 23:55:55,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=469713.3333333333, ans=0.125 2023-11-18 23:55:56,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.723e+01 9.104e+01 1.005e+02 1.145e+02 1.607e+02, threshold=2.009e+02, percent-clipped=0.0 2023-11-18 23:56:05,062 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10350, loss[loss=0.07907, simple_loss=0.0753, pruned_loss=0.02303, audio_tagging_loss=0.01838, over 15618.00 frames. ], tot_loss[loss=0.09667, simple_loss=0.1126, pruned_loss=0.02897, audio_tagging_loss=0.01141, over 3044971.01 frames. ], batch size: 61, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:56:08,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=469780.0, ans=0.1 2023-11-18 23:56:18,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=469846.6666666667, ans=0.0 2023-11-18 23:56:19,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=16.19 vs. limit=15.0 2023-11-18 23:56:28,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=469913.3333333333, ans=0.0 2023-11-18 23:56:35,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=469913.3333333333, ans=0.035 2023-11-18 23:56:36,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=469913.3333333333, ans=0.2 2023-11-18 23:56:43,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=469980.0, ans=0.125 2023-11-18 23:56:51,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=470046.6666666667, ans=0.1 2023-11-18 23:57:00,328 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10400, loss[loss=0.1061, simple_loss=0.1271, pruned_loss=0.03133, audio_tagging_loss=0.01125, over 16131.00 frames. ], tot_loss[loss=0.09664, simple_loss=0.1125, pruned_loss=0.02887, audio_tagging_loss=0.0115, over 3042945.65 frames. 
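Note: the frequent "ScheduledFloat: name=..., batch_count=..., ans=..." entries each report a regularization constant (skip rates, balancer probabilities, dropout values) evaluated from a schedule over the global batch count; at the ~4.65e5 batch counts seen here, most of them sit at their final values (ans=0.0, 0.125, ...). A minimal re-implementation of the idea as piecewise-linear interpolation; the breakpoints in the example are illustrative, not taken from this run:

    def scheduled_float(batch_count, points):
        # points: sorted (batch_count, value) breakpoints; flat outside the ends
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        return points[-1][1]

    # a skip rate decaying 0.2 -> 0.0 over the first 20k batches has long
    # since reached its floor at these batch counts:
    assert scheduled_float(465246.7, [(0.0, 0.2), (20000.0, 0.0)]) == 0.0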
], batch size: 60, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:57:18,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=470180.0, ans=0.125 2023-11-18 23:57:25,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=470246.6666666667, ans=0.125 2023-11-18 23:57:31,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=470246.6666666667, ans=0.07 2023-11-18 23:57:34,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=470313.3333333333, ans=0.0 2023-11-18 23:57:45,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=470380.0, ans=0.0 2023-11-18 23:57:47,235 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.298e+01 8.610e+01 9.344e+01 1.013e+02 2.407e+02, threshold=1.869e+02, percent-clipped=1.0 2023-11-18 23:57:48,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=470380.0, ans=0.125 2023-11-18 23:57:56,724 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10450, loss[loss=0.08, simple_loss=0.09625, pruned_loss=0.01981, audio_tagging_loss=0.01207, over 14721.00 frames. ], tot_loss[loss=0.09644, simple_loss=0.1124, pruned_loss=0.02884, audio_tagging_loss=0.01142, over 3042054.27 frames. ], batch size: 54, lr: 1.11e-02, grad_scale: 32.0 2023-11-18 23:58:21,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.16 vs. limit=22.5 2023-11-18 23:58:36,035 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-18 23:58:49,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.35 vs. limit=15.0 2023-11-18 23:58:51,742 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10500, loss[loss=0.08936, simple_loss=0.103, pruned_loss=0.02904, audio_tagging_loss=0.008833, over 14955.00 frames. ], tot_loss[loss=0.09619, simple_loss=0.1122, pruned_loss=0.02885, audio_tagging_loss=0.01124, over 3033749.07 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 16.0 2023-11-18 23:59:02,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=470846.6666666667, ans=0.95 2023-11-18 23:59:15,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=470913.3333333333, ans=0.125 2023-11-18 23:59:24,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=10.36 vs. 
limit=15.0 2023-11-18 23:59:33,963 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-18 23:59:38,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.982e+01 8.568e+01 9.489e+01 1.065e+02 1.523e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-18 23:59:46,869 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10550, loss[loss=0.1145, simple_loss=0.1403, pruned_loss=0.03638, audio_tagging_loss=0.007908, over 15770.00 frames. ], tot_loss[loss=0.09659, simple_loss=0.1131, pruned_loss=0.02901, audio_tagging_loss=0.01102, over 3038715.39 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:00:20,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=471313.3333333333, ans=0.0 2023-11-19 00:00:37,961 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.16 vs. limit=10.0 2023-11-19 00:00:43,111 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10600, loss[loss=0.08002, simple_loss=0.09408, pruned_loss=0.02311, audio_tagging_loss=0.009871, over 14781.00 frames. ], tot_loss[loss=0.0963, simple_loss=0.1129, pruned_loss=0.02882, audio_tagging_loss=0.01103, over 3041957.33 frames. ], batch size: 55, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:01:05,183 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=471580.0, ans=0.0 2023-11-19 00:01:07,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=471580.0, ans=0.0 2023-11-19 00:01:13,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.11 vs. limit=22.5 2023-11-19 00:01:28,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=471713.3333333333, ans=0.125 2023-11-19 00:01:28,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=471713.3333333333, ans=0.2 2023-11-19 00:01:30,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=471713.3333333333, ans=0.125 2023-11-19 00:01:31,012 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.594e+01 9.038e+01 9.665e+01 1.088e+02 1.655e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-19 00:01:34,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=471713.3333333333, ans=0.0 2023-11-19 00:01:38,982 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10650, loss[loss=0.08344, simple_loss=0.1023, pruned_loss=0.02282, audio_tagging_loss=0.009454, over 15637.00 frames. ], tot_loss[loss=0.09571, simple_loss=0.1121, pruned_loss=0.02871, audio_tagging_loss=0.01094, over 3044903.38 frames. ], batch size: 59, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:01:42,754 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.66 vs. limit=10.0 2023-11-19 00:02:06,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=8.62 vs. 
limit=15.0 2023-11-19 00:02:10,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=471913.3333333333, ans=0.0 2023-11-19 00:02:21,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=471980.0, ans=0.07 2023-11-19 00:02:26,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=472046.6666666667, ans=0.09899494936611666 2023-11-19 00:02:28,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=472046.6666666667, ans=0.0 2023-11-19 00:02:34,812 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10700, loss[loss=0.1031, simple_loss=0.1252, pruned_loss=0.03144, audio_tagging_loss=0.009117, over 15543.00 frames. ], tot_loss[loss=0.09572, simple_loss=0.1123, pruned_loss=0.02875, audio_tagging_loss=0.01082, over 3045755.62 frames. ], batch size: 57, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:02:47,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=472180.0, ans=0.125 2023-11-19 00:02:48,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=472180.0, ans=10.0 2023-11-19 00:02:57,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.05 vs. limit=12.0 2023-11-19 00:03:12,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=472313.3333333333, ans=0.125 2023-11-19 00:03:15,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=472313.3333333333, ans=0.2 2023-11-19 00:03:15,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=472313.3333333333, ans=0.0 2023-11-19 00:03:17,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=472313.3333333333, ans=0.5 2023-11-19 00:03:22,300 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.388e+01 8.951e+01 9.625e+01 1.080e+02 1.426e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-19 00:03:30,898 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10750, loss[loss=0.096, simple_loss=0.1148, pruned_loss=0.03072, audio_tagging_loss=0.007879, over 15881.00 frames. ], tot_loss[loss=0.09612, simple_loss=0.1132, pruned_loss=0.02878, audio_tagging_loss=0.01076, over 3047852.35 frames. ], batch size: 60, lr: 1.11e-02, grad_scale: 16.0 2023-11-19 00:04:00,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472580.0, ans=0.0 2023-11-19 00:04:09,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=472646.6666666667, ans=0.125 2023-11-19 00:04:24,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.39 vs. limit=15.0 2023-11-19 00:04:25,578 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10800, loss[loss=0.0931, simple_loss=0.1113, pruned_loss=0.02667, audio_tagging_loss=0.01079, over 15009.00 frames. 
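Note: the scaling.py "Whitening: name=..., metric=M vs. limit=L" lines are diagnostics from modules that softly whiten intermediate activations. Loosely, the metric is 1.0 when the channel covariance is already proportional to the identity and grows as the spectrum becomes lopsided; the whitening gradient only engages once the metric exceeds the (itself scheduled) limit, which is why most lines report a metric below the limit. One plausible form of such a metric, offered as an assumption about its intent rather than the exact scaling.py formula:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels); returns E[eig^2] / E[eig]^2 of the
        # covariance spectrum: 1.0 for white features, larger otherwise
        cov = (x.T @ x) / x.shape[0]
        eigs = torch.linalg.eigvalsh(cov)
        return ((eigs ** 2).mean() / eigs.mean().clamp_min(1e-20) ** 2).item()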
], tot_loss[loss=0.09586, simple_loss=0.1128, pruned_loss=0.02862, audio_tagging_loss=0.01083, over 3048742.84 frames. ], batch size: 56, lr: 1.11e-02, grad_scale: 32.0 2023-11-19 00:04:54,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=472913.3333333333, ans=0.1 2023-11-19 00:05:02,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=472980.0, ans=0.0 2023-11-19 00:05:05,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=472980.0, ans=0.2 2023-11-19 00:05:13,540 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.653e+01 8.608e+01 9.354e+01 1.065e+02 1.440e+02, threshold=1.871e+02, percent-clipped=0.0 2023-11-19 00:05:20,973 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10850, loss[loss=0.1354, simple_loss=0.1504, pruned_loss=0.04504, audio_tagging_loss=0.01514, over 15928.00 frames. ], tot_loss[loss=0.09639, simple_loss=0.1132, pruned_loss=0.02897, audio_tagging_loss=0.01084, over 3052128.99 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:05:30,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=473113.3333333333, ans=0.1 2023-11-19 00:05:33,316 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0 2023-11-19 00:06:11,159 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:06:11,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=473380.0, ans=0.035 2023-11-19 00:06:17,539 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10900, loss[loss=0.09034, simple_loss=0.1126, pruned_loss=0.02338, audio_tagging_loss=0.01066, over 15528.00 frames. ], tot_loss[loss=0.09593, simple_loss=0.1126, pruned_loss=0.02871, audio_tagging_loss=0.01094, over 3046043.59 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:06:44,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=473580.0, ans=0.125 2023-11-19 00:06:50,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=473646.6666666667, ans=0.125 2023-11-19 00:07:05,073 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.196e+01 8.580e+01 9.550e+01 1.089e+02 1.595e+02, threshold=1.910e+02, percent-clipped=0.0 2023-11-19 00:07:12,538 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 10950, loss[loss=0.1099, simple_loss=0.1304, pruned_loss=0.03515, audio_tagging_loss=0.009552, over 15385.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1125, pruned_loss=0.02858, audio_tagging_loss=0.01108, over 3045072.18 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:07:17,999 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.823e-01 2023-11-19 00:07:22,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=473846.6666666667, ans=0.0 2023-11-19 00:07:22,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=473846.6666666667, ans=0.1 2023-11-19 00:07:24,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=473846.6666666667, ans=0.0 2023-11-19 00:07:27,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=4.23 vs. limit=12.0 2023-11-19 00:07:39,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=473913.3333333333, ans=0.2 2023-11-19 00:07:57,206 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.74 vs. limit=15.0 2023-11-19 00:08:00,298 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-11-19 00:08:07,625 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11000, loss[loss=0.1019, simple_loss=0.1232, pruned_loss=0.03042, audio_tagging_loss=0.009852, over 16827.00 frames. ], tot_loss[loss=0.09541, simple_loss=0.112, pruned_loss=0.02834, audio_tagging_loss=0.01109, over 3046902.15 frames. ], batch size: 62, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:08:11,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.39 vs. limit=12.0 2023-11-19 00:08:14,388 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:08:14,558 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:08:31,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=474246.6666666667, ans=0.02 2023-11-19 00:08:34,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=474246.6666666667, ans=0.0 2023-11-19 00:08:46,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.26 vs. limit=15.0 2023-11-19 00:08:56,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 9.044e+01 9.862e+01 1.100e+02 1.802e+02, threshold=1.972e+02, percent-clipped=0.0 2023-11-19 00:09:03,374 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11050, loss[loss=0.1277, simple_loss=0.1641, pruned_loss=0.03603, audio_tagging_loss=0.009627, over 15081.00 frames. 
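Note: the lr field only ticks from 1.11e-02 to 1.10e-02 over this whole stretch because the decay is polynomial in both batch count and epoch and is far past its knees by epoch 6. A sketch of an Eden-style schedule with that shape; the lr_batches and lr_epochs constants are illustrative assumptions, not read from this log:

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        # each factor decays like x**-0.5 once x passes its knee
        batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
        epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
        return base_lr * batch_factor * epoch_factor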
], tot_loss[loss=0.09548, simple_loss=0.112, pruned_loss=0.02832, audio_tagging_loss=0.01117, over 3056383.47 frames. ], batch size: 53, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:09:06,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.07 vs. limit=15.0 2023-11-19 00:09:27,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=474580.0, ans=0.2 2023-11-19 00:09:29,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=474580.0, ans=0.125 2023-11-19 00:09:53,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=474713.3333333333, ans=0.125 2023-11-19 00:09:59,043 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11100, loss[loss=0.1033, simple_loss=0.1214, pruned_loss=0.03161, audio_tagging_loss=0.01099, over 14976.00 frames. ], tot_loss[loss=0.09578, simple_loss=0.1121, pruned_loss=0.02847, audio_tagging_loss=0.01125, over 3053895.22 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:10:05,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=474780.0, ans=0.125 2023-11-19 00:10:06,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=474780.0, ans=0.125 2023-11-19 00:10:28,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=474913.3333333333, ans=0.125 2023-11-19 00:10:35,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.99 vs. limit=15.0 2023-11-19 00:10:42,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=12.0 2023-11-19 00:10:46,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=475046.6666666667, ans=0.0 2023-11-19 00:10:47,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=475046.6666666667, ans=0.125 2023-11-19 00:10:47,792 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.819e+01 8.698e+01 9.641e+01 1.040e+02 1.445e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-19 00:10:51,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=475046.6666666667, ans=0.1 2023-11-19 00:10:54,133 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11150, loss[loss=0.08554, simple_loss=0.1019, pruned_loss=0.02244, audio_tagging_loss=0.01217, over 15217.00 frames. ], tot_loss[loss=0.09542, simple_loss=0.1117, pruned_loss=0.02833, audio_tagging_loss=0.01124, over 3049498.13 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:11:22,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.58 vs. 
limit=15.0 2023-11-19 00:11:23,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=475246.6666666667, ans=0.125 2023-11-19 00:11:33,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=475313.3333333333, ans=0.0 2023-11-19 00:11:35,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=475313.3333333333, ans=0.0 2023-11-19 00:11:49,596 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11200, loss[loss=0.08468, simple_loss=0.09446, pruned_loss=0.02756, audio_tagging_loss=0.009888, over 15381.00 frames. ], tot_loss[loss=0.0956, simple_loss=0.1116, pruned_loss=0.02856, audio_tagging_loss=0.01126, over 3051140.37 frames. ], batch size: 60, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:12:07,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.09 vs. limit=10.0 2023-11-19 00:12:30,231 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:12:38,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=475713.3333333333, ans=0.0 2023-11-19 00:12:39,480 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.261e+01 8.628e+01 9.761e+01 1.045e+02 1.473e+02, threshold=1.952e+02, percent-clipped=0.0 2023-11-19 00:12:45,837 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11250, loss[loss=0.1387, simple_loss=0.1619, pruned_loss=0.0463, audio_tagging_loss=0.01144, over 16211.00 frames. ], tot_loss[loss=0.09548, simple_loss=0.1114, pruned_loss=0.02849, audio_tagging_loss=0.01129, over 3057249.14 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:12:59,895 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=475846.6666666667, ans=0.125 2023-11-19 00:13:10,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=475913.3333333333, ans=0.125 2023-11-19 00:13:31,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-11-19 00:13:41,021 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11300, loss[loss=0.1075, simple_loss=0.1275, pruned_loss=0.03334, audio_tagging_loss=0.01043, over 15570.00 frames. ], tot_loss[loss=0.09563, simple_loss=0.1118, pruned_loss=0.02866, audio_tagging_loss=0.01109, over 3058337.46 frames. ], batch size: 58, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:13:41,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=476113.3333333333, ans=0.0 2023-11-19 00:13:42,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=476113.3333333333, ans=0.125 2023-11-19 00:13:45,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. 
limit=10.0 2023-11-19 00:13:48,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=476113.3333333333, ans=0.07 2023-11-19 00:13:55,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=476180.0, ans=0.125 2023-11-19 00:14:29,495 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.167e+01 8.711e+01 9.512e+01 1.035e+02 1.315e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 00:14:30,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=476380.0, ans=0.09899494936611666 2023-11-19 00:14:36,326 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11350, loss[loss=0.1003, simple_loss=0.1217, pruned_loss=0.02858, audio_tagging_loss=0.01088, over 16826.00 frames. ], tot_loss[loss=0.09526, simple_loss=0.1114, pruned_loss=0.02859, audio_tagging_loss=0.01096, over 3063320.62 frames. ], batch size: 61, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:14:51,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. limit=10.0 2023-11-19 00:14:55,919 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476513.3333333333, ans=0.125 2023-11-19 00:14:56,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=476513.3333333333, ans=0.125 2023-11-19 00:15:02,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476580.0, ans=0.125 2023-11-19 00:15:28,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=476713.3333333333, ans=0.125 2023-11-19 00:15:32,740 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11400, loss[loss=0.08673, simple_loss=0.09369, pruned_loss=0.0253, audio_tagging_loss=0.01459, over 14418.00 frames. ], tot_loss[loss=0.09587, simple_loss=0.1124, pruned_loss=0.02875, audio_tagging_loss=0.01093, over 3064080.42 frames. 
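Note: the occasional "WithLoss: name=..., loss-sum=..." entries report small auxiliary penalties attached directly to intermediate tensors (here, attention weights); loss-sum=0.000e+00 means the penalty was inactive for that batch. One way to attach such a penalty without threading it through the main loss is gradient injection in backward, sketched below as an assumption about the mechanism rather than the actual scaling.py implementation:

    import torch

    class AttachLoss(torch.autograd.Function):
        # hypothetical helper: identity in forward, adds the gradient of an
        # auxiliary penalty on x during backward
        @staticmethod
        def forward(ctx, x, penalty_grad):
            ctx.save_for_backward(penalty_grad)
            return x

        @staticmethod
        def backward(ctx, grad_output):
            (penalty_grad,) = ctx.saved_tensors
            return grad_output + penalty_grad, None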
], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:15:42,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=476846.6666666667, ans=0.0 2023-11-19 00:15:46,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=476846.6666666667, ans=0.1 2023-11-19 00:15:52,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=476846.6666666667, ans=22.5 2023-11-19 00:15:56,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=476913.3333333333, ans=0.125 2023-11-19 00:16:12,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=476980.0, ans=0.0 2023-11-19 00:16:16,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=477046.6666666667, ans=0.1 2023-11-19 00:16:20,945 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.841e+01 9.746e+01 1.056e+02 1.411e+02, threshold=1.949e+02, percent-clipped=0.0 2023-11-19 00:16:25,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=477046.6666666667, ans=0.0 2023-11-19 00:16:27,296 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11450, loss[loss=0.07794, simple_loss=0.09561, pruned_loss=0.01845, audio_tagging_loss=0.01168, over 14968.00 frames. ], tot_loss[loss=0.09486, simple_loss=0.1113, pruned_loss=0.02827, audio_tagging_loss=0.01094, over 3053718.80 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:16:34,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=477113.3333333333, ans=0.07 2023-11-19 00:16:38,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.47 vs. limit=15.0 2023-11-19 00:16:40,116 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.46 vs. limit=15.0 2023-11-19 00:17:01,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=477313.3333333333, ans=0.125 2023-11-19 00:17:04,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=477313.3333333333, ans=0.0 2023-11-19 00:17:05,036 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-19 00:17:22,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.91 vs. limit=22.5 2023-11-19 00:17:22,952 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11500, loss[loss=0.1086, simple_loss=0.1301, pruned_loss=0.03535, audio_tagging_loss=0.008169, over 15281.00 frames. ], tot_loss[loss=0.09473, simple_loss=0.1114, pruned_loss=0.02812, audio_tagging_loss=0.01092, over 3050626.68 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:17:29,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=477446.6666666667, ans=0.0 2023-11-19 00:17:30,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.66 vs. limit=22.5 2023-11-19 00:17:31,547 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.72 vs. limit=6.0 2023-11-19 00:17:33,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.46 vs. limit=15.0 2023-11-19 00:17:35,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=477513.3333333333, ans=0.125 2023-11-19 00:17:46,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=477580.0, ans=0.125 2023-11-19 00:17:52,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.00 vs. limit=15.0 2023-11-19 00:17:54,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.14 vs. limit=15.0 2023-11-19 00:18:00,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=477646.6666666667, ans=0.125 2023-11-19 00:18:09,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=477713.3333333333, ans=0.125 2023-11-19 00:18:11,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.969e+01 9.661e+01 1.076e+02 1.537e+02, threshold=1.932e+02, percent-clipped=0.0 2023-11-19 00:18:19,404 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11550, loss[loss=0.09908, simple_loss=0.1226, pruned_loss=0.02966, audio_tagging_loss=0.008112, over 15803.00 frames. ], tot_loss[loss=0.09599, simple_loss=0.1131, pruned_loss=0.02857, audio_tagging_loss=0.01089, over 3055135.69 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:18:21,832 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:18:32,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=13.81 vs. limit=15.0 2023-11-19 00:18:33,765 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-19 00:18:49,471 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
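Note: the grad_scale field moving among powers of two (32.0, 16.0, and lower values further on) is dynamic fp16 loss scaling: the scale is halved when a step overflows in half precision and raised again after a run of clean steps. The stock PyTorch form of the mechanism is sketched here; the wiring is illustrative and the training script may manage its scaler differently:

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=2.0)

    def training_step(model, optimizer, batch, criterion):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = criterion(model(batch))
        scaler.scale(loss).backward()
        scaler.step(optimizer)  # skipped internally if gradients overflowed
        scaler.update()         # halves or grows the scale accordingly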
Number of tokens: 24 2023-11-19 00:19:14,380 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11600, loss[loss=0.09133, simple_loss=0.1015, pruned_loss=0.02961, audio_tagging_loss=0.01098, over 15131.00 frames. ], tot_loss[loss=0.09653, simple_loss=0.1137, pruned_loss=0.02881, audio_tagging_loss=0.01085, over 3063962.83 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:19:43,248 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:19:43,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.68 vs. limit=12.0 2023-11-19 00:19:47,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=478313.3333333333, ans=0.2 2023-11-19 00:19:50,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=478313.3333333333, ans=0.125 2023-11-19 00:20:02,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.140e+01 8.994e+01 9.981e+01 1.100e+02 1.554e+02, threshold=1.996e+02, percent-clipped=0.0 2023-11-19 00:20:09,878 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11650, loss[loss=0.1135, simple_loss=0.1396, pruned_loss=0.03416, audio_tagging_loss=0.009534, over 15169.00 frames. ], tot_loss[loss=0.09691, simple_loss=0.1145, pruned_loss=0.02884, audio_tagging_loss=0.01079, over 3063524.31 frames. ], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:20:10,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=478446.6666666667, ans=0.125 2023-11-19 00:21:05,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0 2023-11-19 00:21:06,348 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11700, loss[loss=0.1074, simple_loss=0.1243, pruned_loss=0.03407, audio_tagging_loss=0.01117, over 15225.00 frames. ], tot_loss[loss=0.09641, simple_loss=0.1138, pruned_loss=0.02865, audio_tagging_loss=0.01087, over 3055275.82 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:21:16,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=478846.6666666667, ans=0.125 2023-11-19 00:21:19,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=478846.6666666667, ans=0.125 2023-11-19 00:21:26,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=478846.6666666667, ans=0.0 2023-11-19 00:21:44,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=478980.0, ans=0.2 2023-11-19 00:21:55,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.403e+01 8.970e+01 9.668e+01 1.084e+02 1.454e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 00:22:01,685 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11750, loss[loss=0.1046, simple_loss=0.1272, pruned_loss=0.02973, audio_tagging_loss=0.01129, over 15346.00 frames. ], tot_loss[loss=0.09596, simple_loss=0.1129, pruned_loss=0.02857, audio_tagging_loss=0.01096, over 3044045.61 frames. 
], batch size: 55, lr: 1.10e-02, grad_scale: 32.0 2023-11-19 00:22:05,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=479113.3333333333, ans=0.125 2023-11-19 00:22:31,113 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=479246.6666666667, ans=0.0 2023-11-19 00:22:34,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=479313.3333333333, ans=0.125 2023-11-19 00:22:34,718 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.45 vs. limit=15.0 2023-11-19 00:22:42,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.68 vs. limit=22.5 2023-11-19 00:22:48,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=479380.0, ans=0.125 2023-11-19 00:22:51,832 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=479380.0, ans=0.2 2023-11-19 00:22:53,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=479380.0, ans=0.125 2023-11-19 00:22:54,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=479380.0, ans=6.0 2023-11-19 00:22:56,857 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11800, loss[loss=0.1001, simple_loss=0.1153, pruned_loss=0.032, audio_tagging_loss=0.01043, over 15706.00 frames. ], tot_loss[loss=0.09566, simple_loss=0.1125, pruned_loss=0.02839, audio_tagging_loss=0.011, over 3052801.45 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:23:25,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=479580.0, ans=0.09899494936611666 2023-11-19 00:23:39,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=479646.6666666667, ans=0.125 2023-11-19 00:23:48,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.514e+01 9.014e+01 9.704e+01 1.070e+02 1.627e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 00:23:53,378 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11850, loss[loss=0.08277, simple_loss=0.1086, pruned_loss=0.01883, audio_tagging_loss=0.009658, over 14931.00 frames. ], tot_loss[loss=0.09576, simple_loss=0.1126, pruned_loss=0.02837, audio_tagging_loss=0.0111, over 3054348.66 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:24:06,940 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-19 00:24:11,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.67 vs. 
limit=22.5 2023-11-19 00:24:21,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=479913.3333333333, ans=0.0 2023-11-19 00:24:28,467 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-72000.pt 2023-11-19 00:24:35,369 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=479980.0, ans=0.125 2023-11-19 00:24:39,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.94 vs. limit=15.0 2023-11-19 00:24:40,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480046.6666666667, ans=0.1 2023-11-19 00:24:50,955 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11900, loss[loss=0.1535, simple_loss=0.1725, pruned_loss=0.05773, audio_tagging_loss=0.009509, over 15071.00 frames. ], tot_loss[loss=0.09542, simple_loss=0.1118, pruned_loss=0.02833, audio_tagging_loss=0.01119, over 3052778.00 frames. ], batch size: 54, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:24:51,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.52 vs. limit=15.0 2023-11-19 00:24:52,737 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=7.96 vs. limit=15.0 2023-11-19 00:24:54,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480113.3333333333, ans=0.125 2023-11-19 00:25:09,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=480180.0, ans=0.0 2023-11-19 00:25:20,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=480246.6666666667, ans=0.0 2023-11-19 00:25:22,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=480246.6666666667, ans=0.0 2023-11-19 00:25:23,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=480313.3333333333, ans=0.0 2023-11-19 00:25:28,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=480313.3333333333, ans=0.07 2023-11-19 00:25:35,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.36 vs. limit=22.5 2023-11-19 00:25:39,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=480380.0, ans=0.125 2023-11-19 00:25:41,559 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.576e+01 8.633e+01 9.352e+01 1.050e+02 1.397e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 00:25:45,905 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 11950, loss[loss=0.09546, simple_loss=0.112, pruned_loss=0.02425, audio_tagging_loss=0.01519, over 15194.00 frames. ], tot_loss[loss=0.09511, simple_loss=0.1113, pruned_loss=0.02809, audio_tagging_loss=0.01139, over 3046151.78 frames. 
], batch size: 57, lr: 1.10e-02, grad_scale: 8.0 2023-11-19 00:25:52,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.05 vs. limit=22.5 2023-11-19 00:25:54,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=480446.6666666667, ans=0.2 2023-11-19 00:25:54,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=10.27 vs. limit=12.0 2023-11-19 00:25:57,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.52 vs. limit=8.0 2023-11-19 00:26:06,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=480513.3333333333, ans=0.125 2023-11-19 00:26:11,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=480580.0, ans=0.125 2023-11-19 00:26:11,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=480580.0, ans=0.125 2023-11-19 00:26:20,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480646.6666666667, ans=0.1 2023-11-19 00:26:24,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=480646.6666666667, ans=0.1 2023-11-19 00:26:26,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=480646.6666666667, ans=0.125 2023-11-19 00:26:34,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=480713.3333333333, ans=0.0 2023-11-19 00:26:36,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.82 vs. limit=10.0 2023-11-19 00:26:38,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2023-11-19 00:26:39,767 INFO [train_asr.py:1115] (0/4) Epoch 6, batch 12000, loss[loss=0.08769, simple_loss=0.09866, pruned_loss=0.02419, audio_tagging_loss=0.01417, over 15318.00 frames. ], tot_loss[loss=0.09572, simple_loss=0.1119, pruned_loss=0.02829, audio_tagging_loss=0.01148, over 3050481.70 frames. ], batch size: 57, lr: 1.10e-02, grad_scale: 16.0 2023-11-19 00:26:39,769 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 00:26:54,905 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.5.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.8239, 3.4797, 5.2383, 3.7870], device='cuda:0') 2023-11-19 00:27:08,604 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7750, 5.7473, 5.8716, 5.9018], device='cuda:0') 2023-11-19 00:27:12,310 INFO [train_asr.py:1147] (0/4) Epoch 6, validation: loss=0.07011, simple_loss=0.05856, pruned_loss=0.008079, audio_tagging_loss=0.03275, over 4681554.00 frames. 
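The validation entry above decomposes the objective into its logged components. Across these entries the total is consistent with loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; the 0.5 simple-loss weight and the unit audio-tagging weight are inferred from the logged numbers rather than read out of the recipe, so the sketch below is a sanity check on the log fields, not the training code itself.

# Minimal sanity check (assumed weights; they reproduce the logged totals).
def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                  simple_scale=0.5, audio_tagging_scale=1.0):
    # Recombine the per-component losses the way the logged totals suggest.
    return (simple_scale * simple_loss
            + pruned_loss
            + audio_tagging_scale * audio_tagging_loss)

# Epoch 6 validation above: loss=0.07011
print(combined_loss(0.05856, 0.008079, 0.03275))  # 0.070109 -> matches 0.07011
# Epoch 6, batch 11600 tot_loss: loss=0.09653
print(combined_loss(0.1137, 0.02881, 0.01085))    # 0.096510 -> matches up to rounding

The same relation holds for every tot_loss entry in this stretch of the log, which is a quick way to confirm that none of the component losses is being silently dropped from the objective.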
2023-11-19 00:27:12,311 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 00:27:15,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=480780.0, ans=0.1 2023-11-19 00:27:29,992 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.93 vs. limit=12.0 2023-11-19 00:27:35,711 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-6.pt 2023-11-19 00:28:10,670 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 0, loss[loss=0.1058, simple_loss=0.1088, pruned_loss=0.02713, audio_tagging_loss=0.02422, over 15239.00 frames. ], tot_loss[loss=0.1058, simple_loss=0.1088, pruned_loss=0.02713, audio_tagging_loss=0.02422, over 15239.00 frames. ], batch size: 57, lr: 1.03e-02, grad_scale: 32.0 2023-11-19 00:28:10,672 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 00:28:37,118 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.2955, 4.9722, 4.7800, 5.0916], device='cuda:0') 2023-11-19 00:28:42,246 INFO [train_asr.py:1147] (0/4) Epoch 7, validation: loss=0.06897, simple_loss=0.05854, pruned_loss=0.008004, audio_tagging_loss=0.03169, over 4681554.00 frames. 2023-11-19 00:28:42,246 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 00:29:06,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=481060.0, ans=0.125 2023-11-19 00:29:08,501 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.131e+01 8.969e+01 9.678e+01 1.084e+02 1.742e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:29:21,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=481126.6666666667, ans=0.125 2023-11-19 00:29:26,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=481193.3333333333, ans=0.2 2023-11-19 00:29:36,959 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 50, loss[loss=0.07495, simple_loss=0.07161, pruned_loss=0.01454, audio_tagging_loss=0.0246, over 14592.00 frames. ], tot_loss[loss=0.1094, simple_loss=0.1154, pruned_loss=0.03039, audio_tagging_loss=0.02128, over 684692.47 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:29:59,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=481393.3333333333, ans=0.1 2023-11-19 00:30:04,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=481393.3333333333, ans=0.125 2023-11-19 00:30:05,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.97 vs. limit=12.0 2023-11-19 00:30:33,428 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 100, loss[loss=0.1073, simple_loss=0.1117, pruned_loss=0.03357, audio_tagging_loss=0.01789, over 15429.00 frames. ], tot_loss[loss=0.1057, simple_loss=0.1121, pruned_loss=0.02885, audio_tagging_loss=0.02076, over 1207426.46 frames. 
], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:30:37,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=481593.3333333333, ans=0.2 2023-11-19 00:31:01,054 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.625e+01 8.882e+01 9.750e+01 1.051e+02 1.477e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 00:31:08,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=481793.3333333333, ans=0.125 2023-11-19 00:31:12,949 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=481793.3333333333, ans=0.1 2023-11-19 00:31:27,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=481926.6666666667, ans=0.0 2023-11-19 00:31:28,790 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 150, loss[loss=0.1309, simple_loss=0.157, pruned_loss=0.04242, audio_tagging_loss=0.009947, over 13951.00 frames. ], tot_loss[loss=0.1026, simple_loss=0.1119, pruned_loss=0.02821, audio_tagging_loss=0.01845, over 1615682.42 frames. ], batch size: 52, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:31:50,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482060.0, ans=0.1 2023-11-19 00:32:05,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=482126.6666666667, ans=0.125 2023-11-19 00:32:17,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=482193.3333333333, ans=0.125 2023-11-19 00:32:25,132 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 200, loss[loss=0.08665, simple_loss=0.1072, pruned_loss=0.02446, audio_tagging_loss=0.008586, over 14337.00 frames. ], tot_loss[loss=0.1017, simple_loss=0.1134, pruned_loss=0.02895, audio_tagging_loss=0.01609, over 1931474.61 frames. ], batch size: 56, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:32:27,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.54 vs. limit=22.5 2023-11-19 00:32:51,815 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482393.3333333333, ans=0.1 2023-11-19 00:32:52,545 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.645e+01 9.072e+01 1.001e+02 1.087e+02 1.831e+02, threshold=2.002e+02, percent-clipped=0.0 2023-11-19 00:32:58,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=482460.0, ans=0.0 2023-11-19 00:33:10,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.70 vs. limit=15.0 2023-11-19 00:33:16,207 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:33:18,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482526.6666666667, ans=0.1 2023-11-19 00:33:21,288 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 250, loss[loss=0.08703, simple_loss=0.1045, pruned_loss=0.02357, audio_tagging_loss=0.0112, over 15400.00 frames. 
], tot_loss[loss=0.1004, simple_loss=0.1143, pruned_loss=0.02883, audio_tagging_loss=0.01443, over 2175873.49 frames. ], batch size: 60, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:33:28,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.29 vs. limit=15.0 2023-11-19 00:33:32,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=482660.0, ans=0.0 2023-11-19 00:33:43,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=482726.6666666667, ans=0.125 2023-11-19 00:33:46,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=482726.6666666667, ans=0.1 2023-11-19 00:33:50,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482726.6666666667, ans=0.1 2023-11-19 00:34:13,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=482860.0, ans=0.1 2023-11-19 00:34:15,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=482926.6666666667, ans=0.1 2023-11-19 00:34:16,422 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 300, loss[loss=0.1307, simple_loss=0.1585, pruned_loss=0.04089, audio_tagging_loss=0.01056, over 15912.00 frames. ], tot_loss[loss=0.1002, simple_loss=0.1152, pruned_loss=0.02928, audio_tagging_loss=0.01331, over 2377426.26 frames. ], batch size: 58, lr: 1.03e-02, grad_scale: 16.0 2023-11-19 00:34:17,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=482926.6666666667, ans=0.0 2023-11-19 00:34:34,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=482993.3333333333, ans=0.1 2023-11-19 00:34:44,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.825e+01 8.903e+01 9.554e+01 1.061e+02 1.704e+02, threshold=1.911e+02, percent-clipped=0.0 2023-11-19 00:34:56,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483126.6666666667, ans=0.1 2023-11-19 00:35:05,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=13.33 vs. limit=15.0 2023-11-19 00:35:06,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.74 vs. limit=22.5 2023-11-19 00:35:08,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=483193.3333333333, ans=0.125 2023-11-19 00:35:10,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=483260.0, ans=0.125 2023-11-19 00:35:12,344 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 350, loss[loss=0.1188, simple_loss=0.1423, pruned_loss=0.03975, audio_tagging_loss=0.007851, over 15339.00 frames. ], tot_loss[loss=0.09839, simple_loss=0.114, pruned_loss=0.02892, audio_tagging_loss=0.01245, over 2526574.73 frames. 
], batch size: 58, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:35:28,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=14.54 vs. limit=15.0 2023-11-19 00:35:47,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=483460.0, ans=0.0 2023-11-19 00:35:50,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.58 vs. limit=22.5 2023-11-19 00:35:57,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=483526.6666666667, ans=0.125 2023-11-19 00:36:07,563 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 400, loss[loss=0.09969, simple_loss=0.1245, pruned_loss=0.02675, audio_tagging_loss=0.01067, over 15880.00 frames. ], tot_loss[loss=0.09767, simple_loss=0.1135, pruned_loss=0.02875, audio_tagging_loss=0.01216, over 2648156.96 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:36:13,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=483593.3333333333, ans=0.1 2023-11-19 00:36:18,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=483660.0, ans=0.125 2023-11-19 00:36:18,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.66 vs. limit=6.0 2023-11-19 00:36:22,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=483660.0, ans=0.2 2023-11-19 00:36:22,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=483660.0, ans=0.0 2023-11-19 00:36:23,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=483660.0, ans=0.125 2023-11-19 00:36:34,359 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.424e+01 8.614e+01 9.359e+01 1.038e+02 1.564e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 00:36:45,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=483793.3333333333, ans=0.125 2023-11-19 00:37:01,744 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 450, loss[loss=0.09846, simple_loss=0.1107, pruned_loss=0.03195, audio_tagging_loss=0.01116, over 14561.00 frames. ], tot_loss[loss=0.09648, simple_loss=0.1131, pruned_loss=0.02822, audio_tagging_loss=0.01173, over 2736291.66 frames. 
], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:37:04,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=483926.6666666667, ans=0.125 2023-11-19 00:37:17,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=483993.3333333333, ans=0.07 2023-11-19 00:37:19,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=483993.3333333333, ans=0.2 2023-11-19 00:37:32,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=484060.0, ans=0.125 2023-11-19 00:37:41,320 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:37:43,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=484126.6666666667, ans=0.125 2023-11-19 00:37:49,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=484193.3333333333, ans=0.1 2023-11-19 00:37:57,260 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 500, loss[loss=0.07754, simple_loss=0.07871, pruned_loss=0.02346, audio_tagging_loss=0.01472, over 15648.00 frames. ], tot_loss[loss=0.09498, simple_loss=0.1109, pruned_loss=0.02779, audio_tagging_loss=0.01174, over 2803013.18 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:37:57,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=484260.0, ans=0.125 2023-11-19 00:38:02,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=484260.0, ans=0.1 2023-11-19 00:38:06,453 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=484260.0, ans=0.125 2023-11-19 00:38:07,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-11-19 00:38:22,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.11 vs. limit=15.0 2023-11-19 00:38:24,793 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.607e+01 8.485e+01 9.298e+01 1.059e+02 1.299e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 00:38:37,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=484460.0, ans=0.0 2023-11-19 00:38:46,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=484526.6666666667, ans=0.125 2023-11-19 00:38:52,421 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 550, loss[loss=0.08987, simple_loss=0.1081, pruned_loss=0.02478, audio_tagging_loss=0.01105, over 14720.00 frames. ], tot_loss[loss=0.09484, simple_loss=0.1107, pruned_loss=0.02779, audio_tagging_loss=0.0117, over 2864362.48 frames. 
], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:38:52,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=484593.3333333333, ans=0.0 2023-11-19 00:38:58,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=484593.3333333333, ans=0.2 2023-11-19 00:39:01,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=484593.3333333333, ans=0.0 2023-11-19 00:39:33,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=484793.3333333333, ans=0.0 2023-11-19 00:39:39,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=484860.0, ans=10.0 2023-11-19 00:39:44,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=484860.0, ans=0.125 2023-11-19 00:39:48,103 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 600, loss[loss=0.0926, simple_loss=0.1072, pruned_loss=0.02573, audio_tagging_loss=0.01328, over 15172.00 frames. ], tot_loss[loss=0.09489, simple_loss=0.111, pruned_loss=0.02788, audio_tagging_loss=0.01149, over 2902518.76 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:40:08,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=485060.0, ans=0.125 2023-11-19 00:40:15,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.402e+01 8.943e+01 9.833e+01 1.134e+02 1.508e+02, threshold=1.967e+02, percent-clipped=0.0 2023-11-19 00:40:21,088 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.45 vs. limit=15.0 2023-11-19 00:40:22,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=485126.6666666667, ans=0.125 2023-11-19 00:40:42,651 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 650, loss[loss=0.06581, simple_loss=0.07749, pruned_loss=0.01587, audio_tagging_loss=0.01119, over 14027.00 frames. ], tot_loss[loss=0.09501, simple_loss=0.1114, pruned_loss=0.02794, audio_tagging_loss=0.01136, over 2931884.85 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:40:42,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=485260.0, ans=0.125 2023-11-19 00:40:47,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=485260.0, ans=0.125 2023-11-19 00:40:55,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.50 vs. limit=22.5 2023-11-19 00:41:15,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=485460.0, ans=0.2 2023-11-19 00:41:25,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=16.80 vs. limit=22.5 2023-11-19 00:41:38,263 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 700, loss[loss=0.116, simple_loss=0.1362, pruned_loss=0.03702, audio_tagging_loss=0.01084, over 16107.00 frames. 
], tot_loss[loss=0.09433, simple_loss=0.1109, pruned_loss=0.02758, audio_tagging_loss=0.01129, over 2964445.84 frames. ], batch size: 59, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:41:42,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=485593.3333333333, ans=0.2 2023-11-19 00:41:44,957 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.81 vs. limit=22.5 2023-11-19 00:41:45,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-19 00:41:49,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=485660.0, ans=0.125 2023-11-19 00:42:06,502 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.220e+01 8.517e+01 9.340e+01 1.042e+02 1.556e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 00:42:31,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=485860.0, ans=0.1 2023-11-19 00:42:33,693 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 750, loss[loss=0.1369, simple_loss=0.1685, pruned_loss=0.04273, audio_tagging_loss=0.009892, over 15655.00 frames. ], tot_loss[loss=0.09528, simple_loss=0.1123, pruned_loss=0.02794, audio_tagging_loss=0.01121, over 2985701.60 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:42:36,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=485926.6666666667, ans=0.125 2023-11-19 00:42:37,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=485926.6666666667, ans=0.125 2023-11-19 00:43:12,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=486126.6666666667, ans=0.0 2023-11-19 00:43:22,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=486193.3333333333, ans=0.0 2023-11-19 00:43:28,903 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 800, loss[loss=0.1008, simple_loss=0.1147, pruned_loss=0.03174, audio_tagging_loss=0.01167, over 14578.00 frames. ], tot_loss[loss=0.09655, simple_loss=0.1136, pruned_loss=0.02852, audio_tagging_loss=0.01125, over 3002738.59 frames. 
], batch size: 54, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:43:33,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=486260.0, ans=0.2 2023-11-19 00:43:43,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=486326.6666666667, ans=0.0 2023-11-19 00:43:50,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=486393.3333333333, ans=0.1 2023-11-19 00:43:54,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486393.3333333333, ans=0.1 2023-11-19 00:43:58,562 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.127e+01 8.961e+01 9.604e+01 1.088e+02 1.734e+02, threshold=1.921e+02, percent-clipped=0.0 2023-11-19 00:43:59,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.26 vs. limit=10.0 2023-11-19 00:44:05,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=486460.0, ans=0.125 2023-11-19 00:44:07,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=486460.0, ans=0.0 2023-11-19 00:44:08,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=486460.0, ans=0.0 2023-11-19 00:44:24,838 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 850, loss[loss=0.06849, simple_loss=0.07803, pruned_loss=0.01836, audio_tagging_loss=0.01112, over 13889.00 frames. ], tot_loss[loss=0.09658, simple_loss=0.1135, pruned_loss=0.02852, audio_tagging_loss=0.01128, over 3015702.73 frames. ], batch size: 53, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:44:28,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=486593.3333333333, ans=0.125 2023-11-19 00:44:41,194 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-19 00:44:44,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=486660.0, ans=0.05 2023-11-19 00:44:47,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=486726.6666666667, ans=0.125 2023-11-19 00:44:53,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=486726.6666666667, ans=0.125 2023-11-19 00:44:58,273 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=486793.3333333333, ans=0.2 2023-11-19 00:45:03,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=10.17 vs. 
limit=10.0 2023-11-19 00:45:05,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=486793.3333333333, ans=0.125 2023-11-19 00:45:10,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=486860.0, ans=0.1 2023-11-19 00:45:14,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=486860.0, ans=0.125 2023-11-19 00:45:20,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-11-19 00:45:21,334 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 900, loss[loss=0.1021, simple_loss=0.1219, pruned_loss=0.03115, audio_tagging_loss=0.01, over 15858.00 frames. ], tot_loss[loss=0.09602, simple_loss=0.1127, pruned_loss=0.02822, audio_tagging_loss=0.01147, over 3021122.94 frames. ], batch size: 60, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:45:32,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=486993.3333333333, ans=0.1 2023-11-19 00:45:49,160 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.581e+01 9.444e+01 1.025e+02 1.382e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 00:45:49,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=487060.0, ans=0.125 2023-11-19 00:46:11,411 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-19 00:46:16,168 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 950, loss[loss=0.06311, simple_loss=0.0762, pruned_loss=0.01636, audio_tagging_loss=0.008653, over 14108.00 frames. ], tot_loss[loss=0.09537, simple_loss=0.1118, pruned_loss=0.02811, audio_tagging_loss=0.01137, over 3021856.05 frames. ], batch size: 52, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:46:18,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=487260.0, ans=0.125 2023-11-19 00:46:22,368 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.56 vs. limit=15.0 2023-11-19 00:46:35,393 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:47:10,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=487593.3333333333, ans=0.1 2023-11-19 00:47:11,644 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1000, loss[loss=0.1063, simple_loss=0.1217, pruned_loss=0.03443, audio_tagging_loss=0.01098, over 15624.00 frames. ], tot_loss[loss=0.09466, simple_loss=0.111, pruned_loss=0.0279, audio_tagging_loss=0.01128, over 3024699.46 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:47:20,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.11 vs. 
limit=15.0 2023-11-19 00:47:26,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=487660.0, ans=0.07 2023-11-19 00:47:31,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487660.0, ans=0.1 2023-11-19 00:47:35,293 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:47:35,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=487726.6666666667, ans=0.1 2023-11-19 00:47:41,630 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.644e+01 9.195e+01 1.009e+02 1.438e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 00:48:07,473 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1050, loss[loss=0.104, simple_loss=0.1241, pruned_loss=0.03089, audio_tagging_loss=0.01105, over 15638.00 frames. ], tot_loss[loss=0.09471, simple_loss=0.111, pruned_loss=0.0281, audio_tagging_loss=0.0111, over 3025139.60 frames. ], batch size: 57, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:48:11,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=487926.6666666667, ans=0.125 2023-11-19 00:48:32,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=488060.0, ans=0.125 2023-11-19 00:48:38,639 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=488060.0, ans=0.125 2023-11-19 00:48:41,012 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-19 00:48:49,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=488126.6666666667, ans=0.125 2023-11-19 00:49:00,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=488193.3333333333, ans=0.2 2023-11-19 00:49:03,262 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1100, loss[loss=0.07181, simple_loss=0.08502, pruned_loss=0.01979, audio_tagging_loss=0.009502, over 15570.00 frames. ], tot_loss[loss=0.09426, simple_loss=0.111, pruned_loss=0.02784, audio_tagging_loss=0.01093, over 3028714.96 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:49:06,403 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 00:49:08,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=488260.0, ans=0.2 2023-11-19 00:49:13,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:17,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=488326.6666666667, ans=0.1 2023-11-19 00:49:17,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=488326.6666666667, ans=0.125 2023-11-19 00:49:24,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=488393.3333333333, ans=0.125 2023-11-19 00:49:33,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.724e+01 8.821e+01 9.518e+01 1.052e+02 1.526e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 00:49:55,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0 2023-11-19 00:49:58,983 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1150, loss[loss=0.07727, simple_loss=0.08962, pruned_loss=0.02168, audio_tagging_loss=0.01078, over 14062.00 frames. ], tot_loss[loss=0.095, simple_loss=0.1121, pruned_loss=0.02813, audio_tagging_loss=0.01082, over 3033204.72 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 16.0 2023-11-19 00:50:09,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=488660.0, ans=0.0 2023-11-19 00:50:14,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0 2023-11-19 00:50:18,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.18 vs. limit=15.0 2023-11-19 00:50:27,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=488726.6666666667, ans=0.1 2023-11-19 00:50:30,983 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.49 vs. limit=10.0 2023-11-19 00:50:33,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=488793.3333333333, ans=0.125 2023-11-19 00:50:36,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.13 vs. limit=15.0 2023-11-19 00:50:55,536 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1200, loss[loss=0.08433, simple_loss=0.09878, pruned_loss=0.02555, audio_tagging_loss=0.00939, over 15450.00 frames. ], tot_loss[loss=0.0942, simple_loss=0.111, pruned_loss=0.02777, audio_tagging_loss=0.01091, over 3031782.08 frames. ], batch size: 58, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:50:56,095 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.73 vs. 
limit=15.0 2023-11-19 00:50:57,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=488926.6666666667, ans=0.0 2023-11-19 00:51:00,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=488926.6666666667, ans=0.0 2023-11-19 00:51:25,293 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.550e+01 8.775e+01 9.458e+01 1.050e+02 1.338e+02, threshold=1.892e+02, percent-clipped=0.0 2023-11-19 00:51:31,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=489126.6666666667, ans=0.2 2023-11-19 00:51:45,024 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.07 vs. limit=12.0 2023-11-19 00:51:50,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=489260.0, ans=0.125 2023-11-19 00:51:50,846 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1250, loss[loss=0.09469, simple_loss=0.1165, pruned_loss=0.0234, audio_tagging_loss=0.01307, over 16751.00 frames. ], tot_loss[loss=0.09482, simple_loss=0.1115, pruned_loss=0.0281, audio_tagging_loss=0.01096, over 3042612.00 frames. ], batch size: 62, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:52:05,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=489326.6666666667, ans=0.2 2023-11-19 00:52:22,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.64 vs. limit=10.0 2023-11-19 00:52:44,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=489526.6666666667, ans=0.1 2023-11-19 00:52:47,234 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1300, loss[loss=0.0974, simple_loss=0.1134, pruned_loss=0.03079, audio_tagging_loss=0.009916, over 15325.00 frames. ], tot_loss[loss=0.09425, simple_loss=0.1108, pruned_loss=0.02786, audio_tagging_loss=0.01098, over 3034581.84 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:52:49,889 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.61 vs. limit=6.0 2023-11-19 00:52:54,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=489593.3333333333, ans=0.0 2023-11-19 00:53:00,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=489660.0, ans=0.125 2023-11-19 00:53:17,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.399e+01 8.502e+01 9.491e+01 1.040e+02 1.421e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 00:53:43,592 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1350, loss[loss=0.1184, simple_loss=0.1374, pruned_loss=0.04012, audio_tagging_loss=0.009542, over 15817.00 frames. ], tot_loss[loss=0.09573, simple_loss=0.1127, pruned_loss=0.02848, audio_tagging_loss=0.01091, over 3031527.07 frames. 
], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:53:48,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=489926.6666666667, ans=0.125 2023-11-19 00:54:04,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=490060.0, ans=0.125 2023-11-19 00:54:16,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.18 vs. limit=10.0 2023-11-19 00:54:18,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=490126.6666666667, ans=0.0 2023-11-19 00:54:24,932 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 00:54:26,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=490126.6666666667, ans=0.125 2023-11-19 00:54:32,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=490193.3333333333, ans=0.125 2023-11-19 00:54:38,695 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1400, loss[loss=0.09674, simple_loss=0.1132, pruned_loss=0.02513, audio_tagging_loss=0.01503, over 14628.00 frames. ], tot_loss[loss=0.09541, simple_loss=0.1125, pruned_loss=0.02818, audio_tagging_loss=0.01096, over 3032574.69 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:54:57,272 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=15.0 2023-11-19 00:55:09,549 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.389e+01 8.650e+01 9.368e+01 1.053e+02 1.666e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 00:55:18,253 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:55:35,059 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1450, loss[loss=0.08546, simple_loss=0.09813, pruned_loss=0.02431, audio_tagging_loss=0.01209, over 14056.00 frames. ], tot_loss[loss=0.09577, simple_loss=0.1133, pruned_loss=0.02819, audio_tagging_loss=0.01091, over 3042492.53 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:55:47,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=490660.0, ans=0.125 2023-11-19 00:55:50,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=490660.0, ans=0.0 2023-11-19 00:55:51,650 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.30 vs. 
limit=15.0 2023-11-19 00:55:56,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=490726.6666666667, ans=0.1 2023-11-19 00:56:08,372 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 00:56:19,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=490860.0, ans=0.2 2023-11-19 00:56:30,785 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1500, loss[loss=0.1076, simple_loss=0.1301, pruned_loss=0.02865, audio_tagging_loss=0.01395, over 14699.00 frames. ], tot_loss[loss=0.09592, simple_loss=0.1133, pruned_loss=0.02829, audio_tagging_loss=0.01099, over 3034525.42 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:56:34,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=490926.6666666667, ans=0.1 2023-11-19 00:56:37,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=490926.6666666667, ans=0.025 2023-11-19 00:57:00,323 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.558e+01 8.708e+01 9.682e+01 1.052e+02 1.356e+02, threshold=1.936e+02, percent-clipped=0.0 2023-11-19 00:57:06,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=491126.6666666667, ans=0.2 2023-11-19 00:57:15,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=491193.3333333333, ans=0.0 2023-11-19 00:57:16,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=491193.3333333333, ans=0.05 2023-11-19 00:57:18,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=491193.3333333333, ans=0.125 2023-11-19 00:57:19,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-19 00:57:25,671 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1550, loss[loss=0.1099, simple_loss=0.1345, pruned_loss=0.03174, audio_tagging_loss=0.0109, over 15286.00 frames. ], tot_loss[loss=0.09655, simple_loss=0.1142, pruned_loss=0.02841, audio_tagging_loss=0.01104, over 3030421.74 frames. ], batch size: 56, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:57:31,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=491260.0, ans=0.125 2023-11-19 00:57:32,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491260.0, ans=0.1 2023-11-19 00:57:36,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=491326.6666666667, ans=0.125 2023-11-19 00:57:36,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.47 vs. 
limit=15.0 2023-11-19 00:57:41,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=491326.6666666667, ans=0.0 2023-11-19 00:57:52,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=491393.3333333333, ans=0.1 2023-11-19 00:57:59,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.90 vs. limit=15.0 2023-11-19 00:58:17,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=491526.6666666667, ans=0.125 2023-11-19 00:58:20,576 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1600, loss[loss=0.1103, simple_loss=0.1218, pruned_loss=0.03444, audio_tagging_loss=0.01497, over 14395.00 frames. ], tot_loss[loss=0.09619, simple_loss=0.1135, pruned_loss=0.02834, audio_tagging_loss=0.01112, over 3035023.73 frames. ], batch size: 55, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:58:37,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=491660.0, ans=0.0 2023-11-19 00:58:38,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=491660.0, ans=0.1 2023-11-19 00:58:50,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=491726.6666666667, ans=0.1 2023-11-19 00:58:50,951 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.782e+01 9.645e+01 1.086e+02 1.733e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 00:59:17,055 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1650, loss[loss=0.07396, simple_loss=0.08975, pruned_loss=0.02004, audio_tagging_loss=0.009036, over 16225.00 frames. ], tot_loss[loss=0.09615, simple_loss=0.1131, pruned_loss=0.02837, audio_tagging_loss=0.01123, over 3043279.42 frames. ], batch size: 65, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 00:59:32,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=491993.3333333333, ans=0.1 2023-11-19 00:59:39,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=492060.0, ans=0.07 2023-11-19 00:59:55,397 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-19 01:00:10,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=492193.3333333333, ans=0.125 2023-11-19 01:00:12,746 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1700, loss[loss=0.07989, simple_loss=0.09934, pruned_loss=0.02019, audio_tagging_loss=0.01002, over 15191.00 frames. ], tot_loss[loss=0.09551, simple_loss=0.1125, pruned_loss=0.02805, audio_tagging_loss=0.0112, over 3042398.02 frames. 
], batch size: 57, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:00:27,893 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=492326.6666666667, ans=0.2 2023-11-19 01:00:31,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=492326.6666666667, ans=0.125 2023-11-19 01:00:35,191 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.96 vs. limit=15.0 2023-11-19 01:00:35,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=492393.3333333333, ans=0.125 2023-11-19 01:00:43,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.546e+01 9.504e+01 1.048e+02 1.501e+02, threshold=1.901e+02, percent-clipped=0.0 2023-11-19 01:00:45,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=492393.3333333333, ans=0.04949747468305833 2023-11-19 01:01:04,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-19 01:01:08,064 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1750, loss[loss=0.08636, simple_loss=0.1077, pruned_loss=0.02195, audio_tagging_loss=0.01057, over 14304.00 frames. ], tot_loss[loss=0.09373, simple_loss=0.1102, pruned_loss=0.0274, audio_tagging_loss=0.01122, over 3037700.21 frames. ], batch size: 54, lr: 1.02e-02, grad_scale: 32.0 2023-11-19 01:01:08,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=492593.3333333333, ans=0.125 2023-11-19 01:01:09,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=492593.3333333333, ans=0.125 2023-11-19 01:01:25,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=492660.0, ans=0.0 2023-11-19 01:01:33,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.42 vs. limit=15.0 2023-11-19 01:01:46,665 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 01:01:48,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=492793.3333333333, ans=0.2 2023-11-19 01:01:57,132 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.89 vs. limit=15.0 2023-11-19 01:02:04,415 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1800, loss[loss=0.1022, simple_loss=0.1209, pruned_loss=0.02881, audio_tagging_loss=0.01294, over 15121.00 frames. ], tot_loss[loss=0.09452, simple_loss=0.1115, pruned_loss=0.0277, audio_tagging_loss=0.01108, over 3039169.32 frames. 
2023-11-19 01:02:09,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=492926.6666666667, ans=0.2
2023-11-19 01:02:20,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=492993.3333333333, ans=0.2
2023-11-19 01:02:20,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=492993.3333333333, ans=0.1
2023-11-19 01:02:21,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=492993.3333333333, ans=0.125
2023-11-19 01:02:24,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=492993.3333333333, ans=0.0
2023-11-19 01:02:27,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=493060.0, ans=0.0
2023-11-19 01:02:34,107 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.111e+01 8.598e+01 9.390e+01 1.040e+02 1.619e+02, threshold=1.878e+02, percent-clipped=0.0
2023-11-19 01:02:37,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.03 vs. limit=15.0
2023-11-19 01:02:38,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=493126.6666666667, ans=0.125
2023-11-19 01:03:00,479 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1850, loss[loss=0.1113, simple_loss=0.1182, pruned_loss=0.04062, audio_tagging_loss=0.01163, over 14143.00 frames. ], tot_loss[loss=0.09491, simple_loss=0.1121, pruned_loss=0.02792, audio_tagging_loss=0.01094, over 3037862.63 frames. ], batch size: 52, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:03:11,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=493326.6666666667, ans=0.0
2023-11-19 01:03:13,606 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:03:15,825 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=493326.6666666667, ans=0.125
2023-11-19 01:03:38,300 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.10 vs. limit=6.0
2023-11-19 01:03:42,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=493460.0, ans=0.05
2023-11-19 01:03:45,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=493526.6666666667, ans=0.125
2023-11-19 01:03:55,762 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1900, loss[loss=0.06409, simple_loss=0.074, pruned_loss=0.01693, audio_tagging_loss=0.01016, over 16446.00 frames. ], tot_loss[loss=0.0939, simple_loss=0.111, pruned_loss=0.02749, audio_tagging_loss=0.01089, over 3050885.09 frames. ], batch size: 63, lr: 1.01e-02, grad_scale: 32.0
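The scaling.py:213 records track ScheduledFloat values: regularization knobs (dropout probabilities, skip rates, balancer bounds) that are functions of batch_count rather than constants, which is why the same parameter name keeps reappearing with its current value as training advances. A sketch of a piecewise-linear schedule of that kind; only "piecewise-linear in batch_count" is assumed, and the breakpoints below are invented for illustration:

```python
# Sketch of a batch-count-keyed hyperparameter like the ScheduledFloat
# records above. The breakpoints are illustrative, not taken from this run.
def scheduled_float(batch_count: float, points) -> float:
    """points: sorted (batch_count, value) pairs; linear in between,
    clamped to the end values outside the range."""
    x0, y0 = points[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in points[1:]:
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
        x0, y0 = x1, y1
    return y0

# e.g. a dropout that decays from 0.3 to 0.1 over the first 20k batches:
print(scheduled_float(493060.0, [(0.0, 0.3), (20000.0, 0.1)]))  # -> 0.1
```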
2023-11-19 01:04:17,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=493726.6666666667, ans=0.0
2023-11-19 01:04:24,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.16 vs. limit=12.0
2023-11-19 01:04:25,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.583e+01 8.519e+01 9.193e+01 1.005e+02 1.310e+02, threshold=1.839e+02, percent-clipped=0.0
2023-11-19 01:04:36,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=493793.3333333333, ans=0.0
2023-11-19 01:04:50,850 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 1950, loss[loss=0.07588, simple_loss=0.09228, pruned_loss=0.02139, audio_tagging_loss=0.008346, over 15109.00 frames. ], tot_loss[loss=0.09351, simple_loss=0.1103, pruned_loss=0.02748, audio_tagging_loss=0.01088, over 3043881.86 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:05:20,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=494060.0, ans=0.125
2023-11-19 01:05:22,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=494060.0, ans=0.2
2023-11-19 01:05:23,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=494126.6666666667, ans=0.125
2023-11-19 01:05:47,465 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2000, loss[loss=0.09013, simple_loss=0.1106, pruned_loss=0.02701, audio_tagging_loss=0.007832, over 16644.00 frames. ], tot_loss[loss=0.09254, simple_loss=0.109, pruned_loss=0.02707, audio_tagging_loss=0.01095, over 3046984.09 frames. ], batch size: 62, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:05:54,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=494260.0, ans=0.125
2023-11-19 01:05:56,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1.whitening_limit, batch_count=494260.0, ans=10.0
2023-11-19 01:06:04,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=494326.6666666667, ans=0.0
2023-11-19 01:06:06,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=494326.6666666667, ans=0.125
2023-11-19 01:06:16,506 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.706e+01 9.238e+01 1.036e+02 1.404e+02, threshold=1.848e+02, percent-clipped=0.0
2023-11-19 01:06:19,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=494460.0, ans=0.1
2023-11-19 01:06:37,590 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.04 vs. limit=15.0
2023-11-19 01:06:41,269 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.98 vs. limit=12.0
2023-11-19 01:06:42,721 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2050, loss[loss=0.121, simple_loss=0.1475, pruned_loss=0.03774, audio_tagging_loss=0.009464, over 15005.00 frames. ], tot_loss[loss=0.09278, simple_loss=0.1093, pruned_loss=0.02721, audio_tagging_loss=0.01093, over 3045225.39 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
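The Whitening records compare a per-module metric against a limit: the metric measures how far the module's feature covariance is from isotropic ("white"), and the activations are nudged back toward whiteness when the limit is exceeded. One plausible reading of the numbers, treated here as an assumption rather than a quote from scaling.py, is d * sum(eig^2) / (sum(eig))^2 over covariance eigenvalues, which equals 1.0 for perfectly white features and grows as variance concentrates in a few directions:

```python
# Hedged sketch of a whitening metric of the kind these records report:
# 1.0 for an isotropic feature covariance, larger when variance piles up
# in a few directions. The exact formula in scaling.py is an assumption.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    """x: (num_frames, num_channels) activations."""
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    d = cov.shape[0]
    # d * trace(C @ C) / trace(C)^2 == d * sum(eig^2) / sum(eig)^2 >= 1
    return d * (cov * cov).sum() / cov.diagonal().sum() ** 2

print(whitening_metric(torch.randn(4000, 256)))  # close to 1 for white noise
```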
2023-11-19 01:06:55,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=494660.0, ans=0.125
2023-11-19 01:07:03,504 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:07:24,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=494793.3333333333, ans=0.125
2023-11-19 01:07:31,623 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.04 vs. limit=15.0
2023-11-19 01:07:39,056 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2100, loss[loss=0.09125, simple_loss=0.1175, pruned_loss=0.02425, audio_tagging_loss=0.00825, over 14809.00 frames. ], tot_loss[loss=0.09357, simple_loss=0.1103, pruned_loss=0.02753, audio_tagging_loss=0.0109, over 3043239.80 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:07:40,881 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.92 vs. limit=6.0
2023-11-19 01:07:53,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=494993.3333333333, ans=0.2
2023-11-19 01:07:53,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=494993.3333333333, ans=0.125
2023-11-19 01:08:04,763 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.59 vs. limit=15.0
2023-11-19 01:08:06,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=495060.0, ans=0.05
2023-11-19 01:08:08,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=495060.0, ans=0.125
2023-11-19 01:08:09,432 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.837e+01 9.503e+01 1.029e+02 1.417e+02, threshold=1.901e+02, percent-clipped=0.0
2023-11-19 01:08:35,532 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2150, loss[loss=0.103, simple_loss=0.1185, pruned_loss=0.03129, audio_tagging_loss=0.01249, over 15954.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.11, pruned_loss=0.02755, audio_tagging_loss=0.01096, over 3042080.55 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:08:35,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=495260.0, ans=0.125
2023-11-19 01:08:52,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=495326.6666666667, ans=0.0
2023-11-19 01:08:55,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=495326.6666666667, ans=0.0
2023-11-19 01:08:56,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=495393.3333333333, ans=0.125
2023-11-19 01:09:00,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=495393.3333333333, ans=0.2
2023-11-19 01:09:10,128 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:09:11,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=495460.0, ans=0.125
2023-11-19 01:09:30,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=495593.3333333333, ans=0.05
2023-11-19 01:09:31,316 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2200, loss[loss=0.08867, simple_loss=0.1062, pruned_loss=0.02662, audio_tagging_loss=0.008946, over 14608.00 frames. ], tot_loss[loss=0.09459, simple_loss=0.1115, pruned_loss=0.02796, audio_tagging_loss=0.0109, over 3043931.12 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:10:02,325 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.926e+01 8.544e+01 9.673e+01 1.053e+02 1.527e+02, threshold=1.935e+02, percent-clipped=0.0
2023-11-19 01:10:04,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=495793.3333333333, ans=0.125
2023-11-19 01:10:09,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=495793.3333333333, ans=0.1
2023-11-19 01:10:10,531 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:10:11,575 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:10:26,497 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2250, loss[loss=0.09861, simple_loss=0.1174, pruned_loss=0.02915, audio_tagging_loss=0.01076, over 15982.00 frames. ], tot_loss[loss=0.0955, simple_loss=0.1124, pruned_loss=0.02835, audio_tagging_loss=0.01093, over 3043905.23 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 16.0
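The WARNING above shows why a cut gets dropped: after the encoder front-end's roughly 4x subsampling, the 100-frame AudioSet clip yields only 23 encoder frames, but its placeholder transcript tokenizes to 24 BPE tokens, and a transducer cannot emit more tokens than it has encoder frames. A sketch of that filter; the frame formula reproduces the logged 100 -> 23 reduction and should be treated as an assumption about the convolutional front-end rather than a quote from train_asr.py:

```python
# Sketch of the exclusion rule implied by the WARNING.
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed Conv2d front-end arithmetic; matches the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one encoder frame per output token.
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23
print(keep_cut(100, 24))              # False -> cut excluded, as logged
```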
2023-11-19 01:10:29,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=495926.6666666667, ans=0.2
2023-11-19 01:10:42,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=495993.3333333333, ans=0.0
2023-11-19 01:11:14,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=496193.3333333333, ans=0.125
2023-11-19 01:11:23,060 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2300, loss[loss=0.09546, simple_loss=0.1143, pruned_loss=0.02696, audio_tagging_loss=0.01134, over 15711.00 frames. ], tot_loss[loss=0.09435, simple_loss=0.111, pruned_loss=0.0278, audio_tagging_loss=0.01102, over 3035332.38 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:11:38,873 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.10 vs. limit=15.0
2023-11-19 01:11:44,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=496393.3333333333, ans=0.05
2023-11-19 01:11:47,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=496393.3333333333, ans=0.125
2023-11-19 01:11:53,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.554e+01 9.064e+01 9.790e+01 1.107e+02 1.454e+02, threshold=1.958e+02, percent-clipped=0.0
2023-11-19 01:12:04,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=496460.0, ans=0.0
2023-11-19 01:12:05,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=496460.0, ans=0.0
2023-11-19 01:12:12,940 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:12:15,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=496526.6666666667, ans=0.125
2023-11-19 01:12:18,213 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2350, loss[loss=0.09805, simple_loss=0.1119, pruned_loss=0.02911, audio_tagging_loss=0.01298, over 14847.00 frames. ], tot_loss[loss=0.0951, simple_loss=0.1118, pruned_loss=0.02809, audio_tagging_loss=0.01113, over 3036478.62 frames. ], batch size: 58, lr: 1.01e-02, grad_scale: 16.0
2023-11-19 01:12:52,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=496793.3333333333, ans=0.125
2023-11-19 01:13:14,603 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2400, loss[loss=0.1165, simple_loss=0.1383, pruned_loss=0.03829, audio_tagging_loss=0.009096, over 16325.00 frames. ], tot_loss[loss=0.09473, simple_loss=0.1114, pruned_loss=0.02785, audio_tagging_loss=0.0112, over 3034400.00 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:13:26,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=496993.3333333333, ans=0.2
2023-11-19 01:13:28,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=496993.3333333333, ans=0.125
2023-11-19 01:13:33,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=496993.3333333333, ans=0.125
2023-11-19 01:13:45,240 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.178e+01 8.739e+01 9.521e+01 1.008e+02 1.350e+02, threshold=1.904e+02, percent-clipped=0.0
2023-11-19 01:13:48,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.64 vs. limit=22.5
2023-11-19 01:14:02,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=497193.3333333333, ans=0.125
2023-11-19 01:14:10,592 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2450, loss[loss=0.1125, simple_loss=0.1462, pruned_loss=0.03207, audio_tagging_loss=0.007342, over 15937.00 frames. ], tot_loss[loss=0.09503, simple_loss=0.1118, pruned_loss=0.02788, audio_tagging_loss=0.01126, over 3042070.97 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:14:17,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=497260.0, ans=0.2
2023-11-19 01:14:47,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=497460.0, ans=0.1
2023-11-19 01:14:51,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=497460.0, ans=0.5
2023-11-19 01:14:52,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0
2023-11-19 01:15:00,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.87 vs. limit=15.0
2023-11-19 01:15:06,178 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2500, loss[loss=0.08801, simple_loss=0.1137, pruned_loss=0.02068, audio_tagging_loss=0.01049, over 14453.00 frames. ], tot_loss[loss=0.09356, simple_loss=0.1102, pruned_loss=0.02721, audio_tagging_loss=0.01128, over 3042978.41 frames. ], batch size: 53, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:15:08,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.28 vs. limit=15.0
2023-11-19 01:15:18,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=497660.0, ans=0.1
2023-11-19 01:15:18,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=497660.0, ans=0.0
2023-11-19 01:15:28,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=497726.6666666667, ans=0.125
2023-11-19 01:15:37,854 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.172e+01 8.612e+01 9.372e+01 1.003e+02 1.252e+02, threshold=1.874e+02, percent-clipped=0.0
2023-11-19 01:16:01,810 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2550, loss[loss=0.1101, simple_loss=0.126, pruned_loss=0.03715, audio_tagging_loss=0.009969, over 15587.00 frames. ], tot_loss[loss=0.09328, simple_loss=0.1098, pruned_loss=0.02712, audio_tagging_loss=0.01127, over 3043321.50 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:16:16,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=497993.3333333333, ans=0.2
2023-11-19 01:16:21,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=497993.3333333333, ans=0.2
2023-11-19 01:16:31,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=498060.0, ans=0.125
2023-11-19 01:16:36,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=498126.6666666667, ans=0.125
2023-11-19 01:16:40,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.88 vs. limit=6.0
2023-11-19 01:16:40,535 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=8.27 vs. limit=15.0
2023-11-19 01:16:48,215 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.34 vs. limit=15.0
2023-11-19 01:16:56,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=498260.0, ans=0.0
2023-11-19 01:16:57,745 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2600, loss[loss=0.0941, simple_loss=0.1046, pruned_loss=0.02553, audio_tagging_loss=0.01625, over 15003.00 frames. ], tot_loss[loss=0.09304, simple_loss=0.1094, pruned_loss=0.02714, audio_tagging_loss=0.0112, over 3039795.66 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:16:57,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=498260.0, ans=0.0
2023-11-19 01:17:02,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=498260.0, ans=0.125
2023-11-19 01:17:20,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=498393.3333333333, ans=0.07
2023-11-19 01:17:28,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.310e+01 9.022e+01 9.947e+01 2.048e+02, threshold=1.804e+02, percent-clipped=1.0
2023-11-19 01:17:31,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=498460.0, ans=0.125
2023-11-19 01:17:49,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=498526.6666666667, ans=0.1
2023-11-19 01:17:53,113 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2650, loss[loss=0.08387, simple_loss=0.09756, pruned_loss=0.02581, audio_tagging_loss=0.009286, over 14240.00 frames. ], tot_loss[loss=0.09255, simple_loss=0.109, pruned_loss=0.02688, audio_tagging_loss=0.01114, over 3040453.95 frames. ], batch size: 54, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:17:54,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=498593.3333333333, ans=0.2
2023-11-19 01:17:58,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.81 vs. limit=22.5
2023-11-19 01:18:02,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=498660.0, ans=10.0
2023-11-19 01:18:02,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=498660.0, ans=0.0
2023-11-19 01:18:17,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=498726.6666666667, ans=0.125
2023-11-19 01:18:19,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.72 vs. limit=15.0
2023-11-19 01:18:42,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=498860.0, ans=0.0
2023-11-19 01:18:48,520 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2700, loss[loss=0.07977, simple_loss=0.08043, pruned_loss=0.02621, audio_tagging_loss=0.01334, over 14744.00 frames. ], tot_loss[loss=0.09221, simple_loss=0.1084, pruned_loss=0.02682, audio_tagging_loss=0.0112, over 3045346.57 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:19:10,248 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.05 vs. limit=15.0
2023-11-19 01:19:10,382 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.54 vs. limit=12.0
2023-11-19 01:19:13,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=499060.0, ans=0.09899494936611666
2023-11-19 01:19:20,348 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.993e+01 8.454e+01 8.998e+01 9.749e+01 1.436e+02, threshold=1.800e+02, percent-clipped=0.0
2023-11-19 01:19:39,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=499193.3333333333, ans=0.125
2023-11-19 01:19:44,789 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2750, loss[loss=0.09557, simple_loss=0.1138, pruned_loss=0.0276, audio_tagging_loss=0.01107, over 15661.00 frames. ], tot_loss[loss=0.09202, simple_loss=0.1086, pruned_loss=0.02668, audio_tagging_loss=0.01106, over 3041970.27 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:19:46,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=499260.0, ans=10.0
2023-11-19 01:20:05,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=499393.3333333333, ans=0.125
2023-11-19 01:20:06,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=499393.3333333333, ans=0.0
2023-11-19 01:20:33,923 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:20:40,206 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2800, loss[loss=0.1149, simple_loss=0.1371, pruned_loss=0.03581, audio_tagging_loss=0.01054, over 14957.00 frames. ], tot_loss[loss=0.0926, simple_loss=0.1094, pruned_loss=0.02696, audio_tagging_loss=0.01092, over 3041205.86 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:20:41,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=22.5
2023-11-19 01:20:45,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=499593.3333333333, ans=0.125
2023-11-19 01:21:11,306 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.889e+01 9.395e+01 1.013e+02 1.273e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 01:21:35,074 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2850, loss[loss=0.0788, simple_loss=0.08716, pruned_loss=0.024, audio_tagging_loss=0.01121, over 14999.00 frames. ], tot_loss[loss=0.09249, simple_loss=0.1094, pruned_loss=0.02684, audio_tagging_loss=0.01094, over 3050546.46 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:21:50,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.25 vs. limit=10.0
2023-11-19 01:21:57,861 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:22:02,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=500060.0, ans=0.05
2023-11-19 01:22:32,055 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2900, loss[loss=0.09237, simple_loss=0.113, pruned_loss=0.02571, audio_tagging_loss=0.01016, over 15666.00 frames. ], tot_loss[loss=0.09315, simple_loss=0.1102, pruned_loss=0.0272, audio_tagging_loss=0.01085, over 3044741.38 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:22:41,275 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=500260.0, ans=0.125
2023-11-19 01:22:53,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=500393.3333333333, ans=0.125
2023-11-19 01:22:54,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=500393.3333333333, ans=0.0
2023-11-19 01:23:02,355 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.772e+01 8.483e+01 9.561e+01 1.049e+02 1.503e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-19 01:23:16,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=500526.6666666667, ans=10.0
2023-11-19 01:23:17,393 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.74 vs. limit=22.5
2023-11-19 01:23:27,926 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 2950, loss[loss=0.1027, simple_loss=0.1232, pruned_loss=0.03019, audio_tagging_loss=0.01087, over 15808.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.1101, pruned_loss=0.02723, audio_tagging_loss=0.01099, over 3045190.03 frames. ], batch size: 60, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:23:42,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=500660.0, ans=0.0
2023-11-19 01:23:49,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=500726.6666666667, ans=0.125
2023-11-19 01:24:02,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=500793.3333333333, ans=0.0
2023-11-19 01:24:08,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=500793.3333333333, ans=0.1
2023-11-19 01:24:12,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=500860.0, ans=0.125
2023-11-19 01:24:22,442 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3000, loss[loss=0.0931, simple_loss=0.1115, pruned_loss=0.02839, audio_tagging_loss=0.008956, over 15723.00 frames. ], tot_loss[loss=0.09305, simple_loss=0.1098, pruned_loss=0.02717, audio_tagging_loss=0.011, over 3042532.16 frames. ], batch size: 59, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:24:22,444 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-19 01:24:54,909 INFO [train_asr.py:1147] (0/4) Epoch 7, validation: loss=0.06857, simple_loss=0.05795, pruned_loss=0.007692, audio_tagging_loss=0.0319, over 4681554.00 frames.
2023-11-19 01:24:54,909 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-19 01:25:12,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=500993.3333333333, ans=0.125
2023-11-19 01:25:18,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.58 vs. limit=22.5
2023-11-19 01:25:21,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=501060.0, ans=0.125
2023-11-19 01:25:24,917 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.894e+01 8.798e+01 9.751e+01 1.102e+02 1.409e+02, threshold=1.950e+02, percent-clipped=0.0
2023-11-19 01:25:27,778 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=501126.6666666667, ans=0.0
2023-11-19 01:25:32,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501126.6666666667, ans=0.1
2023-11-19 01:25:41,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501193.3333333333, ans=0.1
2023-11-19 01:25:44,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=501193.3333333333, ans=0.125
2023-11-19 01:25:48,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=501193.3333333333, ans=0.125
2023-11-19 01:25:50,599 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3050, loss[loss=0.1055, simple_loss=0.1262, pruned_loss=0.03163, audio_tagging_loss=0.0107, over 15861.00 frames. ], tot_loss[loss=0.09441, simple_loss=0.1113, pruned_loss=0.02781, audio_tagging_loss=0.01094, over 3054440.11 frames. ], batch size: 62, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:26:00,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=501326.6666666667, ans=0.2
2023-11-19 01:26:20,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=501393.3333333333, ans=0.0
2023-11-19 01:26:25,177 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
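The validation block above (train_asr.py:1138/1147) interrupts training at batch 3000, evaluates on a fixed dev set, and reports the same loss breakdown. Between validations, tot_loss behaves like a decayed running average of per-batch statistics: both the loss sums and the frame counts appear to be accumulated with a decay and printed as a ratio, which would explain why the logged frame totals hover around 3e6 (roughly 200 batches of ~15k frames) instead of growing without bound. A sketch of that bookkeeping, with the decay constant as an assumption:

```python
# Hedged sketch of the tot_loss bookkeeping: decayed sums of per-batch
# loss and frame counts, printed as their ratio. The decay constant is
# an assumption; only "bounded, recency-weighted average" is inferred
# from the way the logged frame totals stay near 3e6.
class RunningLoss:
    def __init__(self, decay: float = 1.0 - 1.0 / 200):
        self.decay = decay
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: float) -> None:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames

    @property
    def value(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)
```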
2023-11-19 01:26:44,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=501526.6666666667, ans=0.125
2023-11-19 01:26:46,169 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3100, loss[loss=0.09548, simple_loss=0.1147, pruned_loss=0.02831, audio_tagging_loss=0.009799, over 14922.00 frames. ], tot_loss[loss=0.09511, simple_loss=0.112, pruned_loss=0.02806, audio_tagging_loss=0.01102, over 3049386.40 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:26:46,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=501593.3333333333, ans=0.2
2023-11-19 01:26:47,733 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2023-11-19 01:27:17,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.054e+01 8.685e+01 9.204e+01 1.020e+02 1.427e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-19 01:27:21,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=501793.3333333333, ans=0.2
2023-11-19 01:27:30,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=501860.0, ans=0.0
2023-11-19 01:27:36,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=501860.0, ans=0.0
2023-11-19 01:27:42,159 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3150, loss[loss=0.1038, simple_loss=0.1222, pruned_loss=0.03456, audio_tagging_loss=0.008181, over 14953.00 frames. ], tot_loss[loss=0.09375, simple_loss=0.1105, pruned_loss=0.02736, audio_tagging_loss=0.01116, over 3042931.52 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:27:47,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=501926.6666666667, ans=0.09899494936611666
2023-11-19 01:27:49,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=501926.6666666667, ans=0.1
2023-11-19 01:27:49,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=501926.6666666667, ans=0.0
2023-11-19 01:27:52,694 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=501993.3333333333, ans=0.0
2023-11-19 01:28:03,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502060.0, ans=0.125
2023-11-19 01:28:21,190 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.61 vs. limit=15.0
2023-11-19 01:28:27,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=502193.3333333333, ans=0.1
2023-11-19 01:28:32,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=502193.3333333333, ans=0.125
2023-11-19 01:28:37,938 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3200, loss[loss=0.1065, simple_loss=0.1213, pruned_loss=0.03499, audio_tagging_loss=0.01089, over 15138.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.1101, pruned_loss=0.02742, audio_tagging_loss=0.01129, over 3036665.75 frames. ], batch size: 57, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:28:53,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.13 vs. limit=6.0
2023-11-19 01:29:08,662 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.725e+01 8.572e+01 9.353e+01 1.015e+02 1.372e+02, threshold=1.871e+02, percent-clipped=0.0
2023-11-19 01:29:10,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502460.0, ans=0.125
2023-11-19 01:29:25,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.64 vs. limit=22.5
2023-11-19 01:29:33,153 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3250, loss[loss=0.09502, simple_loss=0.1118, pruned_loss=0.02669, audio_tagging_loss=0.0124, over 15795.00 frames. ], tot_loss[loss=0.09411, simple_loss=0.1104, pruned_loss=0.02754, audio_tagging_loss=0.01136, over 3038448.93 frames. ], batch size: 56, lr: 1.01e-02, grad_scale: 32.0
2023-11-19 01:29:36,846 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.25 vs. limit=12.0
2023-11-19 01:29:37,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=502593.3333333333, ans=0.125
2023-11-19 01:29:39,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=502593.3333333333, ans=0.125
2023-11-19 01:29:43,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.65 vs. limit=15.0
2023-11-19 01:29:46,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=502660.0, ans=0.125
2023-11-19 01:30:03,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=502726.6666666667, ans=0.0
2023-11-19 01:30:10,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=502793.3333333333, ans=0.125
2023-11-19 01:30:20,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=502860.0, ans=0.0
2023-11-19 01:30:29,455 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3300, loss[loss=0.07998, simple_loss=0.09414, pruned_loss=0.02101, audio_tagging_loss=0.01191, over 15280.00 frames. ], tot_loss[loss=0.09384, simple_loss=0.1102, pruned_loss=0.02734, audio_tagging_loss=0.01142, over 3038381.21 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:30:44,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.09 vs. limit=15.0
2023-11-19 01:31:00,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.005e+01 8.481e+01 9.247e+01 1.047e+02 1.658e+02, threshold=1.849e+02, percent-clipped=0.0
2023-11-19 01:31:02,040 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:31:21,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=503193.3333333333, ans=0.2
2023-11-19 01:31:26,471 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3350, loss[loss=0.07614, simple_loss=0.09438, pruned_loss=0.0186, audio_tagging_loss=0.01036, over 14816.00 frames. ], tot_loss[loss=0.09302, simple_loss=0.1095, pruned_loss=0.02697, audio_tagging_loss=0.01132, over 3043187.96 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:31:32,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=503260.0, ans=0.0
2023-11-19 01:31:40,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=503326.6666666667, ans=0.02
2023-11-19 01:31:51,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.68 vs. limit=6.0
2023-11-19 01:31:54,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=503393.3333333333, ans=0.125
2023-11-19 01:32:21,464 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3400, loss[loss=0.07488, simple_loss=0.08817, pruned_loss=0.01905, audio_tagging_loss=0.01174, over 15037.00 frames. ], tot_loss[loss=0.0932, simple_loss=0.1099, pruned_loss=0.0271, audio_tagging_loss=0.01113, over 3051768.35 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:32:21,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=503593.3333333333, ans=0.1
2023-11-19 01:32:24,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=503593.3333333333, ans=0.0
2023-11-19 01:32:29,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=503593.3333333333, ans=0.125
2023-11-19 01:32:45,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=503726.6666666667, ans=0.1
2023-11-19 01:32:53,221 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.375e+01 8.287e+01 9.053e+01 9.903e+01 1.231e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 01:33:17,100 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3450, loss[loss=0.07889, simple_loss=0.09196, pruned_loss=0.02047, audio_tagging_loss=0.01244, over 15317.00 frames. ], tot_loss[loss=0.09285, simple_loss=0.1094, pruned_loss=0.02712, audio_tagging_loss=0.01104, over 3041775.55 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:33:19,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=503926.6666666667, ans=0.2
2023-11-19 01:33:31,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=503993.3333333333, ans=0.125
2023-11-19 01:33:47,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=504060.0, ans=0.025
2023-11-19 01:33:56,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=504126.6666666667, ans=0.0
2023-11-19 01:34:01,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=504193.3333333333, ans=0.0
2023-11-19 01:34:04,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=504193.3333333333, ans=0.125
2023-11-19 01:34:07,057 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:34:11,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=504193.3333333333, ans=0.0
2023-11-19 01:34:12,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.51 vs. limit=10.0
2023-11-19 01:34:13,642 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3500, loss[loss=0.09097, simple_loss=0.1132, pruned_loss=0.02015, audio_tagging_loss=0.0142, over 15599.00 frames. ], tot_loss[loss=0.09382, simple_loss=0.1108, pruned_loss=0.02748, audio_tagging_loss=0.01096, over 3053379.53 frames. ], batch size: 60, lr: 1.00e-02, grad_scale: 16.0
2023-11-19 01:34:22,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=504260.0, ans=0.125
2023-11-19 01:34:23,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=504326.6666666667, ans=0.025
2023-11-19 01:34:38,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0
2023-11-19 01:34:43,112 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:34:45,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.641e+01 8.567e+01 9.282e+01 1.040e+02 1.334e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-19 01:34:50,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=504460.0, ans=0.125
2023-11-19 01:34:54,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.64 vs. limit=22.5
2023-11-19 01:35:09,367 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3550, loss[loss=0.09404, simple_loss=0.1032, pruned_loss=0.02457, audio_tagging_loss=0.01787, over 15182.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.1102, pruned_loss=0.02726, audio_tagging_loss=0.01092, over 3044810.04 frames. ], batch size: 59, lr: 1.00e-02, grad_scale: 16.0
2023-11-19 01:35:17,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=504593.3333333333, ans=0.09899494936611666
2023-11-19 01:35:22,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=504660.0, ans=0.125
2023-11-19 01:35:22,947 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.55 vs. limit=15.0
2023-11-19 01:35:26,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=504660.0, ans=0.125
2023-11-19 01:35:38,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=504726.6666666667, ans=0.0
2023-11-19 01:35:41,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=504793.3333333333, ans=0.0
2023-11-19 01:35:49,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=504793.3333333333, ans=0.1
2023-11-19 01:36:04,822 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3600, loss[loss=0.08596, simple_loss=0.09756, pruned_loss=0.02334, audio_tagging_loss=0.01384, over 14150.00 frames. ], tot_loss[loss=0.09259, simple_loss=0.1094, pruned_loss=0.02694, audio_tagging_loss=0.01094, over 3036059.44 frames. ], batch size: 53, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:36:07,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=504926.6666666667, ans=0.1
2023-11-19 01:36:22,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=504993.3333333333, ans=0.0
2023-11-19 01:36:36,842 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.969e+01 8.561e+01 9.359e+01 1.025e+02 1.551e+02, threshold=1.872e+02, percent-clipped=0.0
2023-11-19 01:36:51,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.12 vs. limit=6.0
2023-11-19 01:36:53,962 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.984e-01
2023-11-19 01:37:00,646 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3650, loss[loss=0.09771, simple_loss=0.1221, pruned_loss=0.02448, audio_tagging_loss=0.01216, over 14770.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.1102, pruned_loss=0.0272, audio_tagging_loss=0.01079, over 3038454.99 frames. ], batch size: 54, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:37:06,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.87 vs. limit=15.0
2023-11-19 01:37:14,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=505326.6666666667, ans=0.0
2023-11-19 01:37:14,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=505326.6666666667, ans=0.0
2023-11-19 01:37:14,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=6.89 vs. limit=15.0
2023-11-19 01:37:31,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=505393.3333333333, ans=0.125
2023-11-19 01:37:45,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=505526.6666666667, ans=0.0
2023-11-19 01:37:55,876 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3700, loss[loss=0.07817, simple_loss=0.08736, pruned_loss=0.0241, audio_tagging_loss=0.01039, over 15523.00 frames. ], tot_loss[loss=0.09411, simple_loss=0.1115, pruned_loss=0.02769, audio_tagging_loss=0.01066, over 3046328.48 frames. ], batch size: 59, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:38:00,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505593.3333333333, ans=0.1
2023-11-19 01:38:02,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=505593.3333333333, ans=0.125
2023-11-19 01:38:19,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=505726.6666666667, ans=0.1
2023-11-19 01:38:21,852 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.94 vs. limit=15.0
2023-11-19 01:38:28,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.449e+01 8.900e+01 9.822e+01 1.122e+02 1.774e+02, threshold=1.964e+02, percent-clipped=0.0
2023-11-19 01:38:30,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=505793.3333333333, ans=0.1
2023-11-19 01:38:48,015 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.75 vs. limit=15.0
2023-11-19 01:38:51,785 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3750, loss[loss=0.1125, simple_loss=0.1283, pruned_loss=0.03872, audio_tagging_loss=0.00962, over 15599.00 frames. ], tot_loss[loss=0.09373, simple_loss=0.111, pruned_loss=0.02754, audio_tagging_loss=0.0107, over 3054393.22 frames. ], batch size: 58, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:39:07,394 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 01:39:19,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=506060.0, ans=0.125
2023-11-19 01:39:30,907 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:39:39,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.00 vs. limit=15.0
2023-11-19 01:39:48,460 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3800, loss[loss=0.1184, simple_loss=0.1462, pruned_loss=0.03262, audio_tagging_loss=0.01265, over 15587.00 frames. ], tot_loss[loss=0.09412, simple_loss=0.1111, pruned_loss=0.02763, audio_tagging_loss=0.01095, over 3049818.40 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:40:04,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=506326.6666666667, ans=0.125
2023-11-19 01:40:04,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=506326.6666666667, ans=0.1
2023-11-19 01:40:20,051 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.858e+01 8.899e+01 9.421e+01 1.052e+02 1.490e+02, threshold=1.884e+02, percent-clipped=0.0
2023-11-19 01:40:43,420 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3850, loss[loss=0.1013, simple_loss=0.1258, pruned_loss=0.02941, audio_tagging_loss=0.008992, over 14146.00 frames. ], tot_loss[loss=0.09329, simple_loss=0.11, pruned_loss=0.02724, audio_tagging_loss=0.01106, over 3052001.29 frames. ], batch size: 55, lr: 1.00e-02, grad_scale: 32.0
2023-11-19 01:40:54,942 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-76000.pt
2023-11-19 01:40:58,412 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=506660.0, ans=0.0
2023-11-19 01:41:03,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=506660.0, ans=0.125
2023-11-19 01:41:04,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=506660.0, ans=0.125
2023-11-19 01:41:10,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=506726.6666666667, ans=0.125
2023-11-19 01:41:41,578 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3900, loss[loss=0.084, simple_loss=0.09296, pruned_loss=0.02418, audio_tagging_loss=0.01335, over 14543.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.11, pruned_loss=0.02725, audio_tagging_loss=0.01122, over 3048765.10 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0
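The checkpoint.py:75 record shows a mid-epoch save named by the global batch index (checkpoint-76000.pt) rather than by epoch, i.e. periodic batch-based checkpointing into the experiment directory. A minimal sketch of that pattern; the interval and the exact contents of the saved dict are assumptions, not read from checkpoint.py:

```python
# Minimal sketch of batch-indexed periodic checkpointing like the
# checkpoint-76000.pt save above. The every_n interval and checkpoint
# fields are assumptions.
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: str, every_n: int = 4000) -> None:
    if batch_idx_train == 0 or batch_idx_train % every_n != 0:
        return
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        f"{exp_dir}/checkpoint-{batch_idx_train}.pt",
    )
```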
], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:03,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=507060.0, ans=0.125 2023-11-19 01:42:13,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.813e+01 8.583e+01 9.293e+01 1.003e+02 1.876e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 01:42:14,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=507126.6666666667, ans=0.125 2023-11-19 01:42:18,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=507126.6666666667, ans=0.0 2023-11-19 01:42:25,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=507193.3333333333, ans=0.1 2023-11-19 01:42:30,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=507193.3333333333, ans=0.0 2023-11-19 01:42:35,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.81 vs. limit=15.0 2023-11-19 01:42:38,335 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 3950, loss[loss=0.06562, simple_loss=0.07753, pruned_loss=0.01546, audio_tagging_loss=0.0114, over 14908.00 frames. ], tot_loss[loss=0.09324, simple_loss=0.1099, pruned_loss=0.02713, audio_tagging_loss=0.01118, over 3045060.82 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:42:51,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=507326.6666666667, ans=0.2 2023-11-19 01:43:11,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.34 vs. limit=22.5 2023-11-19 01:43:33,334 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4000, loss[loss=0.06935, simple_loss=0.07968, pruned_loss=0.01867, audio_tagging_loss=0.01084, over 14897.00 frames. ], tot_loss[loss=0.0937, simple_loss=0.1102, pruned_loss=0.02734, audio_tagging_loss=0.01124, over 3043692.97 frames. ], batch size: 57, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:43:33,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=507593.3333333333, ans=0.0 2023-11-19 01:43:37,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=507593.3333333333, ans=0.1 2023-11-19 01:43:49,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=507660.0, ans=10.0 2023-11-19 01:44:06,446 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.614e+01 9.104e+01 9.882e+01 1.124e+02 1.409e+02, threshold=1.976e+02, percent-clipped=0.0 2023-11-19 01:44:16,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=507793.3333333333, ans=0.0 2023-11-19 01:44:28,701 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4050, loss[loss=0.1329, simple_loss=0.1646, pruned_loss=0.04283, audio_tagging_loss=0.007779, over 15358.00 frames. 
], tot_loss[loss=0.09468, simple_loss=0.1113, pruned_loss=0.02781, audio_tagging_loss=0.01121, over 3046374.32 frames. ], batch size: 56, lr: 1.00e-02, grad_scale: 32.0 2023-11-19 01:44:28,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=507926.6666666667, ans=0.0 2023-11-19 01:44:32,463 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:45:05,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.59 vs. limit=15.0 2023-11-19 01:45:07,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=508126.6666666667, ans=0.125 2023-11-19 01:45:24,968 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4100, loss[loss=0.1003, simple_loss=0.1209, pruned_loss=0.02689, audio_tagging_loss=0.01295, over 15785.00 frames. ], tot_loss[loss=0.09438, simple_loss=0.1112, pruned_loss=0.02764, audio_tagging_loss=0.01112, over 3047555.83 frames. ], batch size: 57, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:45:30,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=508260.0, ans=0.0 2023-11-19 01:45:36,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=508326.6666666667, ans=0.2 2023-11-19 01:45:48,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.91 vs. limit=15.0 2023-11-19 01:45:56,113 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.924e+01 8.454e+01 9.161e+01 9.715e+01 1.284e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 01:46:14,429 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=508526.6666666667, ans=0.0 2023-11-19 01:46:20,588 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4150, loss[loss=0.09266, simple_loss=0.1034, pruned_loss=0.02631, audio_tagging_loss=0.01463, over 16614.00 frames. ], tot_loss[loss=0.09398, simple_loss=0.1107, pruned_loss=0.02749, audio_tagging_loss=0.01114, over 3047977.33 frames. ], batch size: 61, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:46:34,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=508660.0, ans=0.0 2023-11-19 01:46:35,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=508660.0, ans=0.125 2023-11-19 01:47:01,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=508793.3333333333, ans=0.125 2023-11-19 01:47:01,995 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 01:47:02,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=508793.3333333333, ans=0.125 2023-11-19 01:47:09,629 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=508860.0, ans=0.125 2023-11-19 01:47:15,821 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4200, loss[loss=0.135, simple_loss=0.1702, pruned_loss=0.04404, audio_tagging_loss=0.005837, over 15335.00 frames. ], tot_loss[loss=0.0947, simple_loss=0.1118, pruned_loss=0.0278, audio_tagging_loss=0.01099, over 3052666.71 frames. ], batch size: 56, lr: 9.99e-03, grad_scale: 32.0 2023-11-19 01:47:15,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=508926.6666666667, ans=0.1 2023-11-19 01:47:16,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=508926.6666666667, ans=0.2 2023-11-19 01:47:21,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=508926.6666666667, ans=0.125 2023-11-19 01:47:48,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.868e+01 9.396e+01 1.025e+02 1.839e+02, threshold=1.879e+02, percent-clipped=1.0 2023-11-19 01:47:58,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=509126.6666666667, ans=0.125 2023-11-19 01:47:58,154 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.654e-01 2023-11-19 01:48:04,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=509193.3333333333, ans=0.0 2023-11-19 01:48:11,886 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4250, loss[loss=0.1234, simple_loss=0.1558, pruned_loss=0.03942, audio_tagging_loss=0.006121, over 15612.00 frames. ], tot_loss[loss=0.09458, simple_loss=0.1119, pruned_loss=0.02773, audio_tagging_loss=0.0109, over 3048851.46 frames. 
], batch size: 56, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:48:20,585 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=509260.0, ans=0.125 2023-11-19 01:48:21,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=509260.0, ans=0.125 2023-11-19 01:48:36,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=509393.3333333333, ans=0.5 2023-11-19 01:48:38,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=509393.3333333333, ans=0.2 2023-11-19 01:48:47,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=509460.0, ans=0.0 2023-11-19 01:48:57,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=509526.6666666667, ans=0.125 2023-11-19 01:49:06,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.29 vs. limit=22.5 2023-11-19 01:49:07,985 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4300, loss[loss=0.1343, simple_loss=0.1601, pruned_loss=0.04587, audio_tagging_loss=0.008401, over 15080.00 frames. ], tot_loss[loss=0.09547, simple_loss=0.1131, pruned_loss=0.02812, audio_tagging_loss=0.01078, over 3048565.63 frames. ], batch size: 59, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:49:39,661 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.902e+01 9.994e+01 1.090e+02 2.369e+02, threshold=1.999e+02, percent-clipped=2.0 2023-11-19 01:49:52,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.44 vs. limit=15.0 2023-11-19 01:50:02,569 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4350, loss[loss=0.1088, simple_loss=0.1373, pruned_loss=0.03137, audio_tagging_loss=0.008755, over 15786.00 frames. ], tot_loss[loss=0.09555, simple_loss=0.1137, pruned_loss=0.02815, audio_tagging_loss=0.01056, over 3054248.93 frames. ], batch size: 59, lr: 9.98e-03, grad_scale: 32.0 2023-11-19 01:50:13,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=509993.3333333333, ans=0.2 2023-11-19 01:50:37,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=510126.6666666667, ans=0.125 2023-11-19 01:50:38,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=510126.6666666667, ans=0.0 2023-11-19 01:50:54,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=510193.3333333333, ans=0.125 2023-11-19 01:50:58,222 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4400, loss[loss=0.09258, simple_loss=0.1091, pruned_loss=0.02591, audio_tagging_loss=0.01214, over 16087.00 frames. ], tot_loss[loss=0.09556, simple_loss=0.1137, pruned_loss=0.0281, audio_tagging_loss=0.01059, over 3053817.31 frames. 
], batch size: 60, lr: 9.98e-03, grad_scale: 32.0
2023-11-19 01:51:23,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=510393.3333333333, ans=0.125
2023-11-19 01:51:30,396 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.846e+01 8.237e+01 9.053e+01 9.942e+01 1.233e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 01:51:33,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten.whitening_limit, batch_count=510460.0, ans=22.5
2023-11-19 01:51:38,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=510460.0, ans=0.125
2023-11-19 01:51:47,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=510526.6666666667, ans=0.0
2023-11-19 01:51:54,569 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4450, loss[loss=0.09126, simple_loss=0.09039, pruned_loss=0.02913, audio_tagging_loss=0.01695, over 15600.00 frames. ], tot_loss[loss=0.09535, simple_loss=0.113, pruned_loss=0.02809, audio_tagging_loss=0.01078, over 3053864.77 frames. ], batch size: 60, lr: 9.97e-03, grad_scale: 32.0
2023-11-19 01:52:05,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=510660.0, ans=0.125
2023-11-19 01:52:17,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=510726.6666666667, ans=0.0
2023-11-19 01:52:36,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5
2023-11-19 01:52:37,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.67 vs. limit=6.0
2023-11-19 01:52:40,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=510860.0, ans=0.0
2023-11-19 01:52:44,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=510860.0, ans=0.125
2023-11-19 01:52:49,743 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4500, loss[loss=0.08928, simple_loss=0.1072, pruned_loss=0.0224, audio_tagging_loss=0.01328, over 16295.00 frames. ], tot_loss[loss=0.09512, simple_loss=0.1128, pruned_loss=0.02803, audio_tagging_loss=0.01071, over 3052948.70 frames. ], batch size: 61, lr: 9.97e-03, grad_scale: 32.0
2023-11-19 01:53:13,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.65 vs. limit=6.0
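The Whitening entries above (scaling.py:1022) are diagnostics from the Zipformer's whitening modules: `metric` measures how far a group of activations is from having a white (identity-proportional) covariance, and a corrective gradient penalty is applied only when the metric exceeds the module's `limit`. A minimal sketch of such a metric under that assumed definition, for intuition only, not the actual scaling.py code:

```python
import torch

def whitening_metric(x: torch.Tensor, num_groups: int) -> torch.Tensor:
    """Illustrative whitening diagnostic: returns 1.0 when each group's
    feature covariance is proportional to the identity, and grows as the
    covariance eigenvalues spread out (assumed definition)."""
    num_channels = x.shape[-1]
    channels_per_group = num_channels // num_groups
    x = x.reshape(-1, num_groups, channels_per_group).transpose(0, 1)
    covar = torch.matmul(x.transpose(1, 2), x)         # per-group covariance
    mean_diag = covar.diagonal(dim1=1, dim2=2).mean()  # mean eigenvalue (trace / d)
    mean_sq = (covar ** 2).sum() / (num_groups * channels_per_group)
    return mean_sq / (mean_diag ** 2 + 1e-20)

# A line like "metric=4.65 vs. limit=6.0" would then mean the measured value
# 4.65 is still below the threshold 6.0, so no penalty is applied.
```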
2023-11-19 01:53:22,563 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 9.019e+01 9.835e+01 1.060e+02 1.349e+02, threshold=1.967e+02, percent-clipped=0.0
2023-11-19 01:53:26,968 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511126.6666666667, ans=0.125
2023-11-19 01:53:31,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=511126.6666666667, ans=0.125
2023-11-19 01:53:37,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=511193.3333333333, ans=0.2
2023-11-19 01:53:45,533 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4550, loss[loss=0.08807, simple_loss=0.1009, pruned_loss=0.02532, audio_tagging_loss=0.0123, over 15225.00 frames. ], tot_loss[loss=0.09462, simple_loss=0.1123, pruned_loss=0.02783, audio_tagging_loss=0.01066, over 3050603.02 frames. ], batch size: 59, lr: 9.97e-03, grad_scale: 32.0
2023-11-19 01:53:45,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=511260.0, ans=0.0
2023-11-19 01:54:05,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.66 vs. limit=22.5
2023-11-19 01:54:29,216 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 01:54:35,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=511526.6666666667, ans=0.0
2023-11-19 01:54:41,988 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4600, loss[loss=0.1073, simple_loss=0.1345, pruned_loss=0.03193, audio_tagging_loss=0.00811, over 15740.00 frames. ], tot_loss[loss=0.09406, simple_loss=0.1114, pruned_loss=0.02752, audio_tagging_loss=0.01082, over 3047703.03 frames. ], batch size: 56, lr: 9.96e-03, grad_scale: 32.0
2023-11-19 01:54:42,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=511593.3333333333, ans=0.125
2023-11-19 01:54:47,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=511593.3333333333, ans=0.0
2023-11-19 01:54:51,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=511593.3333333333, ans=0.1
2023-11-19 01:55:04,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=511726.6666666667, ans=0.125
2023-11-19 01:55:14,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.023e+01 8.730e+01 9.456e+01 1.065e+02 1.421e+02, threshold=1.891e+02, percent-clipped=0.0
2023-11-19 01:55:16,410 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.27 vs. limit=15.0
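The WARNING above records a data-hygiene check: these AudioSet clips carry only a dummy transcript, and a one-second clip (100 feature frames) shrinks to 23 frames after the 4x subsampling, fewer than its 24 BPE tokens, so a transducer loss would be undefined and the cut is skipped. A sketch of such a filter, assuming the conventional output-length formula for this kind of convolutional frontend (here `sp` stands for a hypothetical handle to the loaded BPE model):

```python
def keep_cut(cut, sp) -> bool:
    """Drop cuts whose encoder output is shorter than the token sequence;
    a (pruned) transducer loss needs at least one frame per emitted token."""
    # Approximate output length of a subsampling-factor-4 conv frontend:
    # 100 input frames -> 23 output frames, matching the numbers in the log.
    T = ((cut.num_frames - 7) // 2 + 1) // 2
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    return T >= len(tokens)
```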
2023-11-19 01:55:18,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=511793.3333333333, ans=0.1
2023-11-19 01:55:29,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=511860.0, ans=0.125
2023-11-19 01:55:35,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=511860.0, ans=0.125
2023-11-19 01:55:37,975 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4650, loss[loss=0.07132, simple_loss=0.0827, pruned_loss=0.01669, audio_tagging_loss=0.01328, over 15675.00 frames. ], tot_loss[loss=0.09316, simple_loss=0.1098, pruned_loss=0.02721, audio_tagging_loss=0.01106, over 3039293.01 frames. ], batch size: 59, lr: 9.96e-03, grad_scale: 32.0
2023-11-19 01:55:40,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=511926.6666666667, ans=0.125
2023-11-19 01:55:42,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten.whitening_limit, batch_count=511926.6666666667, ans=22.5
2023-11-19 01:55:48,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=511993.3333333333, ans=0.5
2023-11-19 01:55:52,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=511993.3333333333, ans=0.125
2023-11-19 01:56:03,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=512060.0, ans=0.0
2023-11-19 01:56:04,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=512060.0, ans=0.0
2023-11-19 01:56:07,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=512060.0, ans=0.125
2023-11-19 01:56:10,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512126.6666666667, ans=0.125
2023-11-19 01:56:15,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512126.6666666667, ans=0.1
2023-11-19 01:56:22,010 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.63 vs. limit=10.0
2023-11-19 01:56:22,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=512193.3333333333, ans=0.1
2023-11-19 01:56:33,524 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4700, loss[loss=0.09364, simple_loss=0.09763, pruned_loss=0.02858, audio_tagging_loss=0.01625, over 15441.00 frames. ], tot_loss[loss=0.09347, simple_loss=0.1104, pruned_loss=0.02723, audio_tagging_loss=0.01107, over 3038466.51 frames. ], batch size: 58, lr: 9.96e-03, grad_scale: 32.0
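The ScheduledFloat entries that dominate this log (scaling.py:213) track hyperparameters, dropout rates, skip probabilities, and balancer targets that are not constants but functions of the global batch count. A minimal sketch of a piecewise-linear schedule of this kind, with illustrative breakpoints rather than the recipe's actual values:

```python
def scheduled_float(batch_count: float, schedule) -> float:
    """Piecewise-linear schedule: `schedule` is a list of
    (batch_count, value) breakpoints; values are interpolated linearly
    in between and held constant past the last breakpoint."""
    x0, y0 = schedule[0]
    if batch_count <= x0:
        return y0
    for x1, y1 in schedule[1:]:
        if batch_count <= x1:
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)
        x0, y0 = x1, y1
    return y0

# A skip rate that decays to zero early in training would, by
# batch_count ~5.1e5, print ans=0.0 just like the lines above.
print(scheduled_float(512060.0, [(0, 0.5), (20000, 0.025), (40000, 0.0)]))  # 0.0
```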
2023-11-19 01:57:05,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.776e+01 8.634e+01 9.306e+01 1.008e+02 1.470e+02, threshold=1.861e+02, percent-clipped=0.0
2023-11-19 01:57:11,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=512460.0, ans=0.125
2023-11-19 01:57:26,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.06 vs. limit=15.0
2023-11-19 01:57:29,464 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4750, loss[loss=0.08041, simple_loss=0.101, pruned_loss=0.01957, audio_tagging_loss=0.01034, over 15849.00 frames. ], tot_loss[loss=0.09314, simple_loss=0.11, pruned_loss=0.02701, audio_tagging_loss=0.01115, over 3036127.93 frames. ], batch size: 59, lr: 9.95e-03, grad_scale: 32.0
2023-11-19 01:57:39,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=512660.0, ans=0.125
2023-11-19 01:58:08,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=512793.3333333333, ans=0.125
2023-11-19 01:58:25,660 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4800, loss[loss=0.1049, simple_loss=0.1298, pruned_loss=0.03081, audio_tagging_loss=0.009247, over 16030.00 frames. ], tot_loss[loss=0.09332, simple_loss=0.1098, pruned_loss=0.02713, audio_tagging_loss=0.01132, over 3044167.69 frames. ], batch size: 59, lr: 9.95e-03, grad_scale: 32.0
2023-11-19 01:58:28,250 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.50 vs. limit=6.0
2023-11-19 01:58:57,703 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.229e+01 8.526e+01 9.167e+01 1.022e+02 1.486e+02, threshold=1.833e+02, percent-clipped=0.0
2023-11-19 01:59:03,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513126.6666666667, ans=0.125
2023-11-19 01:59:03,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=513126.6666666667, ans=0.125
2023-11-19 01:59:08,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.21 vs. limit=10.0
2023-11-19 01:59:18,831 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=10.71 vs. limit=15.0
2023-11-19 01:59:20,426 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4850, loss[loss=0.09973, simple_loss=0.123, pruned_loss=0.02788, audio_tagging_loss=0.01036, over 15717.00 frames. ], tot_loss[loss=0.09264, simple_loss=0.1091, pruned_loss=0.02667, audio_tagging_loss=0.01143, over 3042987.74 frames. ], batch size: 59, lr: 9.95e-03, grad_scale: 32.0
2023-11-19 01:59:27,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.min_positive, batch_count=513260.0, ans=0.05
2023-11-19 01:59:29,124 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.16 vs. limit=15.0
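The optim.py:476 lines summarize the recent distribution of gradient norms: the five numbers read as minimum, 25th, 50th, and 75th percentiles and maximum over a window of recent batches, and the clipping threshold tracks the median. The log's own numbers bear this out: with Clipping_scale=2.0, the quartiles 6.776e+01 8.634e+01 9.306e+01 1.008e+02 1.470e+02 give 2.0 x 9.306e+01 = 1.861e+02, exactly the printed threshold. A rough sketch of median-based clipping, an illustration rather than the optimizer's actual code:

```python
import torch

def clip_by_median(model, recent_norms: list, clipping_scale: float = 2.0) -> bool:
    """Clip gradients against clipping_scale times the median of recently
    observed gradient norms; returns True when this batch was clipped."""
    # max_norm=inf makes clip_grad_norm_ only *measure* the total norm.
    norm = float(torch.nn.utils.clip_grad_norm_(model.parameters(), float("inf")))
    recent_norms.append(norm)
    window = sorted(recent_norms[-128:])  # sliding window of recent batches
    threshold = clipping_scale * window[len(window) // 2]
    if norm > threshold:
        for p in model.parameters():
            if p.grad is not None:
                p.grad.mul_(threshold / norm)
        return True
    return False
```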
2023-11-19 02:00:17,674 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4900, loss[loss=0.1038, simple_loss=0.1291, pruned_loss=0.02862, audio_tagging_loss=0.01064, over 15762.00 frames. ], tot_loss[loss=0.09379, simple_loss=0.1108, pruned_loss=0.02715, audio_tagging_loss=0.01121, over 3041440.71 frames. ], batch size: 58, lr: 9.94e-03, grad_scale: 32.0
2023-11-19 02:00:29,935 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.00 vs. limit=15.0
2023-11-19 02:00:34,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=513660.0, ans=0.1
2023-11-19 02:00:39,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.13 vs. limit=15.0
2023-11-19 02:00:39,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=513726.6666666667, ans=0.125
2023-11-19 02:00:49,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.320e+01 8.524e+01 9.139e+01 9.781e+01 1.634e+02, threshold=1.828e+02, percent-clipped=0.0
2023-11-19 02:00:55,285 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=513793.3333333333, ans=0.125
2023-11-19 02:01:12,689 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 4950, loss[loss=0.08906, simple_loss=0.1094, pruned_loss=0.02716, audio_tagging_loss=0.007179, over 14895.00 frames. ], tot_loss[loss=0.09459, simple_loss=0.1121, pruned_loss=0.02763, audio_tagging_loss=0.01092, over 3045193.71 frames. ], batch size: 55, lr: 9.94e-03, grad_scale: 32.0
2023-11-19 02:01:40,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=514060.0, ans=0.0
2023-11-19 02:01:47,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=514126.6666666667, ans=0.125
2023-11-19 02:02:08,155 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5000, loss[loss=0.1112, simple_loss=0.131, pruned_loss=0.03796, audio_tagging_loss=0.007809, over 14966.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1113, pruned_loss=0.02719, audio_tagging_loss=0.01082, over 3045424.17 frames. ], batch size: 56, lr: 9.94e-03, grad_scale: 32.0
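Each train_asr.py:1115 line reports two losses: `loss[...]` is the current batch, weighted by its frame count, while `tot_loss[...]` is a frame-weighted running aggregate over a window of recent batches, which is why its "over N frames" figure hovers near 3M while individual batches carry roughly 15k frames. A minimal sketch of that aggregation; the tracker's exact window and reset policy are assumptions, not shown in this excerpt:

```python
class RunningLoss:
    """Frame-weighted running average of per-batch losses, reset
    periodically so tot_loss reflects only recent batches."""
    def __init__(self, reset_every: int = 200):  # illustrative window size
        self.reset_every = reset_every
        self.batches = 0
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, loss: float, num_frames: float) -> float:
        if self.batches % self.reset_every == 0:
            self.loss_sum, self.frames = 0.0, 0.0
        self.batches += 1
        self.loss_sum += loss * num_frames
        self.frames += num_frames
        return self.loss_sum / self.frames  # the value printed as tot_loss

# e.g. ~200 batches of ~15k frames each yields the ~3.0e6-frame windows seen above.
```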
2023-11-19 02:02:17,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=514260.0, ans=0.0
2023-11-19 02:02:19,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=514326.6666666667, ans=0.125
2023-11-19 02:02:23,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=514326.6666666667, ans=0.035
2023-11-19 02:02:24,697 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=514326.6666666667, ans=0.0
2023-11-19 02:02:29,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=514393.3333333333, ans=0.125
2023-11-19 02:02:31,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=514393.3333333333, ans=0.0
2023-11-19 02:02:36,212 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 02:02:40,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.030e+01 8.705e+01 9.523e+01 1.061e+02 1.468e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-19 02:02:50,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=514460.0, ans=0.0
2023-11-19 02:02:50,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=514460.0, ans=0.125
2023-11-19 02:02:57,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.26 vs. limit=15.0
2023-11-19 02:03:03,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=514593.3333333333, ans=0.07
2023-11-19 02:03:04,442 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5050, loss[loss=0.0896, simple_loss=0.1121, pruned_loss=0.02235, audio_tagging_loss=0.01118, over 14808.00 frames. ], tot_loss[loss=0.09363, simple_loss=0.1116, pruned_loss=0.02715, audio_tagging_loss=0.0107, over 3043850.34 frames. ], batch size: 55, lr: 9.93e-03, grad_scale: 32.0
2023-11-19 02:03:06,109 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.68 vs. limit=22.5
2023-11-19 02:03:11,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=514593.3333333333, ans=0.125
2023-11-19 02:03:13,484 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.12 vs. limit=15.0
2023-11-19 02:03:24,987 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.30 vs. limit=15.0
2023-11-19 02:03:54,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=514860.0, ans=0.0
2023-11-19 02:03:59,689 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5100, loss[loss=0.09223, simple_loss=0.1035, pruned_loss=0.02983, audio_tagging_loss=0.01063, over 15981.00 frames.
], tot_loss[loss=0.0934, simple_loss=0.1111, pruned_loss=0.02715, audio_tagging_loss=0.01071, over 3047521.57 frames. ], batch size: 61, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:02,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=514926.6666666667, ans=0.125 2023-11-19 02:04:02,201 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=13.69 vs. limit=22.5 2023-11-19 02:04:08,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.81 vs. limit=15.0 2023-11-19 02:04:15,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.09 vs. limit=15.0 2023-11-19 02:04:32,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.830e+01 8.394e+01 9.066e+01 1.016e+02 2.426e+02, threshold=1.813e+02, percent-clipped=1.0 2023-11-19 02:04:54,523 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5150, loss[loss=0.1042, simple_loss=0.1157, pruned_loss=0.03198, audio_tagging_loss=0.01443, over 15558.00 frames. ], tot_loss[loss=0.09254, simple_loss=0.1098, pruned_loss=0.02683, audio_tagging_loss=0.01082, over 3038929.18 frames. ], batch size: 58, lr: 9.93e-03, grad_scale: 32.0 2023-11-19 02:04:56,162 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.64 vs. limit=15.0 2023-11-19 02:04:57,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-19 02:04:59,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515260.0, ans=0.125 2023-11-19 02:05:10,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=515326.6666666667, ans=0.125 2023-11-19 02:05:10,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=515326.6666666667, ans=0.125 2023-11-19 02:05:25,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=515393.3333333333, ans=0.125 2023-11-19 02:05:34,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=515460.0, ans=0.125 2023-11-19 02:05:41,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=515526.6666666667, ans=0.0 2023-11-19 02:05:41,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.10 vs. limit=10.0 2023-11-19 02:05:44,887 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.20 vs. limit=15.0 2023-11-19 02:05:51,345 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5200, loss[loss=0.06216, simple_loss=0.06902, pruned_loss=0.01779, audio_tagging_loss=0.009855, over 14543.00 frames. 
], tot_loss[loss=0.09402, simple_loss=0.1116, pruned_loss=0.02744, audio_tagging_loss=0.01076, over 3041814.93 frames. ], batch size: 56, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:05:54,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=515593.3333333333, ans=0.125 2023-11-19 02:06:01,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=515660.0, ans=0.125 2023-11-19 02:06:02,960 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.03 vs. limit=15.0 2023-11-19 02:06:21,952 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.12 vs. limit=15.0 2023-11-19 02:06:22,517 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.952e+01 8.557e+01 9.238e+01 1.015e+02 1.542e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:06:24,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=515793.3333333333, ans=15.0 2023-11-19 02:06:38,231 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.34 vs. limit=6.0 2023-11-19 02:06:38,692 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=515860.0, ans=0.025 2023-11-19 02:06:46,878 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5250, loss[loss=0.1072, simple_loss=0.128, pruned_loss=0.03124, audio_tagging_loss=0.01193, over 14903.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.1108, pruned_loss=0.02736, audio_tagging_loss=0.01069, over 3032950.26 frames. ], batch size: 56, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:06:59,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=515993.3333333333, ans=0.125 2023-11-19 02:07:08,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=516060.0, ans=0.2 2023-11-19 02:07:18,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=516060.0, ans=0.0 2023-11-19 02:07:26,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=516126.6666666667, ans=0.1 2023-11-19 02:07:38,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=516193.3333333333, ans=0.0 2023-11-19 02:07:41,873 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5300, loss[loss=0.1196, simple_loss=0.1435, pruned_loss=0.03966, audio_tagging_loss=0.008249, over 14673.00 frames. ], tot_loss[loss=0.09328, simple_loss=0.1107, pruned_loss=0.02724, audio_tagging_loss=0.01068, over 3038689.50 frames. ], batch size: 55, lr: 9.92e-03, grad_scale: 32.0 2023-11-19 02:07:42,483 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.74 vs. 
limit=12.0 2023-11-19 02:07:45,271 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=516260.0, ans=0.125 2023-11-19 02:08:13,693 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=516393.3333333333, ans=0.125 2023-11-19 02:08:14,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.354e+01 8.731e+01 9.566e+01 1.032e+02 1.487e+02, threshold=1.913e+02, percent-clipped=0.0 2023-11-19 02:08:25,311 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:08:37,653 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5350, loss[loss=0.09921, simple_loss=0.1096, pruned_loss=0.03032, audio_tagging_loss=0.01412, over 14618.00 frames. ], tot_loss[loss=0.09325, simple_loss=0.1108, pruned_loss=0.02714, audio_tagging_loss=0.01073, over 3037916.98 frames. ], batch size: 54, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:08:41,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=516593.3333333333, ans=0.2 2023-11-19 02:08:43,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.whiten.whitening_limit, batch_count=516593.3333333333, ans=12.0 2023-11-19 02:08:43,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.41 vs. limit=5.0 2023-11-19 02:08:48,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=516660.0, ans=0.5 2023-11-19 02:08:57,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=516660.0, ans=0.0 2023-11-19 02:09:04,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=516726.6666666667, ans=0.1 2023-11-19 02:09:12,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=516793.3333333333, ans=0.0 2023-11-19 02:09:14,681 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.84 vs. limit=15.0 2023-11-19 02:09:30,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=516860.0, ans=0.125 2023-11-19 02:09:33,606 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5400, loss[loss=0.1103, simple_loss=0.124, pruned_loss=0.03645, audio_tagging_loss=0.01189, over 15155.00 frames. ], tot_loss[loss=0.0942, simple_loss=0.1124, pruned_loss=0.02732, audio_tagging_loss=0.01071, over 3037871.15 frames. 
], batch size: 57, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:09:33,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=516926.6666666667, ans=0.125 2023-11-19 02:09:56,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=517060.0, ans=0.125 2023-11-19 02:10:03,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=517060.0, ans=0.035 2023-11-19 02:10:03,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=517060.0, ans=0.2 2023-11-19 02:10:05,678 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.660e+01 9.823e+01 1.115e+02 1.582e+02, threshold=1.965e+02, percent-clipped=0.0 2023-11-19 02:10:11,973 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.19 vs. limit=15.0 2023-11-19 02:10:16,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=517193.3333333333, ans=0.2 2023-11-19 02:10:18,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=517193.3333333333, ans=0.125 2023-11-19 02:10:28,359 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5450, loss[loss=0.08986, simple_loss=0.1054, pruned_loss=0.02645, audio_tagging_loss=0.01071, over 15468.00 frames. ], tot_loss[loss=0.0945, simple_loss=0.1121, pruned_loss=0.02762, audio_tagging_loss=0.01084, over 3030311.46 frames. ], batch size: 59, lr: 9.91e-03, grad_scale: 32.0 2023-11-19 02:10:50,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=12.93 vs. limit=15.0 2023-11-19 02:10:54,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=517393.3333333333, ans=0.0 2023-11-19 02:11:09,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.21 vs. limit=22.5 2023-11-19 02:11:14,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=517526.6666666667, ans=0.09899494936611666 2023-11-19 02:11:24,125 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5500, loss[loss=0.1227, simple_loss=0.1499, pruned_loss=0.03939, audio_tagging_loss=0.008394, over 15127.00 frames. ], tot_loss[loss=0.09458, simple_loss=0.1124, pruned_loss=0.02763, audio_tagging_loss=0.01074, over 3032513.35 frames. ], batch size: 57, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:11:25,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. 
limit=12.0 2023-11-19 02:11:28,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=517593.3333333333, ans=0.125 2023-11-19 02:11:34,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=517660.0, ans=0.125 2023-11-19 02:11:36,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=517660.0, ans=0.125 2023-11-19 02:11:37,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=517660.0, ans=0.0 2023-11-19 02:11:43,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=517660.0, ans=0.125 2023-11-19 02:11:45,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=517726.6666666667, ans=0.125 2023-11-19 02:11:55,915 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.797e+01 8.498e+01 9.493e+01 1.061e+02 1.375e+02, threshold=1.899e+02, percent-clipped=0.0 2023-11-19 02:12:08,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=517860.0, ans=0.0 2023-11-19 02:12:20,008 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5550, loss[loss=0.1093, simple_loss=0.1237, pruned_loss=0.03313, audio_tagging_loss=0.01431, over 14551.00 frames. ], tot_loss[loss=0.09442, simple_loss=0.1121, pruned_loss=0.02743, audio_tagging_loss=0.01093, over 3040897.50 frames. ], batch size: 53, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:12:31,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=517993.3333333333, ans=0.05 2023-11-19 02:13:15,069 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5600, loss[loss=0.1198, simple_loss=0.1417, pruned_loss=0.04114, audio_tagging_loss=0.007866, over 16226.00 frames. ], tot_loss[loss=0.09429, simple_loss=0.1118, pruned_loss=0.02734, audio_tagging_loss=0.01104, over 3044448.41 frames. ], batch size: 60, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:13:15,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518260.0, ans=0.1 2023-11-19 02:13:27,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.72 vs. limit=15.0 2023-11-19 02:13:37,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=518393.3333333333, ans=0.1 2023-11-19 02:13:37,508 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=14.11 vs. limit=15.0 2023-11-19 02:13:46,977 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.345e+01 9.240e+01 1.027e+02 1.400e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 02:13:55,462 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:14:00,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-11-19 02:14:00,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=518526.6666666667, ans=0.125 2023-11-19 02:14:07,310 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=518526.6666666667, ans=0.125 2023-11-19 02:14:09,782 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5650, loss[loss=0.08534, simple_loss=0.1101, pruned_loss=0.01916, audio_tagging_loss=0.01111, over 16488.00 frames. ], tot_loss[loss=0.09443, simple_loss=0.1121, pruned_loss=0.02727, audio_tagging_loss=0.01113, over 3053563.74 frames. ], batch size: 61, lr: 9.90e-03, grad_scale: 64.0 2023-11-19 02:14:13,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=518593.3333333333, ans=0.125 2023-11-19 02:14:34,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=518726.6666666667, ans=0.1 2023-11-19 02:14:35,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=518726.6666666667, ans=0.125 2023-11-19 02:15:02,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=518860.0, ans=0.0 2023-11-19 02:15:02,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.02 vs. limit=15.0 2023-11-19 02:15:06,280 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5700, loss[loss=0.07749, simple_loss=0.0887, pruned_loss=0.0225, audio_tagging_loss=0.01064, over 15062.00 frames. ], tot_loss[loss=0.09304, simple_loss=0.1105, pruned_loss=0.02675, audio_tagging_loss=0.01103, over 3049758.95 frames. ], batch size: 56, lr: 9.89e-03, grad_scale: 64.0 2023-11-19 02:15:13,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=518926.6666666667, ans=0.125 2023-11-19 02:15:15,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-19 02:15:22,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=518993.3333333333, ans=0.125 2023-11-19 02:15:38,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.416e+01 9.360e+01 1.085e+02 2.101e+02, threshold=1.872e+02, percent-clipped=1.0 2023-11-19 02:15:43,079 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:15:45,785 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=7.00 vs. 
limit=8.0 2023-11-19 02:15:58,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=519193.3333333333, ans=0.0 2023-11-19 02:16:01,542 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5750, loss[loss=0.07876, simple_loss=0.09575, pruned_loss=0.01873, audio_tagging_loss=0.01215, over 15046.00 frames. ], tot_loss[loss=0.09326, simple_loss=0.111, pruned_loss=0.0269, audio_tagging_loss=0.01087, over 3059623.39 frames. ], batch size: 57, lr: 9.89e-03, grad_scale: 32.0 2023-11-19 02:16:11,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=519326.6666666667, ans=0.0 2023-11-19 02:16:20,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=519326.6666666667, ans=0.125 2023-11-19 02:16:21,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=519326.6666666667, ans=0.125 2023-11-19 02:16:45,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-11-19 02:16:52,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=519526.6666666667, ans=0.95 2023-11-19 02:16:56,607 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5800, loss[loss=0.1116, simple_loss=0.1295, pruned_loss=0.03507, audio_tagging_loss=0.01178, over 14842.00 frames. ], tot_loss[loss=0.0928, simple_loss=0.1102, pruned_loss=0.02686, audio_tagging_loss=0.01082, over 3056617.47 frames. ], batch size: 55, lr: 9.89e-03, grad_scale: 16.0 2023-11-19 02:16:56,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=519593.3333333333, ans=0.125 2023-11-19 02:17:10,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=519660.0, ans=0.0 2023-11-19 02:17:20,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=519726.6666666667, ans=0.125 2023-11-19 02:17:31,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.960e+01 8.540e+01 9.536e+01 1.116e+02 2.278e+02, threshold=1.907e+02, percent-clipped=1.0 2023-11-19 02:17:39,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=519793.3333333333, ans=0.0 2023-11-19 02:17:53,441 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5850, loss[loss=0.1054, simple_loss=0.1258, pruned_loss=0.03342, audio_tagging_loss=0.009031, over 14687.00 frames. ], tot_loss[loss=0.0926, simple_loss=0.1104, pruned_loss=0.02669, audio_tagging_loss=0.01071, over 3055076.97 frames. 
], batch size: 55, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:18:01,702 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:18:02,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=519926.6666666667, ans=0.125 2023-11-19 02:18:03,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=519993.3333333333, ans=0.125 2023-11-19 02:18:18,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=520060.0, ans=0.0 2023-11-19 02:18:25,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=520126.6666666667, ans=0.125 2023-11-19 02:18:26,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.89 vs. limit=15.0 2023-11-19 02:18:49,463 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5900, loss[loss=0.0991, simple_loss=0.1131, pruned_loss=0.03289, audio_tagging_loss=0.009652, over 14896.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.1106, pruned_loss=0.02702, audio_tagging_loss=0.01066, over 3052938.02 frames. ], batch size: 55, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:18:50,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=520260.0, ans=0.1 2023-11-19 02:18:54,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.70 vs. limit=15.0 2023-11-19 02:19:16,739 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:19:17,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=520393.3333333333, ans=0.125 2023-11-19 02:19:23,929 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.519e+01 8.828e+01 9.582e+01 1.059e+02 2.362e+02, threshold=1.916e+02, percent-clipped=1.0 2023-11-19 02:19:44,675 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 5950, loss[loss=0.1003, simple_loss=0.1257, pruned_loss=0.03039, audio_tagging_loss=0.007057, over 13535.00 frames. ], tot_loss[loss=0.09355, simple_loss=0.1114, pruned_loss=0.02719, audio_tagging_loss=0.01065, over 3051464.79 frames. ], batch size: 52, lr: 9.88e-03, grad_scale: 16.0 2023-11-19 02:19:50,776 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:19:52,964 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.59 vs. limit=6.0 2023-11-19 02:20:00,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.82 vs. limit=22.5 2023-11-19 02:20:03,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=520660.0, ans=0.1 2023-11-19 02:20:04,781 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.71 vs. 
2023-11-19 02:20:08,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=520726.6666666667, ans=0.125 2023-11-19 02:20:38,599 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=12.0 2023-11-19 02:20:40,636 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6000, loss[loss=0.09684, simple_loss=0.1269, pruned_loss=0.02629, audio_tagging_loss=0.007116, over 15681.00 frames. ], tot_loss[loss=0.09321, simple_loss=0.1108, pruned_loss=0.02714, audio_tagging_loss=0.01065, over 3043032.77 frames. ], batch size: 56, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:20:40,638 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 02:21:13,035 INFO [train_asr.py:1147] (0/4) Epoch 7, validation: loss=0.06924, simple_loss=0.05776, pruned_loss=0.007549, audio_tagging_loss=0.0328, over 4681554.00 frames. 2023-11-19 02:21:13,035 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 02:21:34,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=521060.0, ans=0.125 2023-11-19 02:21:36,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=521060.0, ans=0.125 2023-11-19 02:21:47,397 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.133e+01 8.768e+01 9.511e+01 1.039e+02 1.786e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 02:21:47,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=521126.6666666667, ans=0.015 2023-11-19 02:21:55,314 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:22:02,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=521193.3333333333, ans=0.125 2023-11-19 02:22:07,963 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6050, loss[loss=0.08316, simple_loss=0.1018, pruned_loss=0.02094, audio_tagging_loss=0.01133, over 15591.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.11, pruned_loss=0.02681, audio_tagging_loss=0.01069, over 3045155.99 frames. ], batch size: 58, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:22:37,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=521393.3333333333, ans=0.0 2023-11-19 02:22:49,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=521460.0, ans=0.0 2023-11-19 02:22:49,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=521460.0, ans=0.0 2023-11-19 02:22:51,328 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.01 vs. limit=6.0
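The optim.py entries interleaved above ("Clipping_scale=2.0, grad-norm quartiles ... threshold=...") print a five-number summary (min, 25%, median, 75%, max) of recently observed gradient norms, and in every such entry the printed threshold is Clipping_scale times the median (for example 2.0 * 9.511e+01 = 1.902e+02 in the entry just above), with percent-clipped the share of recent batches whose norm exceeded it. A small sketch under that reading; the window size and exact bookkeeping are assumptions:

    import torch

    def clipping_stats(recent_grad_norms: torch.Tensor,
                       clipping_scale: float = 2.0):
        # Five-number summary of recent grad norms, as in the log lines.
        q = torch.quantile(recent_grad_norms,
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = clipping_scale * q[2].item()  # scale * median
        percent_clipped = 100.0 * (recent_grad_norms > threshold).float().mean().item()
        return q, threshold, percent_clipped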
2023-11-19 02:23:03,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=521526.6666666667, ans=0.2 2023-11-19 02:23:05,056 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6100, loss[loss=0.08368, simple_loss=0.1015, pruned_loss=0.0248, audio_tagging_loss=0.008111, over 14835.00 frames. ], tot_loss[loss=0.09215, simple_loss=0.1093, pruned_loss=0.02669, audio_tagging_loss=0.01079, over 3054212.49 frames. ], batch size: 59, lr: 9.87e-03, grad_scale: 32.0 2023-11-19 02:23:08,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=521593.3333333333, ans=0.125 2023-11-19 02:23:33,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.63 vs. limit=10.0 2023-11-19 02:23:38,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.713e+01 9.332e+01 1.071e+02 1.394e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 02:23:41,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=521793.3333333333, ans=0.0 2023-11-19 02:23:44,399 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.29 vs. limit=12.0 2023-11-19 02:23:46,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=521793.3333333333, ans=0.0 2023-11-19 02:23:52,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=521860.0, ans=0.0 2023-11-19 02:23:52,786 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.49 vs. limit=22.5 2023-11-19 02:23:59,652 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6150, loss[loss=0.08838, simple_loss=0.1147, pruned_loss=0.01821, audio_tagging_loss=0.01283, over 14984.00 frames. ], tot_loss[loss=0.09241, simple_loss=0.1095, pruned_loss=0.02681, audio_tagging_loss=0.01087, over 3058235.27 frames. ], batch size: 57, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:24:13,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=521993.3333333333, ans=0.1 2023-11-19 02:24:32,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=522126.6666666667, ans=0.125 2023-11-19 02:24:35,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.44 vs. limit=15.0 2023-11-19 02:24:37,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=522126.6666666667, ans=0.125 2023-11-19 02:24:43,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=522193.3333333333, ans=0.0 2023-11-19 02:24:47,007 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.33 vs. limit=15.0
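Most of the scaling.py:213 lines above track ScheduledFloat values: named hyperparameters (skip rates, balancer probabilities, dropout rates) whose current value ans is a function of batch_count. By this point in training (batch_count around 5.2e5) the skip rates have annealed to 0.0 and the balancer probabilities sit at 0.125. A plausible reading is piecewise-linear interpolation between breakpoints, sketched below; the breakpoint values are illustrative, not taken from the recipe:

    def scheduled_float(batch_count: float,
                        schedule: list[tuple[float, float]]) -> float:
        # Piecewise-linear interpolation over (batch_count, value)
        # breakpoints, clamped at both ends.
        points = sorted(schedule)
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                return y0 + (batch_count - x0) * (y1 - y0) / (x1 - x0)
        return points[-1][1]

    # A skip rate that anneals to zero early in training reached its endpoint
    # long before batch_count ~5.2e5, hence the many "ans=0.0" entries:
    print(scheduled_float(521860.0, [(0.0, 0.2), (4000.0, 0.05), (16000.0, 0.0)]))  # 0.0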
2023-11-19 02:24:47,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=522193.3333333333, ans=0.125 2023-11-19 02:24:50,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=522193.3333333333, ans=0.125 2023-11-19 02:24:53,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=522260.0, ans=0.025 2023-11-19 02:24:54,749 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6200, loss[loss=0.1083, simple_loss=0.1343, pruned_loss=0.03253, audio_tagging_loss=0.008601, over 14811.00 frames. ], tot_loss[loss=0.09187, simple_loss=0.1085, pruned_loss=0.02669, audio_tagging_loss=0.01094, over 3057725.53 frames. ], batch size: 54, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:25:00,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=522260.0, ans=0.1 2023-11-19 02:25:03,908 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=522260.0, ans=0.0 2023-11-19 02:25:06,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2.whitening_limit, batch_count=522326.6666666667, ans=15.0 2023-11-19 02:25:08,737 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:25:21,563 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.90 vs. limit=22.5 2023-11-19 02:25:28,327 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.853e+01 9.668e+01 1.071e+02 1.653e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 02:25:50,580 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6250, loss[loss=0.09056, simple_loss=0.0984, pruned_loss=0.03074, audio_tagging_loss=0.01062, over 16633.00 frames. ], tot_loss[loss=0.09142, simple_loss=0.1078, pruned_loss=0.02646, audio_tagging_loss=0.01105, over 3060507.30 frames.
], batch size: 68, lr: 9.86e-03, grad_scale: 32.0 2023-11-19 02:25:51,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=522593.3333333333, ans=0.125 2023-11-19 02:25:56,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=522593.3333333333, ans=0.125 2023-11-19 02:25:59,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=522593.3333333333, ans=0.125 2023-11-19 02:26:03,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=522660.0, ans=0.0 2023-11-19 02:26:20,148 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=522726.6666666667, ans=0.125 2023-11-19 02:26:21,256 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=522726.6666666667, ans=0.0 2023-11-19 02:26:40,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=522860.0, ans=0.04949747468305833 2023-11-19 02:26:45,720 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6300, loss[loss=0.08455, simple_loss=0.09454, pruned_loss=0.02401, audio_tagging_loss=0.01327, over 14736.00 frames. ], tot_loss[loss=0.0931, simple_loss=0.1096, pruned_loss=0.02705, audio_tagging_loss=0.01123, over 3061896.43 frames. ], batch size: 57, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:27:00,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.44 vs. limit=22.5 2023-11-19 02:27:22,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.961e+01 8.911e+01 9.683e+01 1.073e+02 1.509e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 02:27:36,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=523193.3333333333, ans=0.125 2023-11-19 02:27:36,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.01 vs. limit=10.0 2023-11-19 02:27:41,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.19 vs. limit=15.0 2023-11-19 02:27:41,852 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6350, loss[loss=0.08655, simple_loss=0.1078, pruned_loss=0.02267, audio_tagging_loss=0.009975, over 14043.00 frames. ], tot_loss[loss=0.0926, simple_loss=0.1091, pruned_loss=0.02677, audio_tagging_loss=0.01129, over 3053812.20 frames. 
], batch size: 52, lr: 9.85e-03, grad_scale: 16.0 2023-11-19 02:27:42,126 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:27:43,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff2.min_abs, batch_count=523260.0, ans=0.1 2023-11-19 02:27:48,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=523260.0, ans=0.125 2023-11-19 02:27:53,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=523326.6666666667, ans=0.05 2023-11-19 02:28:00,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=523326.6666666667, ans=0.0 2023-11-19 02:28:02,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=523326.6666666667, ans=0.1 2023-11-19 02:28:02,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=523326.6666666667, ans=0.125 2023-11-19 02:28:11,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523393.3333333333, ans=0.125 2023-11-19 02:28:37,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=523593.3333333333, ans=0.0 2023-11-19 02:28:38,398 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6400, loss[loss=0.09353, simple_loss=0.1198, pruned_loss=0.02368, audio_tagging_loss=0.009979, over 14794.00 frames. ], tot_loss[loss=0.09312, simple_loss=0.1097, pruned_loss=0.02699, audio_tagging_loss=0.01128, over 3045269.19 frames. ], batch size: 57, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:28:39,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.00 vs. limit=15.0 2023-11-19 02:28:42,210 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.04 vs. limit=15.0 2023-11-19 02:29:05,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=523726.6666666667, ans=0.125 2023-11-19 02:29:13,030 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.312e+01 8.426e+01 9.075e+01 1.046e+02 1.342e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 02:29:16,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=523793.3333333333, ans=0.125 2023-11-19 02:29:25,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=523860.0, ans=0.0 2023-11-19 02:29:31,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=523860.0, ans=0.0 2023-11-19 02:29:33,091 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6450, loss[loss=0.1204, simple_loss=0.1352, pruned_loss=0.04372, audio_tagging_loss=0.009051, over 14585.00 frames. ], tot_loss[loss=0.09303, simple_loss=0.1097, pruned_loss=0.0269, audio_tagging_loss=0.01128, over 3052491.55 frames. 
], batch size: 54, lr: 9.85e-03, grad_scale: 32.0 2023-11-19 02:29:50,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=523993.3333333333, ans=0.125 2023-11-19 02:29:56,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.57 vs. limit=10.0 2023-11-19 02:30:01,203 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.29 vs. limit=22.5 2023-11-19 02:30:22,759 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.76 vs. limit=15.0 2023-11-19 02:30:23,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=524193.3333333333, ans=0.125 2023-11-19 02:30:28,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.34 vs. limit=12.0 2023-11-19 02:30:28,458 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6500, loss[loss=0.09822, simple_loss=0.1108, pruned_loss=0.03122, audio_tagging_loss=0.01159, over 15722.00 frames. ], tot_loss[loss=0.09313, simple_loss=0.1096, pruned_loss=0.02716, audio_tagging_loss=0.01117, over 3051448.16 frames. ], batch size: 58, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:04,419 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.918e+01 8.655e+01 9.206e+01 1.007e+02 1.269e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:31:07,195 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.57 vs. limit=12.0 2023-11-19 02:31:07,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=524460.0, ans=0.07 2023-11-19 02:31:13,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=524526.6666666666, ans=0.125 2023-11-19 02:31:24,664 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6550, loss[loss=0.1206, simple_loss=0.1449, pruned_loss=0.03851, audio_tagging_loss=0.009705, over 15227.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.1107, pruned_loss=0.02742, audio_tagging_loss=0.01097, over 3053440.53 frames. ], batch size: 56, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:31:30,232 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.50 vs. 
limit=15.0 2023-11-19 02:31:36,235 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:31:41,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=524660.0, ans=0.0 2023-11-19 02:31:44,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=524660.0, ans=0.125 2023-11-19 02:31:53,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=524726.6666666666, ans=0.2 2023-11-19 02:31:55,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=524726.6666666666, ans=0.125 2023-11-19 02:31:56,163 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=524793.3333333334, ans=0.125 2023-11-19 02:31:58,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=524793.3333333334, ans=0.125 2023-11-19 02:32:01,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=23.08 vs. limit=22.5 2023-11-19 02:32:10,209 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.72 vs. limit=22.5 2023-11-19 02:32:20,068 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6600, loss[loss=0.07635, simple_loss=0.08937, pruned_loss=0.01996, audio_tagging_loss=0.01171, over 15243.00 frames. ], tot_loss[loss=0.09392, simple_loss=0.1112, pruned_loss=0.0275, audio_tagging_loss=0.01083, over 3051359.30 frames. ], batch size: 58, lr: 9.84e-03, grad_scale: 32.0 2023-11-19 02:32:28,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.26 vs. limit=22.5 2023-11-19 02:32:41,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=525060.0, ans=0.1 2023-11-19 02:32:45,902 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.34 vs. limit=15.0 2023-11-19 02:32:51,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=525060.0, ans=0.125 2023-11-19 02:32:53,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=525126.6666666666, ans=0.07 2023-11-19 02:32:55,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=525126.6666666666, ans=0.07 2023-11-19 02:32:55,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.348e+01 8.622e+01 9.360e+01 1.038e+02 1.373e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 02:32:55,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=525126.6666666666, ans=0.0 2023-11-19 02:33:14,897 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6650, loss[loss=0.1055, simple_loss=0.1313, pruned_loss=0.02983, audio_tagging_loss=0.01008, over 15187.00 frames. 
], tot_loss[loss=0.09394, simple_loss=0.1115, pruned_loss=0.02747, audio_tagging_loss=0.0107, over 3048626.89 frames. ], batch size: 56, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:33:34,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=525326.6666666666, ans=0.2 2023-11-19 02:33:35,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=525326.6666666666, ans=0.0 2023-11-19 02:33:49,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=525460.0, ans=0.5 2023-11-19 02:34:10,663 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6700, loss[loss=0.07715, simple_loss=0.09904, pruned_loss=0.02157, audio_tagging_loss=0.00606, over 15086.00 frames. ], tot_loss[loss=0.09353, simple_loss=0.1111, pruned_loss=0.02719, audio_tagging_loss=0.01078, over 3054076.48 frames. ], batch size: 57, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:34:11,652 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.50 vs. limit=15.0 2023-11-19 02:34:21,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.99 vs. limit=15.0 2023-11-19 02:34:45,466 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.124e+01 8.532e+01 9.161e+01 1.021e+02 1.335e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 02:35:05,523 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6750, loss[loss=0.1141, simple_loss=0.1296, pruned_loss=0.04005, audio_tagging_loss=0.009187, over 14507.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.11, pruned_loss=0.02692, audio_tagging_loss=0.01085, over 3046700.35 frames. ], batch size: 55, lr: 9.83e-03, grad_scale: 32.0 2023-11-19 02:35:16,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=525993.3333333334, ans=0.0 2023-11-19 02:35:25,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=526060.0, ans=0.125 2023-11-19 02:35:27,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=526060.0, ans=0.125 2023-11-19 02:35:29,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.whiten.whitening_limit, batch_count=526060.0, ans=15.0 2023-11-19 02:35:33,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=15.63 vs. 
limit=22.5 2023-11-19 02:35:33,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=526060.0, ans=0.0 2023-11-19 02:35:43,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526126.6666666666, ans=0.1 2023-11-19 02:35:47,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=526126.6666666666, ans=0.0 2023-11-19 02:35:51,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=526193.3333333334, ans=0.2 2023-11-19 02:35:54,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=526193.3333333334, ans=0.125 2023-11-19 02:35:59,615 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.62 vs. limit=15.0 2023-11-19 02:36:00,009 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6800, loss[loss=0.05232, simple_loss=0.0612, pruned_loss=0.01034, audio_tagging_loss=0.01138, over 13853.00 frames. ], tot_loss[loss=0.09277, simple_loss=0.11, pruned_loss=0.02696, audio_tagging_loss=0.01081, over 3054652.87 frames. ], batch size: 55, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:36:16,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=526326.6666666666, ans=0.125 2023-11-19 02:36:24,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=526393.3333333334, ans=0.125 2023-11-19 02:36:27,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=526393.3333333334, ans=0.2 2023-11-19 02:36:35,400 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.668e+01 8.914e+01 9.908e+01 1.073e+02 1.400e+02, threshold=1.982e+02, percent-clipped=0.0 2023-11-19 02:36:45,743 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.79 vs. limit=15.0 2023-11-19 02:36:52,392 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=526526.6666666666, ans=0.0 2023-11-19 02:36:54,910 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6850, loss[loss=0.1068, simple_loss=0.1186, pruned_loss=0.03213, audio_tagging_loss=0.01532, over 16370.00 frames. ], tot_loss[loss=0.09246, simple_loss=0.1098, pruned_loss=0.02681, audio_tagging_loss=0.01077, over 3053909.30 frames. ], batch size: 63, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:00,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526593.3333333334, ans=0.1 2023-11-19 02:37:32,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.56 vs. 
limit=6.0 2023-11-19 02:37:47,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=526860.0, ans=0.0 2023-11-19 02:37:48,244 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=526860.0, ans=0.95 2023-11-19 02:37:48,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.40 vs. limit=15.0 2023-11-19 02:37:49,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=526860.0, ans=0.1 2023-11-19 02:37:51,106 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6900, loss[loss=0.0768, simple_loss=0.0906, pruned_loss=0.022, audio_tagging_loss=0.009504, over 13703.00 frames. ], tot_loss[loss=0.09185, simple_loss=0.1093, pruned_loss=0.02644, audio_tagging_loss=0.01073, over 3051129.69 frames. ], batch size: 52, lr: 9.82e-03, grad_scale: 32.0 2023-11-19 02:37:56,602 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:38:00,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=526993.3333333334, ans=0.125 2023-11-19 02:38:26,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.916e+01 8.284e+01 8.942e+01 9.819e+01 1.283e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 02:38:34,405 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 02:38:45,938 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 6950, loss[loss=0.1075, simple_loss=0.1192, pruned_loss=0.03691, audio_tagging_loss=0.01102, over 15203.00 frames. ], tot_loss[loss=0.09232, simple_loss=0.1096, pruned_loss=0.02667, audio_tagging_loss=0.01083, over 3054552.81 frames. ], batch size: 55, lr: 9.81e-03, grad_scale: 32.0
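The WARNING lines from train_asr.py:1319 drop AudioSet placeholder cuts whose encoder output would be too short for transducer training: 100 input frames shrink to 23 after the model's subsampling, which is fewer than the 24 BPE tokens of the dummy transcript, so no monotonic alignment exists. The check implied by those numbers, as a minimal sketch (the real filter lives in train_asr.py and may test further conditions):

    def exclude_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
        # A transducer needs at least one encoder frame per output token,
        # so drop any cut whose subsampled length is below its token count.
        return frames_after_subsampling < num_tokens

    print(exclude_cut(23, 24))  # True: the dummy cuts above are excluded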
2023-11-19 02:38:49,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=527260.0, ans=0.0 2023-11-19 02:39:05,261 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=8.687e-01 2023-11-19 02:39:13,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527393.3333333334, ans=0.1 2023-11-19 02:39:31,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=527526.6666666666, ans=0.2 2023-11-19 02:39:32,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=527526.6666666666, ans=0.07 2023-11-19 02:39:34,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=527526.6666666666, ans=0.125 2023-11-19 02:39:40,942 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7000, loss[loss=0.09968, simple_loss=0.1142, pruned_loss=0.02899, audio_tagging_loss=0.0136, over 13904.00 frames. ], tot_loss[loss=0.09267, simple_loss=0.1097, pruned_loss=0.02692, audio_tagging_loss=0.01089, over 3047899.88 frames. ], batch size: 53, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:39:48,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=527593.3333333334, ans=0.0 2023-11-19 02:39:52,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=527660.0, ans=0.125 2023-11-19 02:39:58,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=527660.0, ans=0.0 2023-11-19 02:40:16,979 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.174e+01 8.756e+01 9.618e+01 1.077e+02 1.519e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 02:40:18,805 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.97 vs. limit=22.5 2023-11-19 02:40:20,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=527793.3333333334, ans=0.2 2023-11-19 02:40:25,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=527860.0, ans=0.125 2023-11-19 02:40:31,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=527860.0, ans=0.1 2023-11-19 02:40:37,803 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7050, loss[loss=0.09128, simple_loss=0.1145, pruned_loss=0.02331, audio_tagging_loss=0.01072, over 15782.00 frames. ], tot_loss[loss=0.09226, simple_loss=0.1094, pruned_loss=0.0266, audio_tagging_loss=0.01097, over 3045495.65 frames.
], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:40:51,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=527993.3333333334, ans=0.2 2023-11-19 02:41:29,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=528193.3333333334, ans=0.0 2023-11-19 02:41:33,641 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7100, loss[loss=0.0852, simple_loss=0.1005, pruned_loss=0.02403, audio_tagging_loss=0.01093, over 15890.00 frames. ], tot_loss[loss=0.09208, simple_loss=0.1089, pruned_loss=0.02661, audio_tagging_loss=0.01103, over 3047705.82 frames. ], batch size: 58, lr: 9.81e-03, grad_scale: 32.0 2023-11-19 02:41:33,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=528260.0, ans=0.125 2023-11-19 02:41:38,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=528260.0, ans=0.0 2023-11-19 02:41:41,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=528260.0, ans=0.09899494936611666 2023-11-19 02:41:41,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.60 vs. limit=15.0 2023-11-19 02:42:08,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528460.0, ans=0.1 2023-11-19 02:42:09,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.659e+01 8.375e+01 9.204e+01 1.018e+02 1.304e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 02:42:28,590 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7150, loss[loss=0.11, simple_loss=0.147, pruned_loss=0.02703, audio_tagging_loss=0.009453, over 16712.00 frames. ], tot_loss[loss=0.09202, simple_loss=0.1089, pruned_loss=0.02654, audio_tagging_loss=0.01105, over 3045420.89 frames. ], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:42:33,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.20 vs. limit=22.5 2023-11-19 02:42:45,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=528660.0, ans=0.1 2023-11-19 02:42:45,292 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=528660.0, ans=0.1 2023-11-19 02:42:55,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=528726.6666666666, ans=0.07 2023-11-19 02:43:19,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=528860.0, ans=0.125 2023-11-19 02:43:25,072 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7200, loss[loss=0.1026, simple_loss=0.1216, pruned_loss=0.03044, audio_tagging_loss=0.0113, over 15801.00 frames. ], tot_loss[loss=0.09219, simple_loss=0.1089, pruned_loss=0.02659, audio_tagging_loss=0.01116, over 3043301.78 frames. 
], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:43:28,328 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=528926.6666666666, ans=0.04949747468305833 2023-11-19 02:43:28,565 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.49 vs. limit=6.0 2023-11-19 02:43:46,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=529060.0, ans=0.09899494936611666 2023-11-19 02:43:51,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=529060.0, ans=0.125 2023-11-19 02:43:53,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=529060.0, ans=0.125 2023-11-19 02:43:59,943 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.223e+01 8.683e+01 9.470e+01 1.039e+02 1.567e+02, threshold=1.894e+02, percent-clipped=0.0 2023-11-19 02:44:07,891 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.68 vs. limit=15.0 2023-11-19 02:44:20,483 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7250, loss[loss=0.1078, simple_loss=0.1276, pruned_loss=0.03528, audio_tagging_loss=0.008688, over 15263.00 frames. ], tot_loss[loss=0.09239, simple_loss=0.1091, pruned_loss=0.02666, audio_tagging_loss=0.01119, over 3034569.57 frames. ], batch size: 57, lr: 9.80e-03, grad_scale: 32.0 2023-11-19 02:44:30,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=529326.6666666666, ans=0.1 2023-11-19 02:44:35,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=529326.6666666666, ans=0.0 2023-11-19 02:44:48,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=529393.3333333334, ans=0.125 2023-11-19 02:44:52,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529393.3333333334, ans=0.125 2023-11-19 02:44:55,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.83 vs. limit=6.0 2023-11-19 02:45:15,822 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7300, loss[loss=0.08219, simple_loss=0.09206, pruned_loss=0.026, audio_tagging_loss=0.01017, over 14153.00 frames. ], tot_loss[loss=0.09257, simple_loss=0.1095, pruned_loss=0.02674, audio_tagging_loss=0.0111, over 3036470.21 frames. ], batch size: 55, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:45:17,130 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:45:20,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.71 vs. 
limit=15.0 2023-11-19 02:45:29,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=529660.0, ans=0.0 2023-11-19 02:45:46,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=529726.6666666666, ans=0.0 2023-11-19 02:45:51,672 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.087e+01 8.930e+01 9.656e+01 1.070e+02 1.553e+02, threshold=1.931e+02, percent-clipped=0.0 2023-11-19 02:45:53,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=529793.3333333334, ans=0.125 2023-11-19 02:46:06,234 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=529860.0, ans=0.125 2023-11-19 02:46:07,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=529860.0, ans=0.125 2023-11-19 02:46:12,165 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7350, loss[loss=0.09554, simple_loss=0.1069, pruned_loss=0.02993, audio_tagging_loss=0.01215, over 15970.00 frames. ], tot_loss[loss=0.09309, simple_loss=0.1101, pruned_loss=0.02703, audio_tagging_loss=0.01101, over 3036929.95 frames. ], batch size: 61, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:46:15,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.45 vs. limit=15.0 2023-11-19 02:46:20,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=529926.6666666666, ans=0.0 2023-11-19 02:46:24,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=529993.3333333334, ans=0.0 2023-11-19 02:47:04,525 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.96 vs. limit=15.0 2023-11-19 02:47:07,075 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7400, loss[loss=0.09194, simple_loss=0.1074, pruned_loss=0.02432, audio_tagging_loss=0.01394, over 15259.00 frames. ], tot_loss[loss=0.09306, simple_loss=0.1103, pruned_loss=0.0271, audio_tagging_loss=0.01082, over 3038461.49 frames. ], batch size: 56, lr: 9.79e-03, grad_scale: 32.0 2023-11-19 02:47:22,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=530326.6666666666, ans=0.125 2023-11-19 02:47:43,087 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.685e+01 9.299e+01 1.013e+02 1.325e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:47:50,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=530526.6666666666, ans=0.0 2023-11-19 02:47:56,419 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.38 vs. limit=15.0 2023-11-19 02:48:00,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=530526.6666666666, ans=0.125 2023-11-19 02:48:02,735 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7450, loss[loss=0.07957, simple_loss=0.1026, pruned_loss=0.01824, audio_tagging_loss=0.01, over 14921.00 frames. 
], tot_loss[loss=0.09302, simple_loss=0.1104, pruned_loss=0.02696, audio_tagging_loss=0.01085, over 3036502.86 frames. ], batch size: 55, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:48:07,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=530593.3333333334, ans=0.125 2023-11-19 02:48:08,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=530593.3333333334, ans=0.0 2023-11-19 02:48:19,631 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.27 vs. limit=15.0 2023-11-19 02:48:23,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=530660.0, ans=0.0 2023-11-19 02:48:36,480 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.95 vs. limit=10.0 2023-11-19 02:48:48,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=530860.0, ans=0.125 2023-11-19 02:48:48,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530860.0, ans=0.1 2023-11-19 02:48:59,263 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7500, loss[loss=0.0944, simple_loss=0.1155, pruned_loss=0.02741, audio_tagging_loss=0.009223, over 16722.00 frames. ], tot_loss[loss=0.09315, simple_loss=0.1108, pruned_loss=0.02697, audio_tagging_loss=0.01078, over 3044483.09 frames. ], batch size: 64, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:48:59,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=530926.6666666666, ans=0.0 2023-11-19 02:49:05,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=530926.6666666666, ans=0.1 2023-11-19 02:49:16,518 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.23 vs. limit=12.0 2023-11-19 02:49:33,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.745e+01 8.748e+01 9.301e+01 1.047e+02 1.348e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:49:53,802 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7550, loss[loss=0.1074, simple_loss=0.1373, pruned_loss=0.02945, audio_tagging_loss=0.009237, over 15189.00 frames. ], tot_loss[loss=0.09342, simple_loss=0.1112, pruned_loss=0.02707, audio_tagging_loss=0.01074, over 3046557.32 frames. ], batch size: 57, lr: 9.78e-03, grad_scale: 32.0 2023-11-19 02:49:56,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=531260.0, ans=0.1 2023-11-19 02:49:59,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=531260.0, ans=0.125 2023-11-19 02:50:03,813 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.04 vs. 
limit=15.0 2023-11-19 02:50:15,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.max_abs, batch_count=531393.3333333334, ans=10.0 2023-11-19 02:50:34,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=531460.0, ans=0.125 2023-11-19 02:50:41,470 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 02:50:48,803 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7600, loss[loss=0.08595, simple_loss=0.09259, pruned_loss=0.02648, audio_tagging_loss=0.01317, over 15359.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1116, pruned_loss=0.02718, audio_tagging_loss=0.01066, over 3052272.50 frames. ], batch size: 58, lr: 9.77e-03, grad_scale: 32.0 2023-11-19 02:51:10,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=531660.0, ans=0.0 2023-11-19 02:51:10,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.39 vs. limit=22.5 2023-11-19 02:51:14,508 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=531726.6666666666, ans=0.125 2023-11-19 02:51:15,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=531726.6666666666, ans=0.125 2023-11-19 02:51:17,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=531726.6666666666, ans=0.125 2023-11-19 02:51:24,909 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.650e+01 9.572e+01 1.070e+02 1.390e+02, threshold=1.914e+02, percent-clipped=0.0 2023-11-19 02:51:30,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.62 vs. limit=15.0 2023-11-19 02:51:45,439 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7650, loss[loss=0.09676, simple_loss=0.1144, pruned_loss=0.02663, audio_tagging_loss=0.01296, over 15809.00 frames. ], tot_loss[loss=0.09197, simple_loss=0.1094, pruned_loss=0.02651, audio_tagging_loss=0.01077, over 3044945.20 frames. ], batch size: 61, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:51:48,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=531926.6666666666, ans=0.125 2023-11-19 02:52:02,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=531993.3333333334, ans=0.125 2023-11-19 02:52:17,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer_ff3.min_abs, batch_count=532126.6666666666, ans=0.2 2023-11-19 02:52:21,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.83 vs. limit=15.0 2023-11-19 02:52:41,046 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7700, loss[loss=0.09943, simple_loss=0.1241, pruned_loss=0.02728, audio_tagging_loss=0.01013, over 14028.00 frames. ], tot_loss[loss=0.09266, simple_loss=0.1104, pruned_loss=0.0267, audio_tagging_loss=0.01076, over 3045393.01 frames. 
], batch size: 53, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:52:42,809 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.63 vs. limit=15.0 2023-11-19 02:52:50,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=532326.6666666666, ans=0.125 2023-11-19 02:53:04,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=532393.3333333334, ans=0.125 2023-11-19 02:53:04,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=532393.3333333334, ans=0.035 2023-11-19 02:53:07,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=532393.3333333334, ans=0.0 2023-11-19 02:53:17,952 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.475e+01 8.485e+01 9.381e+01 1.068e+02 1.739e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 02:53:18,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=532460.0, ans=0.1 2023-11-19 02:53:35,821 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7750, loss[loss=0.07587, simple_loss=0.09664, pruned_loss=0.01814, audio_tagging_loss=0.009408, over 14883.00 frames. ], tot_loss[loss=0.0936, simple_loss=0.1116, pruned_loss=0.02706, audio_tagging_loss=0.01074, over 3044944.54 frames. ], batch size: 57, lr: 9.77e-03, grad_scale: 16.0 2023-11-19 02:53:37,110 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=2.647e-03 2023-11-19 02:53:42,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=532593.3333333334, ans=0.04949747468305833 2023-11-19 02:53:48,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=532660.0, ans=0.0 2023-11-19 02:54:23,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=532860.0, ans=0.125 2023-11-19 02:54:31,615 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7800, loss[loss=0.0952, simple_loss=0.116, pruned_loss=0.02795, audio_tagging_loss=0.00923, over 14916.00 frames. ], tot_loss[loss=0.09435, simple_loss=0.1124, pruned_loss=0.02739, audio_tagging_loss=0.01076, over 3036787.89 frames. ], batch size: 56, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:54:32,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=532926.6666666666, ans=0.125 2023-11-19 02:55:07,937 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.565e+01 9.591e+01 1.072e+02 1.947e+02, threshold=1.918e+02, percent-clipped=1.0 2023-11-19 02:55:12,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=533126.6666666666, ans=0.125 2023-11-19 02:55:13,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=533126.6666666666, ans=0.1 2023-11-19 02:55:27,530 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7850, loss[loss=0.1053, simple_loss=0.1151, pruned_loss=0.03552, audio_tagging_loss=0.01221, over 14872.00 frames. 
], tot_loss[loss=0.09336, simple_loss=0.1107, pruned_loss=0.02707, audio_tagging_loss=0.01096, over 3038106.52 frames. ], batch size: 57, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:55:38,420 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-80000.pt 2023-11-19 02:55:42,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=533326.6666666666, ans=0.09899494936611666 2023-11-19 02:56:03,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=533460.0, ans=0.125 2023-11-19 02:56:11,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=533460.0, ans=0.125 2023-11-19 02:56:14,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=533526.6666666666, ans=0.2 2023-11-19 02:56:16,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=533526.6666666666, ans=0.0 2023-11-19 02:56:22,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=533526.6666666666, ans=0.2 2023-11-19 02:56:24,749 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7900, loss[loss=0.08967, simple_loss=0.1099, pruned_loss=0.02343, audio_tagging_loss=0.01128, over 15470.00 frames. ], tot_loss[loss=0.09374, simple_loss=0.1108, pruned_loss=0.02729, audio_tagging_loss=0.01104, over 3037565.19 frames. ], batch size: 57, lr: 9.76e-03, grad_scale: 16.0 2023-11-19 02:56:26,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=533593.3333333334, ans=0.2 2023-11-19 02:56:33,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=533593.3333333334, ans=0.125 2023-11-19 02:56:35,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=533660.0, ans=0.125 2023-11-19 02:57:01,474 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.359e+01 8.512e+01 9.298e+01 1.008e+02 1.380e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 02:57:10,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=533860.0, ans=0.2 2023-11-19 02:57:15,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.10 vs. limit=15.0 2023-11-19 02:57:19,990 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 7950, loss[loss=0.1303, simple_loss=0.16, pruned_loss=0.03954, audio_tagging_loss=0.01076, over 15873.00 frames. ], tot_loss[loss=0.09388, simple_loss=0.1106, pruned_loss=0.02734, audio_tagging_loss=0.01122, over 3042889.60 frames. ], batch size: 55, lr: 9.75e-03, grad_scale: 16.0 2023-11-19 02:57:34,945 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 02:57:40,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=533993.3333333334, ans=0.0 2023-11-19 02:58:05,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=534193.3333333334, ans=0.125 2023-11-19 02:58:13,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=534193.3333333334, ans=0.0 2023-11-19 02:58:13,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=534193.3333333334, ans=0.125 2023-11-19 02:58:16,024 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8000, loss[loss=0.09674, simple_loss=0.1143, pruned_loss=0.02883, audio_tagging_loss=0.01076, over 14828.00 frames. ], tot_loss[loss=0.09401, simple_loss=0.1105, pruned_loss=0.02746, audio_tagging_loss=0.0113, over 3039014.14 frames. ], batch size: 59, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:58:16,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534260.0, ans=0.1 2023-11-19 02:58:34,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=534326.6666666666, ans=0.125 2023-11-19 02:58:35,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=534326.6666666666, ans=0.125 2023-11-19 02:58:50,424 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=534460.0, ans=0.125 2023-11-19 02:58:52,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.488e+01 9.029e+01 9.898e+01 1.404e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 02:59:08,695 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=534526.6666666666, ans=0.1 2023-11-19 02:59:10,642 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8050, loss[loss=0.08937, simple_loss=0.1018, pruned_loss=0.02681, audio_tagging_loss=0.01167, over 15396.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1108, pruned_loss=0.02745, audio_tagging_loss=0.01131, over 3040038.55 frames. 
], batch size: 56, lr: 9.75e-03, grad_scale: 32.0 2023-11-19 02:59:12,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=534593.3333333334, ans=0.0 2023-11-19 02:59:20,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=534660.0, ans=0.125 2023-11-19 02:59:31,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=534660.0, ans=0.125 2023-11-19 02:59:32,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=534726.6666666666, ans=0.0 2023-11-19 02:59:47,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=534793.3333333334, ans=0.125 2023-11-19 03:00:02,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=534860.0, ans=0.125 2023-11-19 03:00:06,533 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8100, loss[loss=0.06503, simple_loss=0.06894, pruned_loss=0.01795, audio_tagging_loss=0.0126, over 15803.00 frames. ], tot_loss[loss=0.09316, simple_loss=0.1096, pruned_loss=0.02714, audio_tagging_loss=0.01125, over 3036927.08 frames. ], batch size: 62, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:00:11,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=534926.6666666666, ans=0.1 2023-11-19 03:00:18,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=534993.3333333334, ans=0.05 2023-11-19 03:00:42,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.187e+01 8.821e+01 9.637e+01 1.043e+02 1.464e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 03:00:43,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=535126.6666666666, ans=0.5 2023-11-19 03:01:02,687 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8150, loss[loss=0.1098, simple_loss=0.1339, pruned_loss=0.03245, audio_tagging_loss=0.01033, over 15470.00 frames. ], tot_loss[loss=0.09308, simple_loss=0.1096, pruned_loss=0.02717, audio_tagging_loss=0.01109, over 3037631.91 frames. 
], batch size: 54, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:01:05,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=535260.0, ans=0.0 2023-11-19 03:01:11,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=535260.0, ans=0.125 2023-11-19 03:01:18,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=535326.6666666666, ans=0.125 2023-11-19 03:01:18,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535326.6666666666, ans=0.1 2023-11-19 03:01:27,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535393.3333333334, ans=0.1 2023-11-19 03:01:33,405 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.841e-01 2023-11-19 03:01:35,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=535460.0, ans=0.1 2023-11-19 03:01:41,179 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.04 vs. limit=15.0 2023-11-19 03:01:42,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=535460.0, ans=0.0 2023-11-19 03:01:50,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=535526.6666666666, ans=0.0 2023-11-19 03:01:51,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=535526.6666666666, ans=0.0 2023-11-19 03:01:51,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=535526.6666666666, ans=0.125 2023-11-19 03:01:53,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=535526.6666666666, ans=0.125 2023-11-19 03:01:57,928 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8200, loss[loss=0.1053, simple_loss=0.1132, pruned_loss=0.03779, audio_tagging_loss=0.01092, over 15172.00 frames. ], tot_loss[loss=0.09361, simple_loss=0.1106, pruned_loss=0.02745, audio_tagging_loss=0.01085, over 3041561.57 frames. ], batch size: 56, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:02:00,007 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 03:02:34,968 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.718e+01 8.630e+01 9.276e+01 1.032e+02 1.538e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 03:02:41,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=535860.0, ans=0.1 2023-11-19 03:02:53,480 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8250, loss[loss=0.08739, simple_loss=0.1085, pruned_loss=0.02262, audio_tagging_loss=0.01053, over 15626.00 frames. ], tot_loss[loss=0.09295, simple_loss=0.11, pruned_loss=0.0272, audio_tagging_loss=0.01077, over 3046783.36 frames. ], batch size: 57, lr: 9.74e-03, grad_scale: 32.0 2023-11-19 03:02:57,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=535926.6666666666, ans=0.125 2023-11-19 03:03:08,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.70 vs. limit=15.0 2023-11-19 03:03:09,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=535993.3333333334, ans=0.1 2023-11-19 03:03:32,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=536126.6666666666, ans=0.2 2023-11-19 03:03:49,901 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8300, loss[loss=0.09548, simple_loss=0.1042, pruned_loss=0.03233, audio_tagging_loss=0.01105, over 14835.00 frames. ], tot_loss[loss=0.09333, simple_loss=0.1104, pruned_loss=0.02732, audio_tagging_loss=0.01079, over 3042486.03 frames. ], batch size: 57, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:04:01,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=536326.6666666666, ans=0.0 2023-11-19 03:04:01,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=536326.6666666666, ans=0.1 2023-11-19 03:04:08,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=536326.6666666666, ans=0.125 2023-11-19 03:04:11,732 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=536393.3333333334, ans=0.0 2023-11-19 03:04:27,420 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.080e+01 8.802e+01 9.688e+01 1.089e+02 1.659e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-19 03:04:27,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=536460.0, ans=0.125 2023-11-19 03:04:45,393 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8350, loss[loss=0.09594, simple_loss=0.1096, pruned_loss=0.02743, audio_tagging_loss=0.01371, over 16155.00 frames. ], tot_loss[loss=0.09338, simple_loss=0.1109, pruned_loss=0.02733, audio_tagging_loss=0.01062, over 3042053.84 frames. 
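The recurring "Exclude cut" warnings all follow one pattern: a one-second AudioSet clip carrying only the dummy placeholder transcript arrives with 100 feature frames, subsampling leaves 23 encoder frames, and 23 frames cannot be aligned against 24 BPE tokens, so the transducer loss would be infeasible and the cut is dropped before it reaches the model. A hedged sketch of such a guard; the (T - 7) // 4 relation below is an assumption chosen only because it reproduces the logged 100 -> 23 example:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed effective 4x subsampling with 7 frames of context lost;
    # picked solely because it maps the logged 100 frames to 23.
    return (num_frames - 7) // 4

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer alignment needs at least as many encoder frames as
    # output tokens, so short cuts with long token sequences are dropped.
    t = frames_after_subsampling(num_frames)
    if t < num_tokens:
        print(f"Exclude cut. Frames (after subsampling): {t}. "
              f"Tokens: {num_tokens}.")
        return False
    return True

assert keep_cut(100, 24) is False   # the 1-second dummy-text cuts above
assert keep_cut(1600, 24) is True   # a typical 16-second speech cut
```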
], batch size: 62, lr: 9.73e-03, grad_scale: 16.0 2023-11-19 03:04:55,056 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=536660.0, ans=0.0 2023-11-19 03:05:25,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=536793.3333333334, ans=0.125 2023-11-19 03:05:34,426 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:05:39,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=536926.6666666666, ans=0.2 2023-11-19 03:05:40,347 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8400, loss[loss=0.08361, simple_loss=0.1102, pruned_loss=0.01867, audio_tagging_loss=0.009847, over 14600.00 frames. ], tot_loss[loss=0.09269, simple_loss=0.1099, pruned_loss=0.02705, audio_tagging_loss=0.01071, over 3040846.69 frames. ], batch size: 55, lr: 9.73e-03, grad_scale: 32.0 2023-11-19 03:05:47,802 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.37 vs. limit=15.0 2023-11-19 03:05:50,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=536993.3333333334, ans=0.0 2023-11-19 03:06:09,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=537060.0, ans=0.125 2023-11-19 03:06:18,483 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.875e+01 8.679e+01 9.349e+01 1.017e+02 2.307e+02, threshold=1.870e+02, percent-clipped=1.0 2023-11-19 03:06:31,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=537193.3333333334, ans=0.07 2023-11-19 03:06:36,894 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8450, loss[loss=0.09581, simple_loss=0.116, pruned_loss=0.02648, audio_tagging_loss=0.01132, over 15539.00 frames. ], tot_loss[loss=0.09258, simple_loss=0.1097, pruned_loss=0.02696, audio_tagging_loss=0.01075, over 3039070.24 frames. 
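The periodic grad-norm lines are likewise self-consistent: in every "Clipping_scale=2.0, grad-norm quartiles ..." entry above, the reported threshold is 2.0 times the median (the middle of the five printed values), e.g. 2.0 * 9.349e+01 = 1.870e+02, and percent-clipped becomes nonzero exactly when the printed maximum exceeds that threshold (2.307e+02 > 1.870e+02 gives percent-clipped=1.0). A minimal sketch of an adaptive clipper in that style; the history window length is an assumption:

```python
import torch

class MedianAdaptiveClipper:
    """Clip gradients to clipping_scale x the running median grad-norm."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 500):
        self.scale = clipping_scale
        self.window = window            # assumed history length
        self.norms: list[float] = []

    def clip_(self, params: list[torch.nn.Parameter]) -> float:
        grads = [p.grad for p in params if p.grad is not None]
        norm = torch.stack([g.norm() for g in grads]).norm().item()
        self.norms = (self.norms + [norm])[-self.window:]
        median = sorted(self.norms)[len(self.norms) // 2]
        threshold = self.scale * median  # e.g. 2.0 * 9.349e+01 = 1.870e+02
        if norm > threshold:
            for g in grads:
                g.mul_(threshold / norm)
        return threshold
```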
], batch size: 56, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:06:42,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=537260.0, ans=0.0 2023-11-19 03:06:44,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=537260.0, ans=0.125 2023-11-19 03:06:52,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=537326.6666666666, ans=0.0 2023-11-19 03:06:53,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=537326.6666666666, ans=0.09899494936611666 2023-11-19 03:07:01,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=537393.3333333334, ans=0.125 2023-11-19 03:07:23,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=537526.6666666666, ans=0.0 2023-11-19 03:07:27,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=537526.6666666666, ans=0.1 2023-11-19 03:07:31,466 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8500, loss[loss=0.096, simple_loss=0.1193, pruned_loss=0.02626, audio_tagging_loss=0.01007, over 14938.00 frames. ], tot_loss[loss=0.09313, simple_loss=0.1106, pruned_loss=0.02711, audio_tagging_loss=0.01075, over 3040954.69 frames. ], batch size: 55, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:07:33,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=537593.3333333334, ans=0.0 2023-11-19 03:07:55,430 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.79 vs. limit=15.0 2023-11-19 03:08:09,031 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.768e+01 9.309e+01 1.039e+02 1.379e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 03:08:10,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=537793.3333333334, ans=0.125 2023-11-19 03:08:12,497 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=537793.3333333334, ans=0.2 2023-11-19 03:08:15,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=537860.0, ans=0.125 2023-11-19 03:08:26,576 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8550, loss[loss=0.08681, simple_loss=0.09919, pruned_loss=0.02506, audio_tagging_loss=0.01216, over 16181.00 frames. ], tot_loss[loss=0.09349, simple_loss=0.111, pruned_loss=0.02719, audio_tagging_loss=0.01081, over 3040627.10 frames. ], batch size: 66, lr: 9.72e-03, grad_scale: 32.0 2023-11-19 03:08:33,835 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.99 vs. 
limit=22.5 2023-11-19 03:08:37,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=537993.3333333334, ans=0.125 2023-11-19 03:09:02,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=538126.6666666666, ans=0.125 2023-11-19 03:09:05,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=538126.6666666666, ans=0.09899494936611666 2023-11-19 03:09:21,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=538260.0, ans=0.07 2023-11-19 03:09:22,934 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8600, loss[loss=0.1294, simple_loss=0.1511, pruned_loss=0.04524, audio_tagging_loss=0.008619, over 15159.00 frames. ], tot_loss[loss=0.09415, simple_loss=0.1119, pruned_loss=0.02739, audio_tagging_loss=0.01081, over 3046639.18 frames. ], batch size: 55, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:09:35,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=538326.6666666666, ans=0.125 2023-11-19 03:09:50,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.22 vs. limit=15.0 2023-11-19 03:09:51,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=538393.3333333334, ans=0.2 2023-11-19 03:09:59,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.620e+01 8.913e+01 9.582e+01 1.068e+02 1.371e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 03:10:01,538 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.46 vs. limit=15.0 2023-11-19 03:10:02,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=538460.0, ans=0.125 2023-11-19 03:10:17,869 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8650, loss[loss=0.1106, simple_loss=0.1263, pruned_loss=0.03538, audio_tagging_loss=0.01211, over 15698.00 frames. ], tot_loss[loss=0.09364, simple_loss=0.1112, pruned_loss=0.02714, audio_tagging_loss=0.01089, over 3050372.78 frames. ], batch size: 59, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:10:21,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=538593.3333333334, ans=0.125 2023-11-19 03:10:21,660 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.18 vs. limit=15.0 2023-11-19 03:10:32,047 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.56 vs. 
limit=15.0 2023-11-19 03:10:41,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=538726.6666666666, ans=0.2 2023-11-19 03:10:43,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=538726.6666666666, ans=0.1 2023-11-19 03:10:49,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=538726.6666666666, ans=0.125 2023-11-19 03:11:00,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=538793.3333333334, ans=0.035 2023-11-19 03:11:12,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=538926.6666666666, ans=0.125 2023-11-19 03:11:13,649 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8700, loss[loss=0.1061, simple_loss=0.1337, pruned_loss=0.02861, audio_tagging_loss=0.01063, over 16717.00 frames. ], tot_loss[loss=0.09426, simple_loss=0.1117, pruned_loss=0.02737, audio_tagging_loss=0.01104, over 3052804.08 frames. ], batch size: 62, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:11:28,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.29 vs. limit=15.0 2023-11-19 03:11:44,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=539060.0, ans=0.125 2023-11-19 03:11:50,664 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.247e+01 8.784e+01 9.683e+01 1.064e+02 1.511e+02, threshold=1.937e+02, percent-clipped=0.0 2023-11-19 03:12:02,403 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=539193.3333333334, ans=0.125 2023-11-19 03:12:09,097 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8750, loss[loss=0.1248, simple_loss=0.1708, pruned_loss=0.03394, audio_tagging_loss=0.005498, over 16473.00 frames. ], tot_loss[loss=0.09479, simple_loss=0.1124, pruned_loss=0.02753, audio_tagging_loss=0.01108, over 3053808.62 frames. ], batch size: 56, lr: 9.71e-03, grad_scale: 32.0 2023-11-19 03:12:16,233 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=539260.0, ans=0.1 2023-11-19 03:12:30,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=539393.3333333334, ans=0.0 2023-11-19 03:12:32,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=539393.3333333334, ans=15.0 2023-11-19 03:12:43,087 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:12:52,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=539526.6666666666, ans=0.125 2023-11-19 03:12:55,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=539526.6666666666, ans=0.1 2023-11-19 03:13:04,386 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8800, loss[loss=0.07202, simple_loss=0.09043, pruned_loss=0.01601, audio_tagging_loss=0.01079, over 14732.00 frames. 
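Most remaining lines are ScheduledFloat reports: module hyperparameters (skip rates, balancer probabilities, bypass scale minima, dropout rates) whose current value "ans" is a deterministic function of the global batch_count, which is why identical batch_counts always print identical values. With batch_count past 533,000 these schedules have long since settled at their endpoints. A hedged sketch of such a schedule as a piecewise-linear function of batch_count; the breakpoints are illustrative, not those of any parameter in this log:

```python
class PiecewiseLinear:
    """A float scheduled on batch_count, linear between breakpoints."""

    def __init__(self, *points: tuple[float, float]):
        self.points = sorted(points)  # (batch_count, value) pairs

    def __call__(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return pts[-1][1]

# Illustrative: a skip rate annealed from 0.5 to 0.0 early in training.
conv_skip_rate = PiecewiseLinear((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(conv_skip_rate(533526.6666666666))  # far past the last breakpoint -> 0.0
```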
], tot_loss[loss=0.09456, simple_loss=0.1117, pruned_loss=0.02749, audio_tagging_loss=0.0112, over 3046065.26 frames. ], batch size: 58, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:13:12,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=539593.3333333334, ans=0.0 2023-11-19 03:13:20,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.79 vs. limit=10.0 2023-11-19 03:13:31,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=539726.6666666666, ans=0.125 2023-11-19 03:13:42,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.511e+01 8.574e+01 9.508e+01 1.041e+02 1.765e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:13:42,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=539793.3333333334, ans=0.0 2023-11-19 03:13:53,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=539860.0, ans=22.5 2023-11-19 03:13:54,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=539860.0, ans=0.125 2023-11-19 03:13:59,444 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8850, loss[loss=0.08371, simple_loss=0.1055, pruned_loss=0.02326, audio_tagging_loss=0.007717, over 15081.00 frames. ], tot_loss[loss=0.09491, simple_loss=0.1124, pruned_loss=0.02757, audio_tagging_loss=0.01115, over 3050800.35 frames. ], batch size: 56, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:14:12,410 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:14:36,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=540126.6666666666, ans=0.0 2023-11-19 03:14:47,436 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=540193.3333333334, ans=0.125 2023-11-19 03:14:55,129 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8900, loss[loss=0.09911, simple_loss=0.1195, pruned_loss=0.03021, audio_tagging_loss=0.009132, over 15622.00 frames. ], tot_loss[loss=0.09433, simple_loss=0.1121, pruned_loss=0.02733, audio_tagging_loss=0.01093, over 3057462.87 frames. 
], batch size: 57, lr: 9.70e-03, grad_scale: 32.0 2023-11-19 03:15:23,613 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=2.205e-01 2023-11-19 03:15:24,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=540393.3333333334, ans=0.0 2023-11-19 03:15:32,227 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.732e+01 9.510e+01 1.041e+02 1.883e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:15:34,134 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:15:50,760 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 8950, loss[loss=0.09969, simple_loss=0.1157, pruned_loss=0.03162, audio_tagging_loss=0.01021, over 14777.00 frames. ], tot_loss[loss=0.09451, simple_loss=0.1127, pruned_loss=0.02746, audio_tagging_loss=0.01072, over 3053786.73 frames. ], batch size: 57, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:15:53,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=540593.3333333334, ans=0.05 2023-11-19 03:16:10,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=540660.0, ans=0.125 2023-11-19 03:16:13,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.72 vs. limit=6.0 2023-11-19 03:16:45,768 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9000, loss[loss=0.09301, simple_loss=0.1174, pruned_loss=0.02622, audio_tagging_loss=0.008079, over 15171.00 frames. ], tot_loss[loss=0.09422, simple_loss=0.1126, pruned_loss=0.02732, audio_tagging_loss=0.01061, over 3052287.76 frames. ], batch size: 55, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:16:45,770 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 03:17:18,030 INFO [train_asr.py:1147] (0/4) Epoch 7, validation: loss=0.06875, simple_loss=0.05761, pruned_loss=0.007498, audio_tagging_loss=0.03244, over 4681554.00 frames. 2023-11-19 03:17:18,031 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 03:17:26,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=540926.6666666666, ans=0.0 2023-11-19 03:17:48,160 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.14 vs. 
limit=6.0 2023-11-19 03:17:49,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=541126.6666666666, ans=0.125 2023-11-19 03:17:54,257 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.255e+01 8.732e+01 9.313e+01 1.034e+02 1.719e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 03:18:00,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=541193.3333333334, ans=0.0 2023-11-19 03:18:06,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=541193.3333333334, ans=0.2 2023-11-19 03:18:08,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=541193.3333333334, ans=0.125 2023-11-19 03:18:12,232 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9050, loss[loss=0.07743, simple_loss=0.09274, pruned_loss=0.02284, audio_tagging_loss=0.008222, over 16187.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1127, pruned_loss=0.02732, audio_tagging_loss=0.01049, over 3057460.09 frames. ], batch size: 61, lr: 9.69e-03, grad_scale: 32.0 2023-11-19 03:18:17,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=541260.0, ans=0.2 2023-11-19 03:18:23,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=541326.6666666666, ans=0.1 2023-11-19 03:18:25,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=541326.6666666666, ans=0.0 2023-11-19 03:18:27,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=541326.6666666666, ans=0.1 2023-11-19 03:18:27,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=541326.6666666666, ans=0.125 2023-11-19 03:18:45,221 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=541460.0, ans=0.025 2023-11-19 03:18:51,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=541460.0, ans=0.125 2023-11-19 03:18:57,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=541526.6666666666, ans=0.0 2023-11-19 03:18:57,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=541526.6666666666, ans=0.1 2023-11-19 03:19:07,383 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9100, loss[loss=0.08451, simple_loss=0.09413, pruned_loss=0.02284, audio_tagging_loss=0.0146, over 16049.00 frames. ], tot_loss[loss=0.0933, simple_loss=0.1117, pruned_loss=0.02698, audio_tagging_loss=0.01049, over 3058600.91 frames. ], batch size: 60, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:19:19,814 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=541660.0, ans=0.2 2023-11-19 03:19:27,507 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.16 vs. 
limit=12.0 2023-11-19 03:19:29,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=541726.6666666666, ans=0.0 2023-11-19 03:19:44,939 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.386e+01 8.588e+01 9.392e+01 1.039e+02 1.289e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 03:20:02,270 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9150, loss[loss=0.105, simple_loss=0.1197, pruned_loss=0.03627, audio_tagging_loss=0.008853, over 15101.00 frames. ], tot_loss[loss=0.09366, simple_loss=0.1119, pruned_loss=0.02725, audio_tagging_loss=0.01044, over 3058868.29 frames. ], batch size: 55, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:20:13,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=541993.3333333334, ans=0.125 2023-11-19 03:20:27,019 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=542060.0, ans=0.125 2023-11-19 03:20:40,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=542126.6666666666, ans=0.0 2023-11-19 03:20:48,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.76 vs. limit=12.0 2023-11-19 03:20:51,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=542193.3333333334, ans=0.04949747468305833 2023-11-19 03:20:53,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.30 vs. limit=22.5 2023-11-19 03:20:57,895 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9200, loss[loss=0.0945, simple_loss=0.1088, pruned_loss=0.02715, audio_tagging_loss=0.01292, over 14688.00 frames. ], tot_loss[loss=0.09371, simple_loss=0.112, pruned_loss=0.02721, audio_tagging_loss=0.0105, over 3061745.94 frames. ], batch size: 55, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:20:58,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.38 vs. limit=22.5 2023-11-19 03:20:59,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.73 vs. limit=12.0 2023-11-19 03:21:05,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=542260.0, ans=0.2 2023-11-19 03:21:05,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=542260.0, ans=0.0 2023-11-19 03:21:08,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=542326.6666666666, ans=0.125 2023-11-19 03:21:36,286 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.350e+01 8.578e+01 9.333e+01 1.009e+02 1.862e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 03:21:37,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.81 vs. limit=15.0 2023-11-19 03:21:48,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.01 vs. 
limit=22.5 2023-11-19 03:21:52,040 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9250, loss[loss=0.1267, simple_loss=0.1466, pruned_loss=0.04349, audio_tagging_loss=0.009883, over 14911.00 frames. ], tot_loss[loss=0.09329, simple_loss=0.1114, pruned_loss=0.02698, audio_tagging_loss=0.01063, over 3063516.98 frames. ], batch size: 57, lr: 9.68e-03, grad_scale: 32.0 2023-11-19 03:22:03,568 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.46 vs. limit=22.5 2023-11-19 03:22:21,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten.whitening_limit, batch_count=542726.6666666666, ans=15.0 2023-11-19 03:22:24,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=542793.3333333334, ans=0.125 2023-11-19 03:22:25,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=542793.3333333334, ans=0.0 2023-11-19 03:22:28,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.10 vs. limit=12.0 2023-11-19 03:22:32,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=542793.3333333334, ans=0.2 2023-11-19 03:22:47,243 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9300, loss[loss=0.08454, simple_loss=0.102, pruned_loss=0.02286, audio_tagging_loss=0.01067, over 14237.00 frames. ], tot_loss[loss=0.0926, simple_loss=0.1106, pruned_loss=0.02675, audio_tagging_loss=0.01056, over 3060471.58 frames. ], batch size: 56, lr: 9.67e-03, grad_scale: 32.0 2023-11-19 03:22:57,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=542993.3333333334, ans=0.0 2023-11-19 03:23:09,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=543060.0, ans=0.95 2023-11-19 03:23:12,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=543060.0, ans=0.125 2023-11-19 03:23:26,427 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.462e+01 9.179e+01 9.907e+01 1.156e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 03:23:31,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=543193.3333333334, ans=0.125 2023-11-19 03:23:31,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=543193.3333333334, ans=0.1 2023-11-19 03:23:42,870 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9350, loss[loss=0.1059, simple_loss=0.1341, pruned_loss=0.02793, audio_tagging_loss=0.01093, over 16776.00 frames. ], tot_loss[loss=0.09232, simple_loss=0.1101, pruned_loss=0.02668, audio_tagging_loss=0.01059, over 3066063.26 frames. 
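Each validation pass (such as the Epoch 7, batch 9000 pass above) ends by reporting "Maximum memory allocated so far", a cumulative per-device high-water mark rather than a per-batch figure. A minimal sketch of how such a number is typically read from the CUDA caching allocator:

```python
import torch

def max_allocated_mb(device: int = 0) -> int:
    # High-water mark of tensor memory on this device since startup
    # (or since the last reset_peak_memory_stats call).
    return torch.cuda.max_memory_allocated(device) // (1024 * 1024)

if torch.cuda.is_available():
    print(f"Maximum memory allocated so far is {max_allocated_mb()}MB")
```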
], batch size: 61, lr: 9.67e-03, grad_scale: 16.0 2023-11-19 03:23:47,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=543260.0, ans=0.0 2023-11-19 03:23:48,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=543260.0, ans=0.125 2023-11-19 03:23:49,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=543260.0, ans=0.125 2023-11-19 03:23:51,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=543260.0, ans=0.125 2023-11-19 03:23:53,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=543326.6666666666, ans=0.0 2023-11-19 03:24:05,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=15.39 vs. limit=15.0 2023-11-19 03:24:17,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=543460.0, ans=0.2 2023-11-19 03:24:23,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=543460.0, ans=0.125 2023-11-19 03:24:37,155 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9400, loss[loss=0.0854, simple_loss=0.103, pruned_loss=0.01946, audio_tagging_loss=0.01445, over 15369.00 frames. ], tot_loss[loss=0.09219, simple_loss=0.1098, pruned_loss=0.02652, audio_tagging_loss=0.01079, over 3057694.78 frames. ], batch size: 58, lr: 9.67e-03, grad_scale: 16.0 2023-11-19 03:24:48,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=543660.0, ans=0.125 2023-11-19 03:25:00,038 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=17.05 vs. limit=22.5 2023-11-19 03:25:14,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=543793.3333333334, ans=0.0 2023-11-19 03:25:16,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.406e+01 9.126e+01 1.047e+02 1.267e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 03:25:31,572 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9450, loss[loss=0.08593, simple_loss=0.1119, pruned_loss=0.01998, audio_tagging_loss=0.01003, over 14994.00 frames. ], tot_loss[loss=0.09217, simple_loss=0.1094, pruned_loss=0.02657, audio_tagging_loss=0.0109, over 3058406.68 frames. ], batch size: 56, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:25:31,582 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:25:42,706 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.17 vs. 
limit=15.0 2023-11-19 03:25:49,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-19 03:25:51,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=16.58 vs. limit=22.5 2023-11-19 03:26:28,039 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9500, loss[loss=0.07271, simple_loss=0.08463, pruned_loss=0.01832, audio_tagging_loss=0.01208, over 16108.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1096, pruned_loss=0.02669, audio_tagging_loss=0.01099, over 3052908.99 frames. ], batch size: 63, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:26:29,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=544260.0, ans=0.125 2023-11-19 03:26:42,086 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.70 vs. limit=22.5 2023-11-19 03:26:48,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.34 vs. limit=6.0 2023-11-19 03:27:08,271 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.048e+01 8.649e+01 9.463e+01 1.058e+02 1.966e+02, threshold=1.893e+02, percent-clipped=1.0 2023-11-19 03:27:23,687 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9550, loss[loss=0.1037, simple_loss=0.1118, pruned_loss=0.03831, audio_tagging_loss=0.009476, over 15417.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1094, pruned_loss=0.0267, audio_tagging_loss=0.01113, over 3056781.61 frames. ], batch size: 58, lr: 9.66e-03, grad_scale: 16.0 2023-11-19 03:27:26,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=544593.3333333334, ans=0.025 2023-11-19 03:27:27,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=544593.3333333334, ans=0.0 2023-11-19 03:27:33,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=544660.0, ans=0.0 2023-11-19 03:27:37,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=544660.0, ans=0.0 2023-11-19 03:27:38,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=544660.0, ans=0.05 2023-11-19 03:27:43,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=544660.0, ans=0.035 2023-11-19 03:27:47,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=544726.6666666666, ans=0.1 2023-11-19 03:28:13,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=544860.0, ans=0.125 2023-11-19 03:28:18,803 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9600, loss[loss=0.1088, simple_loss=0.1399, pruned_loss=0.03054, audio_tagging_loss=0.008308, over 15258.00 frames. ], tot_loss[loss=0.09356, simple_loss=0.1105, pruned_loss=0.02707, audio_tagging_loss=0.01122, over 3058743.93 frames. 
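The "Whitening: ... metric=X vs. limit=Y" lines report how far a layer's activations are from having a white (isotropic) covariance; the penalty only engages once the metric exceeds its limit, which is why entries both below and above the limit appear. One plausible whiteness statistic, sketched under the assumption that the metric is normalized to 1.0 for perfectly white features:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """num_channels * ||C||_F^2 / trace(C)^2 for feature covariance C.

    Equals 1.0 when C is proportional to the identity (white features)
    and grows as the eigenvalue spectrum becomes lopsided.
    """
    x = x - x.mean(dim=0)                # x: (num_frames, num_channels)
    cov = (x.T @ x) / x.shape[0]         # covariance estimate
    d = cov.shape[0]
    return (d * (cov * cov).sum() / cov.trace() ** 2).item()

white = torch.randn(10000, 256)
mixed = white @ torch.randn(256, 256)    # correlates the channels
print(whitening_metric(white))   # ~1.0
print(whitening_metric(mixed))   # much larger; would trip a small limit
```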
], batch size: 55, lr: 9.66e-03, grad_scale: 32.0 2023-11-19 03:28:47,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=545060.0, ans=0.0 2023-11-19 03:28:59,143 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.567e+01 8.431e+01 9.173e+01 1.006e+02 1.337e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 03:29:15,058 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9650, loss[loss=0.0807, simple_loss=0.09829, pruned_loss=0.02149, audio_tagging_loss=0.01006, over 14835.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1096, pruned_loss=0.0267, audio_tagging_loss=0.01119, over 3058527.31 frames. ], batch size: 56, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:29:18,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=545260.0, ans=0.125 2023-11-19 03:29:21,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=545260.0, ans=0.1 2023-11-19 03:29:44,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.88 vs. limit=15.0 2023-11-19 03:30:05,135 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.54 vs. limit=15.0 2023-11-19 03:30:10,038 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9700, loss[loss=0.09152, simple_loss=0.1131, pruned_loss=0.02422, audio_tagging_loss=0.01073, over 14493.00 frames. ], tot_loss[loss=0.092, simple_loss=0.1091, pruned_loss=0.02651, audio_tagging_loss=0.01095, over 3056120.82 frames. ], batch size: 56, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:30:10,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-19 03:30:26,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=545660.0, ans=0.0 2023-11-19 03:30:30,005 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.84 vs. limit=15.0 2023-11-19 03:30:35,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=545726.6666666666, ans=0.0 2023-11-19 03:30:50,515 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.966e+01 8.564e+01 9.508e+01 1.033e+02 1.418e+02, threshold=1.902e+02, percent-clipped=0.0 2023-11-19 03:31:05,788 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9750, loss[loss=0.08463, simple_loss=0.1032, pruned_loss=0.02189, audio_tagging_loss=0.01115, over 16633.00 frames. ], tot_loss[loss=0.09214, simple_loss=0.1098, pruned_loss=0.02638, audio_tagging_loss=0.01085, over 3054051.03 frames. ], batch size: 61, lr: 9.65e-03, grad_scale: 32.0 2023-11-19 03:31:09,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.77 vs. limit=22.5 2023-11-19 03:31:23,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=545993.3333333334, ans=0.0 2023-11-19 03:31:44,995 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.62 vs. 
limit=15.0 2023-11-19 03:31:48,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546126.6666666666, ans=0.125 2023-11-19 03:31:51,068 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.69 vs. limit=10.0 2023-11-19 03:32:02,947 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9800, loss[loss=0.09145, simple_loss=0.1151, pruned_loss=0.02454, audio_tagging_loss=0.009367, over 14412.00 frames. ], tot_loss[loss=0.09279, simple_loss=0.1107, pruned_loss=0.02675, audio_tagging_loss=0.01071, over 3047071.75 frames. ], batch size: 55, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:32:19,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=546326.6666666666, ans=0.0 2023-11-19 03:32:27,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.56 vs. limit=15.0 2023-11-19 03:32:28,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.08 vs. limit=15.0 2023-11-19 03:32:35,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=546460.0, ans=0.125 2023-11-19 03:32:43,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.602e+01 9.393e+01 1.096e+02 1.685e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 03:32:50,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=546526.6666666666, ans=0.125 2023-11-19 03:32:52,701 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:32:57,966 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9850, loss[loss=0.08823, simple_loss=0.1061, pruned_loss=0.02523, audio_tagging_loss=0.009951, over 14848.00 frames. ], tot_loss[loss=0.09363, simple_loss=0.1118, pruned_loss=0.0272, audio_tagging_loss=0.01052, over 3048911.75 frames. ], batch size: 56, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:33:22,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546726.6666666666, ans=0.125 2023-11-19 03:33:50,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=546860.0, ans=0.0 2023-11-19 03:33:53,973 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9900, loss[loss=0.0811, simple_loss=0.09724, pruned_loss=0.02109, audio_tagging_loss=0.01139, over 15396.00 frames. ], tot_loss[loss=0.09336, simple_loss=0.1113, pruned_loss=0.02712, audio_tagging_loss=0.01057, over 3041686.05 frames. 
], batch size: 57, lr: 9.64e-03, grad_scale: 32.0 2023-11-19 03:33:55,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=546926.6666666666, ans=0.125 2023-11-19 03:34:04,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.60 vs. limit=15.0 2023-11-19 03:34:22,059 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=547060.0, ans=0.0 2023-11-19 03:34:23,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=547060.0, ans=0.0 2023-11-19 03:34:25,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=547060.0, ans=0.125 2023-11-19 03:34:26,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=547126.6666666666, ans=0.0 2023-11-19 03:34:34,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.700e+01 9.311e+01 1.023e+02 1.421e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 03:34:49,696 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=547260.0, ans=0.125 2023-11-19 03:34:50,621 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 9950, loss[loss=0.08883, simple_loss=0.1075, pruned_loss=0.02615, audio_tagging_loss=0.008928, over 15581.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1093, pruned_loss=0.02639, audio_tagging_loss=0.01065, over 3048236.77 frames. ], batch size: 58, lr: 9.64e-03, grad_scale: 16.0 2023-11-19 03:34:52,062 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=15.0 2023-11-19 03:35:10,113 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.81 vs. limit=22.5 2023-11-19 03:35:30,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=547460.0, ans=0.2 2023-11-19 03:35:36,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=547526.6666666666, ans=0.125 2023-11-19 03:35:44,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=547593.3333333334, ans=0.125 2023-11-19 03:35:45,520 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10000, loss[loss=0.07586, simple_loss=0.09192, pruned_loss=0.01899, audio_tagging_loss=0.01091, over 14450.00 frames. ], tot_loss[loss=0.0925, simple_loss=0.1107, pruned_loss=0.02669, audio_tagging_loss=0.01046, over 3046336.61 frames. ], batch size: 54, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:36:26,884 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.019e+01 8.651e+01 9.520e+01 1.034e+02 1.455e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 03:36:40,557 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10050, loss[loss=0.1157, simple_loss=0.1247, pruned_loss=0.04187, audio_tagging_loss=0.01147, over 14512.00 frames. ], tot_loss[loss=0.09195, simple_loss=0.1097, pruned_loss=0.02651, audio_tagging_loss=0.01058, over 3051781.84 frames. 
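Note the grad_scale column through this stretch: 32.0 at batch 9900, down to 16.0 at batch 9950, back to 32.0 by batch 10000. That sawtooth is characteristic of dynamic loss scaling in fp16 training: the scale is halved when a step produces inf/nan gradients and grows again after a run of overflow-free steps. A minimal sketch with PyTorch's stock scaler; the 16 -> 32 rebound within ~50 batches suggests this run re-grows its scale more aggressively than the library-default growth_interval shown here:

```python
import torch

# Dynamic loss scaling: the scale is multiplied by growth_factor after
# growth_interval overflow-free steps and by backoff_factor whenever
# gradients overflow, producing the 16 <-> 32 hops seen in the log.
scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,
    growth_factor=2.0,
    backoff_factor=0.5,
    growth_interval=2000,   # library default; this run likely uses less
)
print(scaler.get_scale())   # 16.0 on a CUDA machine, before any steps
```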
], batch size: 53, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:36:45,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=547926.6666666666, ans=0.0 2023-11-19 03:36:51,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.prob, batch_count=547993.3333333334, ans=0.125 2023-11-19 03:37:10,115 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.96 vs. limit=15.0 2023-11-19 03:37:12,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=548060.0, ans=10.0 2023-11-19 03:37:20,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=548126.6666666666, ans=0.0 2023-11-19 03:37:22,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=548126.6666666666, ans=0.0 2023-11-19 03:37:36,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=548260.0, ans=0.0 2023-11-19 03:37:37,701 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10100, loss[loss=0.09838, simple_loss=0.1197, pruned_loss=0.02802, audio_tagging_loss=0.01049, over 15373.00 frames. ], tot_loss[loss=0.09186, simple_loss=0.1096, pruned_loss=0.02649, audio_tagging_loss=0.01059, over 3048322.39 frames. ], batch size: 56, lr: 9.63e-03, grad_scale: 32.0 2023-11-19 03:37:55,438 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:38:12,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=548460.0, ans=0.125 2023-11-19 03:38:14,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=548460.0, ans=0.0 2023-11-19 03:38:18,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.744e+01 8.585e+01 9.588e+01 1.090e+02 1.708e+02, threshold=1.918e+02, percent-clipped=0.0 2023-11-19 03:38:23,269 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:38:32,764 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10150, loss[loss=0.1015, simple_loss=0.1232, pruned_loss=0.03111, audio_tagging_loss=0.008801, over 15775.00 frames. ], tot_loss[loss=0.09171, simple_loss=0.1093, pruned_loss=0.02629, audio_tagging_loss=0.01078, over 3054212.42 frames. 
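The lr column meanwhile creeps down smoothly (9.76e-03 at the top of this stretch, 9.63e-03 here), independent of the loss values. That is the signature of a closed-form schedule driven by batch and epoch counters rather than a plateau-based one; a hedged sketch in the style of icefall's Eden scheduler, with all constants as illustrative placeholders rather than this run's settings:

```python
def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 5000.0, lr_epochs: float = 4.0) -> float:
    # Two power-law decay terms, one on batches and one on epochs;
    # base_lr, lr_batches and lr_epochs are placeholders.
    return (
        base_lr
        * ((batch / lr_batches) ** 2 + 1) ** -0.25
        * ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    )

# Deep into training the per-batch change is tiny, matching the slow
# drift of the logged lr column:
print(eden_lr(0.05, 80000, 7.0))
print(eden_lr(0.05, 80500, 7.0))
```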
], batch size: 58, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:38:44,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=548660.0, ans=0.125 2023-11-19 03:38:45,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=548660.0, ans=0.125 2023-11-19 03:38:55,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=548726.6666666666, ans=0.0 2023-11-19 03:38:59,170 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:39:05,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=548793.3333333334, ans=0.125 2023-11-19 03:39:12,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=548793.3333333334, ans=0.1 2023-11-19 03:39:19,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=548860.0, ans=0.0 2023-11-19 03:39:20,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=548860.0, ans=0.2 2023-11-19 03:39:27,543 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10200, loss[loss=0.1083, simple_loss=0.1259, pruned_loss=0.03, audio_tagging_loss=0.01535, over 15034.00 frames. ], tot_loss[loss=0.09248, simple_loss=0.1099, pruned_loss=0.02664, audio_tagging_loss=0.0109, over 3053266.52 frames. ], batch size: 56, lr: 9.62e-03, grad_scale: 32.0 2023-11-19 03:39:38,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=548993.3333333334, ans=0.125 2023-11-19 03:39:49,369 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
2023-11-19 03:39:51,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549060.0, ans=0.1
2023-11-19 03:39:56,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=549060.0, ans=0.0
2023-11-19 03:40:02,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=549126.6666666666, ans=0.125
2023-11-19 03:40:06,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=549126.6666666666, ans=0.125
2023-11-19 03:40:08,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.839e+01 9.897e+01 1.124e+02 1.590e+02, threshold=1.979e+02, percent-clipped=0.0
2023-11-19 03:40:09,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=549126.6666666666, ans=0.125
2023-11-19 03:40:20,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=549193.3333333334, ans=0.125
2023-11-19 03:40:22,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=549260.0, ans=0.125
2023-11-19 03:40:23,227 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10250, loss[loss=0.1002, simple_loss=0.1085, pruned_loss=0.03299, audio_tagging_loss=0.01301, over 14591.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1096, pruned_loss=0.02675, audio_tagging_loss=0.01111, over 3054642.38 frames. ], batch size: 57, lr: 9.62e-03, grad_scale: 32.0
2023-11-19 03:40:27,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.86 vs. limit=15.0
2023-11-19 03:40:35,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.min_positive, batch_count=549326.6666666666, ans=0.025
2023-11-19 03:40:35,555 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:40:39,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=549326.6666666666, ans=0.125
2023-11-19 03:40:42,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=549326.6666666666, ans=0.0
2023-11-19 03:40:43,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=549326.6666666666, ans=0.125
2023-11-19 03:40:48,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549393.3333333334, ans=0.1
2023-11-19 03:40:53,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=549393.3333333334, ans=0.125
2023-11-19 03:40:55,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=549460.0, ans=0.1
2023-11-19 03:41:19,416 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10300, loss[loss=0.09933, simple_loss=0.1058, pruned_loss=0.02816, audio_tagging_loss=0.01825, over 14685.00 frames. ], tot_loss[loss=0.09344, simple_loss=0.1109, pruned_loss=0.02698, audio_tagging_loss=0.01101, over 3057381.44 frames. ], batch size: 55, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:41:19,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=549593.3333333334, ans=0.1
2023-11-19 03:41:21,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.01 vs. limit=15.0
2023-11-19 03:41:34,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=549660.0, ans=0.125
2023-11-19 03:41:36,436 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:42:00,391 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.361e+01 8.478e+01 9.203e+01 9.958e+01 1.173e+02, threshold=1.841e+02, percent-clipped=0.0
2023-11-19 03:42:04,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=549860.0, ans=0.0
2023-11-19 03:42:04,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=549860.0, ans=0.0
2023-11-19 03:42:14,017 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10350, loss[loss=0.1076, simple_loss=0.1182, pruned_loss=0.03836, audio_tagging_loss=0.01015, over 14445.00 frames. ], tot_loss[loss=0.09377, simple_loss=0.1112, pruned_loss=0.02713, audio_tagging_loss=0.01102, over 3055020.36 frames. ], batch size: 56, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:42:14,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=549926.6666666666, ans=0.1
2023-11-19 03:42:14,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.57 vs. limit=22.5
2023-11-19 03:42:31,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=549993.3333333334, ans=0.125
2023-11-19 03:42:31,906 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.99 vs. limit=6.0
2023-11-19 03:42:34,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.24 vs. limit=10.0
2023-11-19 03:42:36,251 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:42:46,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=550126.6666666666, ans=0.125
2023-11-19 03:42:52,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.min_positive, batch_count=550126.6666666666, ans=0.05
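The per-batch train_asr.py records above print a total loss next to its parts; the totals are consistent with a weighted sum in which simple_loss enters at 0.5 and pruned_loss and audio_tagging_loss at 1.0 (batch 10100 above: 0.5*0.1197 + 0.02802 + 0.01049 = 0.09836, vs. the logged 0.09838). A minimal sketch under that inferred weighting; the authoritative combination lives in train_asr.py and may carry extra terms:

def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5,
                  tagging_scale: float = 1.0) -> float:
    # Weights inferred from the logged numbers, not read from the code.
    return simple_scale * simple_loss + pruned_loss + tagging_scale * audio_tagging_loss

print(combined_loss(0.1197, 0.02802, 0.01049))  # ~0.09836, matching batch 10100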
2023-11-19 03:43:08,766 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10400, loss[loss=0.07617, simple_loss=0.08935, pruned_loss=0.01984, audio_tagging_loss=0.01166, over 15680.00 frames. ], tot_loss[loss=0.09379, simple_loss=0.1111, pruned_loss=0.02708, audio_tagging_loss=0.01114, over 3051980.67 frames. ], batch size: 62, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:43:23,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=550326.6666666666, ans=0.0
2023-11-19 03:43:43,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=550460.0, ans=0.0
2023-11-19 03:43:51,065 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.573e+01 9.410e+01 1.023e+02 1.490e+02, threshold=1.882e+02, percent-clipped=0.0
2023-11-19 03:44:04,823 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10450, loss[loss=0.09388, simple_loss=0.1177, pruned_loss=0.02771, audio_tagging_loss=0.007305, over 14811.00 frames. ], tot_loss[loss=0.09342, simple_loss=0.1109, pruned_loss=0.02688, audio_tagging_loss=0.01111, over 3054428.25 frames. ], batch size: 53, lr: 9.61e-03, grad_scale: 32.0
2023-11-19 03:44:19,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.06 vs. limit=22.5
2023-11-19 03:44:25,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=550726.6666666666, ans=0.0
2023-11-19 03:44:46,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=550793.3333333334, ans=0.0
2023-11-19 03:44:59,702 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10500, loss[loss=0.1076, simple_loss=0.125, pruned_loss=0.03352, audio_tagging_loss=0.01161, over 15860.00 frames. ], tot_loss[loss=0.09343, simple_loss=0.111, pruned_loss=0.02693, audio_tagging_loss=0.01099, over 3059284.50 frames. ], batch size: 58, lr: 9.60e-03, grad_scale: 32.0
2023-11-19 03:45:06,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=550926.6666666666, ans=0.125
2023-11-19 03:45:07,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=550926.6666666666, ans=0.125
2023-11-19 03:45:30,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=551060.0, ans=0.035
2023-11-19 03:45:41,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.439e+01 9.051e+01 1.036e+02 1.339e+02, threshold=1.810e+02, percent-clipped=0.0
2023-11-19 03:45:55,189 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10550, loss[loss=0.05286, simple_loss=0.06467, pruned_loss=0.01013, audio_tagging_loss=0.01039, over 13988.00 frames. ], tot_loss[loss=0.09253, simple_loss=0.1098, pruned_loss=0.02665, audio_tagging_loss=0.01097, over 3056117.84 frames. ], batch size: 54, lr: 9.60e-03, grad_scale: 32.0
2023-11-19 03:46:12,500 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=6.88 vs. limit=12.0
2023-11-19 03:46:30,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=551460.0, ans=0.0
2023-11-19 03:46:34,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=551460.0, ans=0.0
2023-11-19 03:46:51,181 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10600, loss[loss=0.1037, simple_loss=0.132, pruned_loss=0.02952, audio_tagging_loss=0.008186, over 15632.00 frames. ], tot_loss[loss=0.09252, simple_loss=0.11, pruned_loss=0.02677, audio_tagging_loss=0.01076, over 3051153.50 frames. ], batch size: 55, lr: 9.60e-03, grad_scale: 32.0
2023-11-19 03:47:26,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=551793.3333333334, ans=0.05
2023-11-19 03:47:27,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=551793.3333333334, ans=0.1
2023-11-19 03:47:33,794 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.037e+01 8.515e+01 9.253e+01 1.023e+02 1.317e+02, threshold=1.851e+02, percent-clipped=0.0
2023-11-19 03:47:47,255 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10650, loss[loss=0.05668, simple_loss=0.05774, pruned_loss=0.01492, audio_tagging_loss=0.01289, over 14507.00 frames. ], tot_loss[loss=0.09219, simple_loss=0.1099, pruned_loss=0.02654, audio_tagging_loss=0.0107, over 3048390.66 frames. ], batch size: 55, lr: 9.59e-03, grad_scale: 32.0
2023-11-19 03:47:48,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=551926.6666666666, ans=0.09899494936611666
2023-11-19 03:47:51,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=551926.6666666666, ans=0.125
2023-11-19 03:47:51,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=551926.6666666666, ans=0.2
2023-11-19 03:48:11,285 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.94 vs. limit=15.0
2023-11-19 03:48:28,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=552126.6666666666, ans=0.1
2023-11-19 03:48:43,086 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10700, loss[loss=0.1052, simple_loss=0.1222, pruned_loss=0.03072, audio_tagging_loss=0.01339, over 15160.00 frames. ], tot_loss[loss=0.09235, simple_loss=0.1103, pruned_loss=0.02662, audio_tagging_loss=0.0106, over 3049811.61 frames. ], batch size: 57, lr: 9.59e-03, grad_scale: 32.0
2023-11-19 03:49:16,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=552460.0, ans=0.125
2023-11-19 03:49:25,174 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.213e+01 8.579e+01 9.318e+01 1.032e+02 2.166e+02, threshold=1.864e+02, percent-clipped=1.0
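In each optim.py:476 record the printed threshold equals Clipping_scale times the median grad-norm quartile (in the record just above, 2.0 * 9.318e+01 = 1.8636e+02, logged as threshold=1.864e+02, with percent-clipped=1.0 meaning about one step in a hundred hit it). That suggests clipping at a multiple of a running median; a hedged sketch of such a rule, while the actual logic in icefall's optim.py is more elaborate:

import torch

def clip_by_median(parameters, recent_norms, clipping_scale: float = 2.0):
    # recent_norms: list of recent per-step gradient norms (floats).
    threshold = clipping_scale * torch.tensor(recent_norms).median()
    total_norm = torch.nn.utils.clip_grad_norm_(parameters, max_norm=threshold.item())
    return total_norm, threshold  # clipped only when total_norm > threshold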
2023-11-19 03:49:33,262 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.80 vs. limit=22.5
2023-11-19 03:49:33,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=552526.6666666666, ans=0.125
2023-11-19 03:49:36,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=552526.6666666666, ans=0.05
2023-11-19 03:49:38,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=552593.3333333334, ans=0.5
2023-11-19 03:49:39,635 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10750, loss[loss=0.0884, simple_loss=0.1085, pruned_loss=0.02569, audio_tagging_loss=0.008471, over 16308.00 frames. ], tot_loss[loss=0.09196, simple_loss=0.1096, pruned_loss=0.02652, audio_tagging_loss=0.01066, over 3054169.42 frames. ], batch size: 59, lr: 9.59e-03, grad_scale: 32.0
2023-11-19 03:49:55,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=552660.0, ans=0.1
2023-11-19 03:50:17,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=552793.3333333334, ans=0.125
2023-11-19 03:50:19,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=552793.3333333334, ans=0.0
2023-11-19 03:50:22,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=552793.3333333334, ans=0.07
2023-11-19 03:50:28,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=13.75 vs. limit=15.0
2023-11-19 03:50:34,596 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10800, loss[loss=0.0828, simple_loss=0.08818, pruned_loss=0.02687, audio_tagging_loss=0.01183, over 15956.00 frames. ], tot_loss[loss=0.09175, simple_loss=0.1092, pruned_loss=0.02649, audio_tagging_loss=0.01068, over 3054120.83 frames. ], batch size: 60, lr: 9.59e-03, grad_scale: 32.0
2023-11-19 03:51:00,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=553060.0, ans=0.125
2023-11-19 03:51:10,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=553126.6666666666, ans=0.125
2023-11-19 03:51:16,715 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.479e+01 8.594e+01 9.337e+01 1.055e+02 1.336e+02, threshold=1.867e+02, percent-clipped=0.0
2023-11-19 03:51:30,085 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10850, loss[loss=0.08812, simple_loss=0.1102, pruned_loss=0.02461, audio_tagging_loss=0.008414, over 14994.00 frames. ], tot_loss[loss=0.09197, simple_loss=0.1097, pruned_loss=0.02652, audio_tagging_loss=0.01061, over 3053884.81 frames. ], batch size: 57, lr: 9.58e-03, grad_scale: 32.0
2023-11-19 03:51:31,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=553260.0, ans=0.125
2023-11-19 03:51:31,349 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=553260.0, ans=0.125
2023-11-19 03:51:52,932 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.38 vs. limit=15.0
2023-11-19 03:52:03,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=553460.0, ans=0.1
2023-11-19 03:52:24,133 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 03:52:27,321 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10900, loss[loss=0.08455, simple_loss=0.1013, pruned_loss=0.02451, audio_tagging_loss=0.009392, over 14903.00 frames. ], tot_loss[loss=0.09175, simple_loss=0.1094, pruned_loss=0.02634, audio_tagging_loss=0.0107, over 3052579.28 frames. ], batch size: 57, lr: 9.58e-03, grad_scale: 32.0
2023-11-19 03:52:43,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=553660.0, ans=0.0
2023-11-19 03:53:09,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.290e+01 8.895e+01 9.757e+01 1.197e+02, threshold=1.779e+02, percent-clipped=0.0
2023-11-19 03:53:15,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=553860.0, ans=0.125
2023-11-19 03:53:17,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten.whitening_limit, batch_count=553860.0, ans=22.5
2023-11-19 03:53:22,211 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 10950, loss[loss=0.08867, simple_loss=0.1064, pruned_loss=0.02453, audio_tagging_loss=0.01094, over 15414.00 frames. ], tot_loss[loss=0.09158, simple_loss=0.1091, pruned_loss=0.02633, audio_tagging_loss=0.01072, over 3054911.29 frames. ], batch size: 56, lr: 9.58e-03, grad_scale: 32.0
2023-11-19 03:53:27,910 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.26 vs. limit=10.0
2023-11-19 03:53:41,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=553993.3333333334, ans=0.0
2023-11-19 03:53:52,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554060.0, ans=0.125
2023-11-19 03:53:57,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=554126.6666666666, ans=0.0
2023-11-19 03:53:59,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.62 vs. limit=15.0
2023-11-19 03:54:09,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=554193.3333333334, ans=0.125
2023-11-19 03:54:16,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=554260.0, ans=0.95
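The scaling.py:213 records print ScheduledFloat values: module hyperparameters (skip rates, balancer probabilities, dropout) that are functions of the running batch_count rather than constants, which is why the same name reappears with a fresh `ans` as batch_count grows. An illustrative piecewise-linear schedule in that spirit; the breakpoints below are invented, and the real schedules are defined per module in the zipformer code:

class ScheduledFloatSketch:
    def __init__(self, *points):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.points = list(points)

    def value_at(self, batch_count: float) -> float:
        pts = self.points
        if batch_count <= pts[0][0]:
            return pts[0][1]
        if batch_count >= pts[-1][0]:
            return pts[-1][1]
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if x0 <= batch_count <= x1:
                t = (batch_count - x0) / (x1 - x0)
                return y0 + t * (y1 - y0)

# e.g. a skip rate that decays to 0.0 early in training and stays there:
skip_rate = ScheduledFloatSketch((0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0))
print(skip_rate.value_at(548260.0))  # 0.0, like the conv_skip_rate records above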
], tot_loss[loss=0.09197, simple_loss=0.1097, pruned_loss=0.02634, audio_tagging_loss=0.0108, over 3054228.43 frames. ], batch size: 58, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:54:23,226 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.59 vs. limit=10.0 2023-11-19 03:54:26,555 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 03:54:27,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=554326.6666666666, ans=0.125 2023-11-19 03:54:44,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=554393.3333333334, ans=0.125 2023-11-19 03:54:51,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=554460.0, ans=0.125 2023-11-19 03:54:56,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=554460.0, ans=0.125 2023-11-19 03:54:58,748 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.653e+01 8.672e+01 9.432e+01 1.068e+02 1.333e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 03:55:13,613 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11050, loss[loss=0.07177, simple_loss=0.08736, pruned_loss=0.01672, audio_tagging_loss=0.01137, over 15139.00 frames. ], tot_loss[loss=0.09233, simple_loss=0.1099, pruned_loss=0.02653, audio_tagging_loss=0.01086, over 3055814.49 frames. ], batch size: 57, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:55:16,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=554593.3333333334, ans=0.0 2023-11-19 03:55:18,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.94 vs. limit=12.0 2023-11-19 03:55:27,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=554660.0, ans=0.1 2023-11-19 03:55:46,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.13 vs. limit=15.0 2023-11-19 03:55:54,101 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.46 vs. limit=15.0 2023-11-19 03:55:57,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=554860.0, ans=0.0 2023-11-19 03:56:08,866 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11100, loss[loss=0.08893, simple_loss=0.1091, pruned_loss=0.02461, audio_tagging_loss=0.009759, over 14064.00 frames. ], tot_loss[loss=0.09178, simple_loss=0.1089, pruned_loss=0.0263, audio_tagging_loss=0.01105, over 3045711.71 frames. 
], batch size: 54, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:56:09,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=22.5 2023-11-19 03:56:34,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=555060.0, ans=0.125 2023-11-19 03:56:35,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=555060.0, ans=0.2 2023-11-19 03:56:36,218 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.28 vs. limit=22.5 2023-11-19 03:56:40,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=555060.0, ans=0.0 2023-11-19 03:56:45,263 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.74 vs. limit=10.0 2023-11-19 03:56:48,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=555126.6666666666, ans=0.0 2023-11-19 03:56:48,258 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 03:56:51,124 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.615e+01 9.620e+01 1.023e+02 1.432e+02, threshold=1.924e+02, percent-clipped=0.0 2023-11-19 03:56:53,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=555193.3333333334, ans=0.2 2023-11-19 03:56:54,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=555193.3333333334, ans=0.0 2023-11-19 03:57:00,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=555193.3333333334, ans=0.125 2023-11-19 03:57:00,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=555193.3333333334, ans=0.125 2023-11-19 03:57:03,796 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11150, loss[loss=0.1117, simple_loss=0.1356, pruned_loss=0.033, audio_tagging_loss=0.01086, over 15848.00 frames. ], tot_loss[loss=0.09245, simple_loss=0.1094, pruned_loss=0.02663, audio_tagging_loss=0.01111, over 3055937.12 frames. ], batch size: 59, lr: 9.57e-03, grad_scale: 32.0 2023-11-19 03:57:19,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=555326.6666666666, ans=0.125 2023-11-19 03:57:19,499 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.48 vs. limit=15.0 2023-11-19 03:57:25,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=555393.3333333334, ans=10.0 2023-11-19 03:57:29,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=555393.3333333334, ans=0.125 2023-11-19 03:57:59,487 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11200, loss[loss=0.09129, simple_loss=0.1055, pruned_loss=0.02831, audio_tagging_loss=0.01023, over 14229.00 frames. 
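The scaling.py:1022 records compare a per-module whitening metric against a limit (e.g. metric=13.48 vs. limit=15.0 just above). One plausible reading, sketched below: the metric is 1.0 when the activation covariance is isotropic ("white") and grows with the eigenvalue spread, and a corrective penalty engages only past the limit. The formula here is our illustration and not necessarily the one used in scaling.py:

import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels). Returns
    # num_channels * trace(C^2) / trace(C)^2, which is 1.0 for C = I
    # and larger when a few directions dominate the covariance C.
    x = x - x.mean(dim=0)
    c = (x.t() @ x) / x.shape[0]
    num_channels = c.shape[0]
    return num_channels * (c @ c).diagonal().sum() / c.diagonal().sum() ** 2

white = torch.randn(1000, 256)
print(whitening_metric(white))  # close to 1.0; the log shows values up to ~20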
2023-11-19 03:57:59,487 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11200, loss[loss=0.09129, simple_loss=0.1055, pruned_loss=0.02831, audio_tagging_loss=0.01023, over 14229.00 frames. ], tot_loss[loss=0.09286, simple_loss=0.1098, pruned_loss=0.02677, audio_tagging_loss=0.0112, over 3053897.65 frames. ], batch size: 54, lr: 9.56e-03, grad_scale: 32.0
2023-11-19 03:58:09,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=555660.0, ans=10.0
2023-11-19 03:58:11,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=555660.0, ans=0.125
2023-11-19 03:58:18,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=555660.0, ans=0.1
2023-11-19 03:58:41,679 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.070e+01 8.552e+01 9.021e+01 1.004e+02 1.285e+02, threshold=1.804e+02, percent-clipped=0.0
2023-11-19 03:58:44,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=555860.0, ans=0.125
2023-11-19 03:58:52,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0
2023-11-19 03:58:55,029 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11250, loss[loss=0.09633, simple_loss=0.1228, pruned_loss=0.02467, audio_tagging_loss=0.01027, over 16149.00 frames. ], tot_loss[loss=0.09127, simple_loss=0.1077, pruned_loss=0.02611, audio_tagging_loss=0.01132, over 3043482.74 frames. ], batch size: 60, lr: 9.56e-03, grad_scale: 32.0
2023-11-19 03:59:08,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0
2023-11-19 03:59:27,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=556060.0, ans=0.1
2023-11-19 03:59:27,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=556060.0, ans=0.125
2023-11-19 03:59:30,959 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.14 vs. limit=22.5
2023-11-19 03:59:37,885 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 03:59:38,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=556193.3333333334, ans=0.025
2023-11-19 03:59:50,344 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11300, loss[loss=0.1163, simple_loss=0.1531, pruned_loss=0.03539, audio_tagging_loss=0.004292, over 15861.00 frames. ], tot_loss[loss=0.09167, simple_loss=0.1085, pruned_loss=0.02638, audio_tagging_loss=0.01105, over 3042098.96 frames. ], batch size: 57, lr: 9.56e-03, grad_scale: 32.0
2023-11-19 04:00:18,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.32 vs. limit=15.0
2023-11-19 04:00:19,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.59 vs. limit=12.0
2023-11-19 04:00:26,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=556460.0, ans=0.0
2023-11-19 04:00:26,418 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.00 vs. limit=10.0
2023-11-19 04:00:30,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=556460.0, ans=0.04949747468305833
2023-11-19 04:00:31,479 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:00:32,261 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.772e+01 9.510e+01 1.073e+02 1.316e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 04:00:34,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=556526.6666666666, ans=0.125
2023-11-19 04:00:46,037 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11350, loss[loss=0.06943, simple_loss=0.07226, pruned_loss=0.01748, audio_tagging_loss=0.01582, over 14824.00 frames. ], tot_loss[loss=0.09154, simple_loss=0.1085, pruned_loss=0.02632, audio_tagging_loss=0.01099, over 3039876.54 frames. ], batch size: 58, lr: 9.55e-03, grad_scale: 32.0
2023-11-19 04:00:46,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=556593.3333333334, ans=0.2
2023-11-19 04:00:49,323 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.99 vs. limit=12.0
2023-11-19 04:01:07,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.32 vs. limit=15.0
2023-11-19 04:01:17,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=556793.3333333334, ans=0.0
2023-11-19 04:01:40,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=556926.6666666666, ans=0.2
2023-11-19 04:01:41,531 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11400, loss[loss=0.09248, simple_loss=0.1142, pruned_loss=0.02596, audio_tagging_loss=0.009424, over 15097.00 frames. ], tot_loss[loss=0.0916, simple_loss=0.1087, pruned_loss=0.02638, audio_tagging_loss=0.01086, over 3035247.02 frames. ], batch size: 55, lr: 9.55e-03, grad_scale: 32.0
2023-11-19 04:01:44,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=556926.6666666666, ans=0.2
2023-11-19 04:02:19,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=557126.6666666666, ans=0.0
2023-11-19 04:02:23,539 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.302e+01 8.621e+01 9.378e+01 1.036e+02 2.217e+02, threshold=1.876e+02, percent-clipped=1.0
2023-11-19 04:02:36,359 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11450, loss[loss=0.09316, simple_loss=0.1216, pruned_loss=0.02475, audio_tagging_loss=0.007628, over 14918.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1089, pruned_loss=0.02657, audio_tagging_loss=0.01088, over 3034104.41 frames. ], batch size: 52, lr: 9.55e-03, grad_scale: 32.0
2023-11-19 04:02:45,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=557260.0, ans=0.125
2023-11-19 04:02:45,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=557260.0, ans=0.125
2023-11-19 04:02:58,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=557393.3333333334, ans=0.2
2023-11-19 04:03:17,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.37 vs. limit=10.0
2023-11-19 04:03:32,398 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11500, loss[loss=0.07846, simple_loss=0.09806, pruned_loss=0.01917, audio_tagging_loss=0.01026, over 14530.00 frames. ], tot_loss[loss=0.09175, simple_loss=0.1088, pruned_loss=0.02655, audio_tagging_loss=0.01082, over 3038369.03 frames. ], batch size: 53, lr: 9.55e-03, grad_scale: 16.0
2023-11-19 04:03:38,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=557593.3333333334, ans=0.0
2023-11-19 04:03:47,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=557660.0, ans=0.125
2023-11-19 04:03:58,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=557726.6666666666, ans=0.025
2023-11-19 04:04:15,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.885e+01 8.787e+01 9.889e+01 1.125e+02 1.791e+02, threshold=1.978e+02, percent-clipped=0.0
2023-11-19 04:04:20,880 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=557860.0, ans=0.125
2023-11-19 04:04:28,847 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11550, loss[loss=0.1367, simple_loss=0.1743, pruned_loss=0.04357, audio_tagging_loss=0.005955, over 14547.00 frames. ], tot_loss[loss=0.09197, simple_loss=0.1087, pruned_loss=0.02675, audio_tagging_loss=0.01089, over 3039642.06 frames. ], batch size: 53, lr: 9.54e-03, grad_scale: 16.0
2023-11-19 04:04:49,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.26 vs. limit=6.0
2023-11-19 04:04:49,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=558060.0, ans=0.125
2023-11-19 04:04:57,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=558060.0, ans=0.125
2023-11-19 04:05:02,031 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 04:05:04,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=558126.6666666666, ans=0.125
2023-11-19 04:05:06,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558126.6666666666, ans=0.125
2023-11-19 04:05:16,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff2.min_abs, batch_count=558193.3333333334, ans=0.1
2023-11-19 04:05:23,637 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11600, loss[loss=0.07164, simple_loss=0.08323, pruned_loss=0.01984, audio_tagging_loss=0.01018, over 15554.00 frames. ], tot_loss[loss=0.09223, simple_loss=0.1094, pruned_loss=0.02676, audio_tagging_loss=0.01077, over 3047612.93 frames. ], batch size: 59, lr: 9.54e-03, grad_scale: 32.0
2023-11-19 04:05:23,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=558260.0, ans=0.125
2023-11-19 04:05:30,594 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.67 vs. limit=15.0
2023-11-19 04:05:52,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0
2023-11-19 04:06:06,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558460.0, ans=0.1
2023-11-19 04:06:07,038 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.984e+01 8.676e+01 9.320e+01 1.048e+02 1.345e+02, threshold=1.864e+02, percent-clipped=0.0
2023-11-19 04:06:14,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=558526.6666666666, ans=0.0
2023-11-19 04:06:18,607 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11650, loss[loss=0.11, simple_loss=0.131, pruned_loss=0.03613, audio_tagging_loss=0.008375, over 15437.00 frames. ], tot_loss[loss=0.09182, simple_loss=0.1087, pruned_loss=0.02662, audio_tagging_loss=0.01083, over 3042472.71 frames. ], batch size: 58, lr: 9.54e-03, grad_scale: 32.0
2023-11-19 04:06:35,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=558660.0, ans=0.07
2023-11-19 04:06:37,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=558660.0, ans=0.09899494936611666
2023-11-19 04:06:40,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=558726.6666666666, ans=0.1
2023-11-19 04:07:02,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=558860.0, ans=0.125
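Note grad_scale in the batch records above: it drops from 32.0 to 16.0 at batches 11500-11550 and is back at 32.0 by batch 11600. That is the signature of dynamic loss scaling in fp16 training: the scale is halved when a step overflows and grows back after a run of good steps. A sketch with PyTorch's stock GradScaler; apart from init_scale, the growth/backoff behavior shown is the library default, not something confirmed by this log:

import torch

scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

def training_step(model, optimizer, loss_fn, batch):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped on inf/nan grads; the scale is then halved
    scaler.update()
    return scaler.get_scale()  # the value logged as grad_scale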
2023-11-19 04:07:14,576 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11700, loss[loss=0.08085, simple_loss=0.0917, pruned_loss=0.02412, audio_tagging_loss=0.01088, over 15574.00 frames. ], tot_loss[loss=0.09151, simple_loss=0.1083, pruned_loss=0.02653, audio_tagging_loss=0.01082, over 3041169.43 frames. ], batch size: 59, lr: 9.53e-03, grad_scale: 32.0
2023-11-19 04:07:31,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=558993.3333333334, ans=0.1
2023-11-19 04:07:38,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=559060.0, ans=0.0
2023-11-19 04:07:40,681 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=559060.0, ans=0.0
2023-11-19 04:07:48,725 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=5.187e-01
2023-11-19 04:07:57,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.729e+01 8.689e+01 9.449e+01 1.084e+02 2.126e+02, threshold=1.890e+02, percent-clipped=1.0
2023-11-19 04:08:09,619 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11750, loss[loss=0.07318, simple_loss=0.08478, pruned_loss=0.02006, audio_tagging_loss=0.01073, over 14130.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1074, pruned_loss=0.02618, audio_tagging_loss=0.01085, over 3038260.10 frames. ], batch size: 53, lr: 9.53e-03, grad_scale: 32.0
2023-11-19 04:09:03,943 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11800, loss[loss=0.1021, simple_loss=0.1231, pruned_loss=0.03095, audio_tagging_loss=0.00965, over 13952.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1074, pruned_loss=0.02632, audio_tagging_loss=0.01103, over 3037986.80 frames. ], batch size: 56, lr: 9.53e-03, grad_scale: 32.0
2023-11-19 04:09:06,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=559593.3333333334, ans=0.125
2023-11-19 04:09:09,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=559593.3333333334, ans=0.0
2023-11-19 04:09:11,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=559593.3333333334, ans=0.0
2023-11-19 04:09:12,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=559593.3333333334, ans=0.1
2023-11-19 04:09:22,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.18 vs. limit=15.0
2023-11-19 04:09:23,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=559660.0, ans=0.125
2023-11-19 04:09:24,276 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:09:29,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=559726.6666666666, ans=0.125
2023-11-19 04:09:46,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.370e+01 8.768e+01 9.397e+01 1.015e+02 1.463e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 04:09:59,778 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11850, loss[loss=0.1011, simple_loss=0.1183, pruned_loss=0.03336, audio_tagging_loss=0.008589, over 16457.00 frames. ], tot_loss[loss=0.09192, simple_loss=0.1087, pruned_loss=0.02655, audio_tagging_loss=0.011, over 3041348.42 frames. ], batch size: 61, lr: 9.53e-03, grad_scale: 32.0
2023-11-19 04:10:07,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=559926.6666666666, ans=0.0
2023-11-19 04:10:10,574 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-84000.pt
2023-11-19 04:10:41,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=560126.6666666666, ans=0.0
2023-11-19 04:10:45,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=560193.3333333334, ans=0.125
2023-11-19 04:10:47,847 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.62 vs. limit=15.0
2023-11-19 04:10:56,862 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11900, loss[loss=0.07871, simple_loss=0.09974, pruned_loss=0.01833, audio_tagging_loss=0.01052, over 15681.00 frames. ], tot_loss[loss=0.09195, simple_loss=0.1086, pruned_loss=0.02657, audio_tagging_loss=0.01108, over 3039769.14 frames. ], batch size: 60, lr: 9.52e-03, grad_scale: 32.0
2023-11-19 04:11:10,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=560326.6666666666, ans=0.125
2023-11-19 04:11:13,901 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.72 vs. limit=15.0
2023-11-19 04:11:22,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=560393.3333333334, ans=0.0
2023-11-19 04:11:24,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=560393.3333333334, ans=0.0
2023-11-19 04:11:26,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=560393.3333333334, ans=0.2
2023-11-19 04:11:27,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=560393.3333333334, ans=0.0
2023-11-19 04:11:28,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=560393.3333333334, ans=0.125
2023-11-19 04:11:28,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=560393.3333333334, ans=0.125
2023-11-19 04:11:34,039 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 04:11:34,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=560460.0, ans=0.0
2023-11-19 04:11:38,445 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.87 vs. limit=15.0
2023-11-19 04:11:40,164 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.784e+01 9.525e+01 1.032e+02 1.390e+02, threshold=1.905e+02, percent-clipped=0.0
2023-11-19 04:11:41,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=560526.6666666666, ans=0.2
2023-11-19 04:11:47,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=560526.6666666666, ans=0.125
2023-11-19 04:11:52,356 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 11950, loss[loss=0.07615, simple_loss=0.08571, pruned_loss=0.01966, audio_tagging_loss=0.01363, over 15129.00 frames. ], tot_loss[loss=0.09154, simple_loss=0.1079, pruned_loss=0.02632, audio_tagging_loss=0.01127, over 3044269.01 frames. ], batch size: 57, lr: 9.52e-03, grad_scale: 32.0
2023-11-19 04:12:03,481 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=17.45 vs. limit=22.5
2023-11-19 04:12:45,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=560926.6666666666, ans=0.125
2023-11-19 04:12:46,244 INFO [train_asr.py:1115] (0/4) Epoch 7, batch 12000, loss[loss=0.1213, simple_loss=0.1442, pruned_loss=0.03702, audio_tagging_loss=0.01216, over 15397.00 frames. ], tot_loss[loss=0.09271, simple_loss=0.1093, pruned_loss=0.02677, audio_tagging_loss=0.01131, over 3042278.37 frames. ], batch size: 57, lr: 9.52e-03, grad_scale: 32.0
2023-11-19 04:12:46,246 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-19 04:12:59,049 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.1517, 4.8268, 3.6209, 3.9981], device='cuda:0')
2023-11-19 04:13:04,720 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.8480, 3.5972, 2.8179, 2.9224, 3.7774, 3.6651, 3.1615, 3.6581], device='cuda:0')
2023-11-19 04:13:15,036 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6389, 4.1394, 3.6901, 2.9241], device='cuda:0')
2023-11-19 04:13:19,250 INFO [train_asr.py:1147] (0/4) Epoch 7, validation: loss=0.0682, simple_loss=0.05751, pruned_loss=0.007422, audio_tagging_loss=0.03202, over 4681554.00 frames.
2023-11-19 04:13:19,251 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-19 04:13:43,665 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-7.pt
2023-11-19 04:14:20,128 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 0, loss[loss=0.09405, simple_loss=0.105, pruned_loss=0.0202, audio_tagging_loss=0.02134, over 14641.00 frames. ], tot_loss[loss=0.09405, simple_loss=0.105, pruned_loss=0.0202, audio_tagging_loss=0.02134, over 14641.00 frames. ], batch size: 54, lr: 8.97e-03, grad_scale: 32.0
2023-11-19 04:14:20,130 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-19 04:14:51,786 INFO [train_asr.py:1147] (0/4) Epoch 8, validation: loss=0.06722, simple_loss=0.05736, pruned_loss=0.007334, audio_tagging_loss=0.0312, over 4681554.00 frames.
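Note the learning-rate pattern around the epoch boundary above: lr decays smoothly through epoch 7 (9.63e-03 down to 9.52e-03) and then steps down to 8.97e-03 at epoch 8, batch 0, consistent with a scheduler discounted by both batch count and epoch, such as icefall's Eden. A hedged sketch of an Eden-style rule; the constants below are assumptions for illustration, and we have not verified that this reproduces the exact values printed here:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Two smooth inverse-quartic-root discounts: one in batches, one in epochs.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor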
2023-11-19 04:14:51,786 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-19 04:15:10,733 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.068e+01 9.365e+01 1.076e+02 1.160e+02 2.715e+02, threshold=2.151e+02, percent-clipped=1.0
2023-11-19 04:15:10,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=561146.6666666666, ans=0.125
2023-11-19 04:15:32,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=561280.0, ans=0.0
2023-11-19 04:15:47,564 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 50, loss[loss=0.1004, simple_loss=0.1075, pruned_loss=0.02695, audio_tagging_loss=0.01976, over 15558.00 frames. ], tot_loss[loss=0.1037, simple_loss=0.112, pruned_loss=0.02703, audio_tagging_loss=0.02064, over 691529.77 frames. ], batch size: 57, lr: 8.96e-03, grad_scale: 32.0
2023-11-19 04:16:00,951 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.03 vs. limit=22.5
2023-11-19 04:16:11,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=561546.6666666666, ans=0.125
2023-11-19 04:16:29,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=561613.3333333334, ans=0.2
2023-11-19 04:16:43,675 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 100, loss[loss=0.07187, simple_loss=0.06243, pruned_loss=0.0163, audio_tagging_loss=0.02435, over 14499.00 frames. ], tot_loss[loss=0.1009, simple_loss=0.1092, pruned_loss=0.02595, audio_tagging_loss=0.0203, over 1208889.88 frames. ], batch size: 55, lr: 8.96e-03, grad_scale: 32.0
2023-11-19 04:16:53,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=561813.3333333334, ans=0.125
2023-11-19 04:16:55,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=561813.3333333334, ans=0.1
2023-11-19 04:17:02,330 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.971e+01 9.025e+01 9.629e+01 1.101e+02 1.552e+02, threshold=1.926e+02, percent-clipped=0.0
2023-11-19 04:17:15,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=561880.0, ans=0.1
2023-11-19 04:17:16,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=561946.6666666666, ans=0.125
2023-11-19 04:17:17,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=561946.6666666666, ans=0.125
2023-11-19 04:17:22,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=561946.6666666666, ans=0.0
2023-11-19 04:17:23,376 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=561946.6666666666, ans=0.0
2023-11-19 04:17:39,012 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 150, loss[loss=0.09184, simple_loss=0.1143, pruned_loss=0.02068, audio_tagging_loss=0.014, over 16692.00 frames. ], tot_loss[loss=0.09911, simple_loss=0.1097, pruned_loss=0.02601, audio_tagging_loss=0.01825, over 1610670.97 frames. ], batch size: 63, lr: 8.96e-03, grad_scale: 32.0
2023-11-19 04:18:14,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=562280.0, ans=0.125
2023-11-19 04:18:35,266 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 200, loss[loss=0.07423, simple_loss=0.08242, pruned_loss=0.01936, audio_tagging_loss=0.01367, over 15571.00 frames. ], tot_loss[loss=0.09704, simple_loss=0.1097, pruned_loss=0.0261, audio_tagging_loss=0.01608, over 1925485.53 frames. ], batch size: 60, lr: 8.96e-03, grad_scale: 32.0
2023-11-19 04:18:37,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=562413.3333333334, ans=0.0
2023-11-19 04:18:52,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=562480.0, ans=0.0
2023-11-19 04:18:54,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.820e+01 8.627e+01 9.281e+01 9.933e+01 1.355e+02, threshold=1.856e+02, percent-clipped=0.0
2023-11-19 04:18:55,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=562480.0, ans=0.0
2023-11-19 04:18:55,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=562480.0, ans=0.1
2023-11-19 04:19:02,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.31 vs. limit=15.0
2023-11-19 04:19:12,473 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.07 vs. limit=15.0
2023-11-19 04:19:29,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=562680.0, ans=0.0
2023-11-19 04:19:31,611 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 250, loss[loss=0.1039, simple_loss=0.118, pruned_loss=0.03175, audio_tagging_loss=0.01313, over 13784.00 frames. ], tot_loss[loss=0.09545, simple_loss=0.1098, pruned_loss=0.0261, audio_tagging_loss=0.01444, over 2177419.37 frames. ], batch size: 52, lr: 8.95e-03, grad_scale: 32.0
2023-11-19 04:19:51,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=8.75 vs. limit=15.0
2023-11-19 04:20:01,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=562880.0, ans=0.125
2023-11-19 04:20:09,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=562946.6666666666, ans=0.125
2023-11-19 04:20:16,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=563013.3333333334, ans=0.05
2023-11-19 04:20:26,695 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 300, loss[loss=0.1182, simple_loss=0.1439, pruned_loss=0.03716, audio_tagging_loss=0.00907, over 16163.00 frames. ], tot_loss[loss=0.09507, simple_loss=0.1108, pruned_loss=0.02637, audio_tagging_loss=0.01329, over 2376497.12 frames. ], batch size: 59, lr: 8.95e-03, grad_scale: 32.0
2023-11-19 04:20:30,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=563080.0, ans=0.0
2023-11-19 04:20:33,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=563080.0, ans=0.0
2023-11-19 04:20:35,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5
2023-11-19 04:20:45,697 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.292e+01 8.672e+01 9.179e+01 1.018e+02 1.268e+02, threshold=1.836e+02, percent-clipped=0.0
2023-11-19 04:20:49,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=563213.3333333334, ans=0.125
2023-11-19 04:21:22,023 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 350, loss[loss=0.0886, simple_loss=0.09938, pruned_loss=0.02878, audio_tagging_loss=0.01013, over 15437.00 frames. ], tot_loss[loss=0.09416, simple_loss=0.1106, pruned_loss=0.02633, audio_tagging_loss=0.01252, over 2528420.02 frames. ], batch size: 59, lr: 8.95e-03, grad_scale: 32.0
2023-11-19 04:21:30,896 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=563413.3333333334, ans=0.5
2023-11-19 04:21:37,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=563480.0, ans=0.0
2023-11-19 04:21:46,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=563546.6666666666, ans=0.1
2023-11-19 04:21:58,398 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.93 vs. limit=15.0
2023-11-19 04:22:08,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten.whitening_limit, batch_count=563680.0, ans=22.5
2023-11-19 04:22:18,677 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 400, loss[loss=0.08532, simple_loss=0.1087, pruned_loss=0.02213, audio_tagging_loss=0.008831, over 15157.00 frames. ], tot_loss[loss=0.09299, simple_loss=0.1097, pruned_loss=0.02594, audio_tagging_loss=0.01218, over 2639814.23 frames. ], batch size: 57, lr: 8.94e-03, grad_scale: 32.0
2023-11-19 04:22:30,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=563813.3333333334, ans=0.125
2023-11-19 04:22:31,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=563813.3333333334, ans=0.09899494936611666
2023-11-19 04:22:34,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=563813.3333333334, ans=0.1
2023-11-19 04:22:36,771 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.154e+01 8.502e+01 9.440e+01 1.057e+02 1.683e+02, threshold=1.888e+02, percent-clipped=0.0
2023-11-19 04:22:42,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=563880.0, ans=0.2
2023-11-19 04:22:52,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.76 vs. limit=22.5
2023-11-19 04:23:08,373 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=564013.3333333334, ans=0.125
2023-11-19 04:23:11,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=564013.3333333334, ans=10.0
2023-11-19 04:23:12,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=564080.0, ans=0.0
2023-11-19 04:23:13,481 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 450, loss[loss=0.06687, simple_loss=0.0776, pruned_loss=0.01744, audio_tagging_loss=0.01064, over 14738.00 frames. ], tot_loss[loss=0.09243, simple_loss=0.1094, pruned_loss=0.02597, audio_tagging_loss=0.01176, over 2731439.94 frames. ], batch size: 57, lr: 8.94e-03, grad_scale: 32.0
2023-11-19 04:23:13,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=564080.0, ans=0.125
2023-11-19 04:23:27,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.82 vs. limit=15.0
2023-11-19 04:23:46,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=11.86 vs. limit=15.0
2023-11-19 04:23:47,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=564280.0, ans=0.1
2023-11-19 04:23:49,019 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0
2023-11-19 04:23:56,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.87 vs. limit=15.0
2023-11-19 04:24:01,732 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0
2023-11-19 04:24:08,563 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 500, loss[loss=0.09063, simple_loss=0.1095, pruned_loss=0.02717, audio_tagging_loss=0.008723, over 14718.00 frames. ], tot_loss[loss=0.09208, simple_loss=0.1092, pruned_loss=0.02599, audio_tagging_loss=0.01148, over 2790524.81 frames.
], batch size: 57, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:24:12,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=564413.3333333334, ans=0.2 2023-11-19 04:24:13,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=564413.3333333334, ans=0.0 2023-11-19 04:24:15,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=564413.3333333334, ans=0.035 2023-11-19 04:24:28,810 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.601e+01 9.237e+01 1.002e+02 1.241e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 04:24:42,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564613.3333333334, ans=0.1 2023-11-19 04:25:04,837 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 550, loss[loss=0.07022, simple_loss=0.07357, pruned_loss=0.02034, audio_tagging_loss=0.01309, over 15392.00 frames. ], tot_loss[loss=0.09138, simple_loss=0.1085, pruned_loss=0.02581, audio_tagging_loss=0.01132, over 2851285.06 frames. ], batch size: 59, lr: 8.94e-03, grad_scale: 32.0 2023-11-19 04:25:12,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=564746.6666666666, ans=0.1 2023-11-19 04:26:00,651 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 600, loss[loss=0.09566, simple_loss=0.1122, pruned_loss=0.03001, audio_tagging_loss=0.00953, over 16086.00 frames. ], tot_loss[loss=0.09176, simple_loss=0.1094, pruned_loss=0.02598, audio_tagging_loss=0.0111, over 2899519.77 frames. ], batch size: 61, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:26:15,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=565146.6666666666, ans=0.125 2023-11-19 04:26:18,643 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.289e+01 8.437e+01 9.383e+01 9.998e+01 1.583e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 04:26:43,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=565280.0, ans=0.035 2023-11-19 04:26:45,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=565346.6666666666, ans=0.2 2023-11-19 04:26:56,078 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 650, loss[loss=0.08187, simple_loss=0.09049, pruned_loss=0.02215, audio_tagging_loss=0.01448, over 14631.00 frames. ], tot_loss[loss=0.09158, simple_loss=0.1088, pruned_loss=0.02608, audio_tagging_loss=0.01109, over 2930808.77 frames. ], batch size: 57, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:04,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=565413.3333333334, ans=0.125 2023-11-19 04:27:11,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=565480.0, ans=0.125 2023-11-19 04:27:23,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.27 vs. 
limit=15.0 2023-11-19 04:27:41,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=565680.0, ans=0.1 2023-11-19 04:27:42,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.18 vs. limit=22.5 2023-11-19 04:27:52,227 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 700, loss[loss=0.08949, simple_loss=0.1027, pruned_loss=0.02479, audio_tagging_loss=0.01333, over 14663.00 frames. ], tot_loss[loss=0.09143, simple_loss=0.1088, pruned_loss=0.02598, audio_tagging_loss=0.01103, over 2960720.48 frames. ], batch size: 57, lr: 8.93e-03, grad_scale: 32.0 2023-11-19 04:27:59,306 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:28:02,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.24 vs. limit=12.0 2023-11-19 04:28:10,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.869e+01 8.560e+01 9.295e+01 1.024e+02 1.604e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 04:28:20,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=565880.0, ans=0.0 2023-11-19 04:28:39,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=566013.3333333334, ans=0.0 2023-11-19 04:28:44,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.05 vs. limit=10.0 2023-11-19 04:28:47,741 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 750, loss[loss=0.1064, simple_loss=0.1362, pruned_loss=0.0291, audio_tagging_loss=0.009186, over 14383.00 frames. ], tot_loss[loss=0.09229, simple_loss=0.1101, pruned_loss=0.02625, audio_tagging_loss=0.011, over 2981815.74 frames. ], batch size: 52, lr: 8.93e-03, grad_scale: 16.0 2023-11-19 04:28:56,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=566080.0, ans=0.0 2023-11-19 04:28:59,465 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:29:08,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=566213.3333333334, ans=0.07 2023-11-19 04:29:15,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.84 vs. limit=15.0 2023-11-19 04:29:34,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.63 vs. limit=15.0 2023-11-19 04:29:36,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=566346.6666666666, ans=0.0 2023-11-19 04:29:42,383 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 800, loss[loss=0.1017, simple_loss=0.1353, pruned_loss=0.02838, audio_tagging_loss=0.005678, over 15541.00 frames. ], tot_loss[loss=0.09173, simple_loss=0.1094, pruned_loss=0.02603, audio_tagging_loss=0.01102, over 2993250.25 frames. 
], batch size: 56, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:29:52,607 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=566480.0, ans=0.125 2023-11-19 04:30:02,735 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.155e+01 8.557e+01 9.435e+01 1.048e+02 1.522e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 04:30:05,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=566546.6666666666, ans=0.0 2023-11-19 04:30:09,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.62 vs. limit=12.0 2023-11-19 04:30:14,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=566546.6666666666, ans=0.2 2023-11-19 04:30:29,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=566680.0, ans=0.0 2023-11-19 04:30:33,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=566680.0, ans=0.125 2023-11-19 04:30:34,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=566680.0, ans=0.1 2023-11-19 04:30:34,927 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=566680.0, ans=0.125 2023-11-19 04:30:36,033 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:30:38,442 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 850, loss[loss=0.09574, simple_loss=0.1148, pruned_loss=0.02704, audio_tagging_loss=0.0113, over 14984.00 frames. ], tot_loss[loss=0.09146, simple_loss=0.109, pruned_loss=0.02588, audio_tagging_loss=0.01108, over 3009664.38 frames. ], batch size: 58, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:31:01,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=566880.0, ans=0.1 2023-11-19 04:31:01,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=566880.0, ans=0.0 2023-11-19 04:31:02,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=566880.0, ans=0.0 2023-11-19 04:31:02,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=566880.0, ans=0.2 2023-11-19 04:31:06,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=566880.0, ans=0.125 2023-11-19 04:31:07,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.29 vs. 
limit=12.0 2023-11-19 04:31:08,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=566880.0, ans=0.0 2023-11-19 04:31:19,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=566946.6666666666, ans=0.09899494936611666 2023-11-19 04:31:22,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=567013.3333333334, ans=0.125 2023-11-19 04:31:23,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=8.96 vs. limit=12.0 2023-11-19 04:31:25,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=567013.3333333334, ans=0.125 2023-11-19 04:31:30,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=567013.3333333334, ans=0.125 2023-11-19 04:31:34,306 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 900, loss[loss=0.1318, simple_loss=0.1513, pruned_loss=0.04722, audio_tagging_loss=0.008992, over 15553.00 frames. ], tot_loss[loss=0.09134, simple_loss=0.1088, pruned_loss=0.02584, audio_tagging_loss=0.01112, over 3019365.62 frames. ], batch size: 57, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:31:34,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=567080.0, ans=0.0 2023-11-19 04:31:37,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=567080.0, ans=0.125 2023-11-19 04:31:53,696 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.715e+01 8.468e+01 9.134e+01 1.003e+02 1.510e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 04:31:57,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=567213.3333333334, ans=0.1 2023-11-19 04:31:59,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=567213.3333333334, ans=0.1 2023-11-19 04:32:00,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=567213.3333333334, ans=0.0 2023-11-19 04:32:20,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=567346.6666666666, ans=0.015 2023-11-19 04:32:27,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=567346.6666666666, ans=0.0 2023-11-19 04:32:29,741 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 950, loss[loss=0.09187, simple_loss=0.1294, pruned_loss=0.02063, audio_tagging_loss=0.006552, over 14829.00 frames. ], tot_loss[loss=0.09139, simple_loss=0.109, pruned_loss=0.0258, audio_tagging_loss=0.01111, over 3028810.75 frames. ], batch size: 58, lr: 8.92e-03, grad_scale: 32.0 2023-11-19 04:32:32,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.78 vs. 
limit=10.0 2023-11-19 04:32:34,121 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=567413.3333333334, ans=0.035 2023-11-19 04:32:34,502 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2023-11-19 04:32:50,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=567480.0, ans=0.125 2023-11-19 04:32:51,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=567546.6666666666, ans=0.125 2023-11-19 04:32:53,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten.whitening_limit, batch_count=567546.6666666666, ans=15.0 2023-11-19 04:32:53,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=567546.6666666666, ans=0.125 2023-11-19 04:33:25,108 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1000, loss[loss=0.06687, simple_loss=0.07478, pruned_loss=0.01602, audio_tagging_loss=0.01347, over 14696.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1082, pruned_loss=0.0256, audio_tagging_loss=0.01097, over 3035117.25 frames. ], batch size: 55, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:33:25,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=567746.6666666666, ans=0.0 2023-11-19 04:33:27,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=567746.6666666666, ans=0.125 2023-11-19 04:33:38,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=567813.3333333334, ans=0.125 2023-11-19 04:33:46,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.56 vs. limit=12.0 2023-11-19 04:33:47,008 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.670e+01 9.529e+01 1.041e+02 1.429e+02, threshold=1.906e+02, percent-clipped=0.0 2023-11-19 04:33:49,202 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:33:50,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=567880.0, ans=0.125 2023-11-19 04:33:55,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=567880.0, ans=0.025 2023-11-19 04:33:56,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.81 vs. limit=15.0 2023-11-19 04:34:01,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. 
limit=15.0 2023-11-19 04:34:04,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=567946.6666666666, ans=0.0 2023-11-19 04:34:21,671 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1050, loss[loss=0.07155, simple_loss=0.0972, pruned_loss=0.01395, audio_tagging_loss=0.008998, over 15146.00 frames. ], tot_loss[loss=0.09059, simple_loss=0.1081, pruned_loss=0.02568, audio_tagging_loss=0.01083, over 3034645.08 frames. ], batch size: 56, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:34:28,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=568080.0, ans=0.1 2023-11-19 04:34:33,366 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.46 vs. limit=15.0 2023-11-19 04:34:34,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=568146.6666666666, ans=0.0 2023-11-19 04:34:50,224 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=568213.3333333334, ans=0.125 2023-11-19 04:34:54,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=568280.0, ans=0.0 2023-11-19 04:35:10,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=568346.6666666666, ans=0.0 2023-11-19 04:35:14,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten.whitening_limit, batch_count=568346.6666666666, ans=15.0 2023-11-19 04:35:17,068 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1100, loss[loss=0.08473, simple_loss=0.1028, pruned_loss=0.02046, audio_tagging_loss=0.01284, over 14967.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1081, pruned_loss=0.02571, audio_tagging_loss=0.01068, over 3044723.99 frames. ], batch size: 56, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:35:19,230 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:35:22,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=568413.3333333334, ans=0.0 2023-11-19 04:35:23,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.36 vs. 
limit=15.0 2023-11-19 04:35:31,528 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=568480.0, ans=0.125 2023-11-19 04:35:38,082 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.186e+01 8.818e+01 9.664e+01 1.074e+02 1.667e+02, threshold=1.933e+02, percent-clipped=0.0 2023-11-19 04:35:41,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=568546.6666666666, ans=0.125 2023-11-19 04:35:48,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=568546.6666666666, ans=0.125 2023-11-19 04:35:54,270 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=568613.3333333334, ans=0.0 2023-11-19 04:36:08,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=568680.0, ans=0.2 2023-11-19 04:36:12,554 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1150, loss[loss=0.08467, simple_loss=0.09826, pruned_loss=0.02135, audio_tagging_loss=0.01419, over 14357.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.1087, pruned_loss=0.02597, audio_tagging_loss=0.01062, over 3046966.73 frames. ], batch size: 53, lr: 8.91e-03, grad_scale: 16.0 2023-11-19 04:36:21,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=568746.6666666666, ans=0.125 2023-11-19 04:36:44,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.21 vs. limit=10.0 2023-11-19 04:36:57,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-19 04:37:04,522 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2023-11-19 04:37:05,817 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:37:08,822 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1200, loss[loss=0.09898, simple_loss=0.1291, pruned_loss=0.02586, audio_tagging_loss=0.008593, over 16448.00 frames. ], tot_loss[loss=0.09146, simple_loss=0.1095, pruned_loss=0.02614, audio_tagging_loss=0.01056, over 3040810.30 frames. 
], batch size: 57, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:37:13,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=569080.0, ans=0.025 2023-11-19 04:37:28,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=569146.6666666666, ans=0.0 2023-11-19 04:37:29,365 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.648e+01 9.273e+01 1.051e+02 1.425e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 04:37:36,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=569213.3333333334, ans=0.0 2023-11-19 04:37:44,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=569280.0, ans=0.0 2023-11-19 04:38:04,297 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1250, loss[loss=0.08919, simple_loss=0.11, pruned_loss=0.02426, audio_tagging_loss=0.009932, over 15866.00 frames. ], tot_loss[loss=0.09152, simple_loss=0.1095, pruned_loss=0.02622, audio_tagging_loss=0.01054, over 3043086.93 frames. ], batch size: 56, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:38:08,437 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.60 vs. limit=22.5 2023-11-19 04:38:27,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=569546.6666666666, ans=0.2 2023-11-19 04:38:42,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=569613.3333333334, ans=0.0 2023-11-19 04:38:59,916 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1300, loss[loss=0.08692, simple_loss=0.1028, pruned_loss=0.02314, audio_tagging_loss=0.01236, over 14170.00 frames. ], tot_loss[loss=0.09017, simple_loss=0.1079, pruned_loss=0.02559, audio_tagging_loss=0.01065, over 3038932.51 frames. ], batch size: 54, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:39:14,782 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.41 vs. limit=10.0 2023-11-19 04:39:19,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=569813.3333333334, ans=0.1 2023-11-19 04:39:21,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.412e+01 8.385e+01 9.003e+01 9.844e+01 1.320e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 04:39:25,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.06 vs. limit=10.0 2023-11-19 04:39:39,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=569946.6666666666, ans=0.125 2023-11-19 04:39:56,373 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1350, loss[loss=0.09978, simple_loss=0.1223, pruned_loss=0.02991, audio_tagging_loss=0.008738, over 14979.00 frames. ], tot_loss[loss=0.09055, simple_loss=0.1082, pruned_loss=0.0258, audio_tagging_loss=0.01065, over 3043032.54 frames. 
], batch size: 55, lr: 8.90e-03, grad_scale: 32.0 2023-11-19 04:40:04,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=570080.0, ans=0.125 2023-11-19 04:40:17,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=570213.3333333334, ans=0.0 2023-11-19 04:40:23,764 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=570213.3333333334, ans=0.2 2023-11-19 04:40:30,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=570280.0, ans=0.5 2023-11-19 04:40:34,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=570280.0, ans=0.1 2023-11-19 04:40:36,361 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:40:51,811 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1400, loss[loss=0.1174, simple_loss=0.1404, pruned_loss=0.03637, audio_tagging_loss=0.01086, over 15156.00 frames. ], tot_loss[loss=0.09103, simple_loss=0.1086, pruned_loss=0.02589, audio_tagging_loss=0.01085, over 3050998.21 frames. ], batch size: 56, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:40:54,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.83 vs. limit=15.0 2023-11-19 04:40:55,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=570413.3333333334, ans=0.5 2023-11-19 04:41:13,426 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.205e+01 8.757e+01 9.593e+01 1.066e+02 1.571e+02, threshold=1.919e+02, percent-clipped=0.0 2023-11-19 04:41:14,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=570546.6666666666, ans=0.0 2023-11-19 04:41:21,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=570546.6666666666, ans=0.0 2023-11-19 04:41:39,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.85 vs. limit=10.0 2023-11-19 04:41:40,058 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.06 vs. limit=15.0 2023-11-19 04:41:46,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=570746.6666666666, ans=0.0 2023-11-19 04:41:47,487 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1450, loss[loss=0.07559, simple_loss=0.0966, pruned_loss=0.01439, audio_tagging_loss=0.0129, over 14801.00 frames. ], tot_loss[loss=0.09169, simple_loss=0.1096, pruned_loss=0.02614, audio_tagging_loss=0.01078, over 3053932.67 frames. 
], batch size: 59, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:06,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=570813.3333333334, ans=0.125 2023-11-19 04:42:21,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=570946.6666666666, ans=0.2 2023-11-19 04:42:31,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=571013.3333333334, ans=0.125 2023-11-19 04:42:37,936 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.77 vs. limit=22.5 2023-11-19 04:42:38,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=571013.3333333334, ans=0.015 2023-11-19 04:42:43,695 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1500, loss[loss=0.1019, simple_loss=0.1084, pruned_loss=0.0305, audio_tagging_loss=0.01722, over 14926.00 frames. ], tot_loss[loss=0.09189, simple_loss=0.1096, pruned_loss=0.0262, audio_tagging_loss=0.0109, over 3052797.05 frames. ], batch size: 56, lr: 8.89e-03, grad_scale: 32.0 2023-11-19 04:42:43,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=571080.0, ans=0.125 2023-11-19 04:42:47,087 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.20 vs. limit=15.0 2023-11-19 04:42:47,673 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:42:54,319 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.35 vs. limit=22.5 2023-11-19 04:43:04,350 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.390e+01 9.200e+01 9.780e+01 1.571e+02, threshold=1.840e+02, percent-clipped=0.0 2023-11-19 04:43:08,807 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:43:32,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=571346.6666666666, ans=0.1 2023-11-19 04:43:37,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=571346.6666666666, ans=0.2 2023-11-19 04:43:39,288 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1550, loss[loss=0.09227, simple_loss=0.119, pruned_loss=0.02334, audio_tagging_loss=0.009424, over 15449.00 frames. ], tot_loss[loss=0.09134, simple_loss=0.1087, pruned_loss=0.02592, audio_tagging_loss=0.01107, over 3050540.64 frames. 
], batch size: 58, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:43:46,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=571413.3333333334, ans=0.0 2023-11-19 04:44:25,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=571680.0, ans=0.125 2023-11-19 04:44:32,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=571680.0, ans=0.125 2023-11-19 04:44:33,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=571746.6666666666, ans=0.0 2023-11-19 04:44:34,455 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1600, loss[loss=0.08755, simple_loss=0.1003, pruned_loss=0.02823, audio_tagging_loss=0.009148, over 14857.00 frames. ], tot_loss[loss=0.09175, simple_loss=0.1093, pruned_loss=0.02609, audio_tagging_loss=0.011, over 3051996.96 frames. ], batch size: 57, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:44:37,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=571746.6666666666, ans=0.0 2023-11-19 04:44:39,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=571746.6666666666, ans=0.1 2023-11-19 04:44:41,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=571746.6666666666, ans=0.07 2023-11-19 04:44:52,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=571813.3333333334, ans=0.125 2023-11-19 04:44:54,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=571813.3333333334, ans=15.0 2023-11-19 04:44:56,105 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.975e+01 9.863e+01 1.094e+02 1.850e+02, threshold=1.973e+02, percent-clipped=1.0 2023-11-19 04:44:59,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten.whitening_limit, batch_count=571880.0, ans=15.0 2023-11-19 04:45:01,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.85 vs. limit=15.0 2023-11-19 04:45:01,963 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=10.82 vs. limit=15.0 2023-11-19 04:45:23,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=572013.3333333334, ans=0.125 2023-11-19 04:45:31,002 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1650, loss[loss=0.07437, simple_loss=0.08966, pruned_loss=0.01887, audio_tagging_loss=0.01067, over 14063.00 frames. ], tot_loss[loss=0.09148, simple_loss=0.1091, pruned_loss=0.02587, audio_tagging_loss=0.01106, over 3049643.55 frames. 
], batch size: 55, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:45:36,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=572080.0, ans=0.125 2023-11-19 04:45:38,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.14 vs. limit=15.0 2023-11-19 04:45:54,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=572213.3333333334, ans=0.125 2023-11-19 04:45:55,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=572213.3333333334, ans=0.1 2023-11-19 04:46:12,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=572280.0, ans=0.025 2023-11-19 04:46:17,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn1.whiten.whitening_limit, batch_count=572346.6666666666, ans=22.5 2023-11-19 04:46:21,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=572346.6666666666, ans=0.05 2023-11-19 04:46:26,832 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1700, loss[loss=0.08004, simple_loss=0.09469, pruned_loss=0.02247, audio_tagging_loss=0.01022, over 14744.00 frames. ], tot_loss[loss=0.09144, simple_loss=0.109, pruned_loss=0.02587, audio_tagging_loss=0.01109, over 3051725.60 frames. ], batch size: 58, lr: 8.88e-03, grad_scale: 32.0 2023-11-19 04:46:31,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=572413.3333333334, ans=0.2 2023-11-19 04:46:35,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=572413.3333333334, ans=0.2 2023-11-19 04:46:36,878 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.25 vs. limit=15.0 2023-11-19 04:46:39,985 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-19 04:46:47,239 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.124e+01 8.408e+01 9.171e+01 1.022e+02 1.332e+02, threshold=1.834e+02, percent-clipped=0.0 2023-11-19 04:46:49,357 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.36 vs. limit=15.0 2023-11-19 04:46:59,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.48 vs. limit=12.0 2023-11-19 04:47:19,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.96 vs. limit=15.0 2023-11-19 04:47:21,115 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1750, loss[loss=0.1051, simple_loss=0.1339, pruned_loss=0.03148, audio_tagging_loss=0.006693, over 14893.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1083, pruned_loss=0.02567, audio_tagging_loss=0.01091, over 3054049.49 frames. 
], batch size: 55, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:47:34,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.69 vs. limit=6.0 2023-11-19 04:47:47,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=572880.0, ans=0.125 2023-11-19 04:48:03,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=572946.6666666666, ans=0.125 2023-11-19 04:48:08,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.54 vs. limit=15.0 2023-11-19 04:48:15,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=573013.3333333334, ans=0.0 2023-11-19 04:48:16,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.76 vs. limit=22.5 2023-11-19 04:48:17,892 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1800, loss[loss=0.08121, simple_loss=0.09481, pruned_loss=0.02609, audio_tagging_loss=0.007712, over 14182.00 frames. ], tot_loss[loss=0.09137, simple_loss=0.1095, pruned_loss=0.02584, audio_tagging_loss=0.01079, over 3056926.99 frames. ], batch size: 55, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:48:21,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=573080.0, ans=0.125 2023-11-19 04:48:38,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.757e+01 8.476e+01 9.222e+01 1.009e+02 1.227e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 04:48:43,858 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=573213.3333333334, ans=0.0 2023-11-19 04:48:47,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=573213.3333333334, ans=0.0 2023-11-19 04:48:52,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=6.45 vs. limit=15.0 2023-11-19 04:49:03,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=573346.6666666666, ans=10.0 2023-11-19 04:49:06,596 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=573346.6666666666, ans=0.09899494936611666 2023-11-19 04:49:13,805 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1850, loss[loss=0.08372, simple_loss=0.103, pruned_loss=0.02176, audio_tagging_loss=0.01044, over 14562.00 frames. ], tot_loss[loss=0.09152, simple_loss=0.1095, pruned_loss=0.0261, audio_tagging_loss=0.01069, over 3049585.40 frames. 
], batch size: 58, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:49:28,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=573480.0, ans=0.0 2023-11-19 04:49:31,007 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=573480.0, ans=0.125 2023-11-19 04:49:32,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=573480.0, ans=0.0 2023-11-19 04:50:00,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=573680.0, ans=0.2 2023-11-19 04:50:09,107 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1900, loss[loss=0.09136, simple_loss=0.09804, pruned_loss=0.03057, audio_tagging_loss=0.01177, over 16074.00 frames. ], tot_loss[loss=0.09047, simple_loss=0.1081, pruned_loss=0.02573, audio_tagging_loss=0.01071, over 3047809.32 frames. ], batch size: 64, lr: 8.87e-03, grad_scale: 32.0 2023-11-19 04:50:27,061 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=573813.3333333334, ans=0.125 2023-11-19 04:50:31,202 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.045e+01 8.643e+01 9.371e+01 1.051e+02 1.561e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 04:50:34,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=573880.0, ans=0.5 2023-11-19 04:50:36,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=573880.0, ans=0.125 2023-11-19 04:50:41,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=573880.0, ans=0.1 2023-11-19 04:50:50,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=573946.6666666666, ans=0.07 2023-11-19 04:51:05,278 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 1950, loss[loss=0.09472, simple_loss=0.128, pruned_loss=0.02263, audio_tagging_loss=0.008116, over 15170.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1087, pruned_loss=0.02575, audio_tagging_loss=0.01061, over 3048497.02 frames. ], batch size: 58, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:51:08,971 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.07 vs. 
limit=22.5 2023-11-19 04:51:09,721 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=574080.0, ans=0.125 2023-11-19 04:51:25,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=574146.6666666666, ans=0.0 2023-11-19 04:51:27,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574213.3333333334, ans=0.1 2023-11-19 04:51:32,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=574213.3333333334, ans=0.2 2023-11-19 04:51:41,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=574280.0, ans=0.0 2023-11-19 04:52:01,539 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2000, loss[loss=0.08262, simple_loss=0.09912, pruned_loss=0.02195, audio_tagging_loss=0.0111, over 15412.00 frames. ], tot_loss[loss=0.09027, simple_loss=0.108, pruned_loss=0.02546, audio_tagging_loss=0.01079, over 3039540.02 frames. ], batch size: 56, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:52:02,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=574413.3333333334, ans=0.0 2023-11-19 04:52:03,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=574413.3333333334, ans=0.1 2023-11-19 04:52:09,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=574413.3333333334, ans=0.0 2023-11-19 04:52:09,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=574413.3333333334, ans=0.1 2023-11-19 04:52:21,799 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.904e+01 9.748e+01 1.142e+02 1.614e+02, threshold=1.950e+02, percent-clipped=0.0 2023-11-19 04:52:33,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=574613.3333333334, ans=0.09899494936611666 2023-11-19 04:52:39,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=574613.3333333334, ans=0.07 2023-11-19 04:52:51,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=574680.0, ans=0.0 2023-11-19 04:52:52,340 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.81 vs. limit=15.0 2023-11-19 04:52:57,095 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2050, loss[loss=0.1048, simple_loss=0.1255, pruned_loss=0.0324, audio_tagging_loss=0.009651, over 14938.00 frames. ], tot_loss[loss=0.09106, simple_loss=0.1087, pruned_loss=0.02586, audio_tagging_loss=0.01086, over 3038594.46 frames. 
], batch size: 57, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:53:02,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=574746.6666666666, ans=0.2 2023-11-19 04:53:31,157 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=574946.6666666666, ans=0.125 2023-11-19 04:53:33,198 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=574946.6666666666, ans=0.125 2023-11-19 04:53:34,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.57 vs. limit=22.5 2023-11-19 04:53:43,759 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 04:53:52,673 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2100, loss[loss=0.08628, simple_loss=0.104, pruned_loss=0.0247, audio_tagging_loss=0.009564, over 15527.00 frames. ], tot_loss[loss=0.09113, simple_loss=0.1088, pruned_loss=0.02583, audio_tagging_loss=0.0109, over 3038450.57 frames. ], batch size: 57, lr: 8.86e-03, grad_scale: 32.0 2023-11-19 04:53:55,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=575080.0, ans=0.125 2023-11-19 04:54:00,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=575080.0, ans=0.1 2023-11-19 04:54:14,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.268e+01 8.570e+01 9.138e+01 1.001e+02 1.384e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 04:54:19,151 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.22 vs. limit=15.0 2023-11-19 04:54:33,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=575280.0, ans=0.0 2023-11-19 04:54:48,408 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2150, loss[loss=0.09713, simple_loss=0.1191, pruned_loss=0.03043, audio_tagging_loss=0.007144, over 16365.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.1087, pruned_loss=0.02577, audio_tagging_loss=0.0109, over 3045318.79 frames. ], batch size: 59, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:55:01,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575480.0, ans=0.125 2023-11-19 04:55:20,949 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:55:43,940 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2200, loss[loss=0.09757, simple_loss=0.1133, pruned_loss=0.02783, audio_tagging_loss=0.01308, over 15259.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.1077, pruned_loss=0.0255, audio_tagging_loss=0.01092, over 3041511.74 frames. 
], batch size: 55, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:55:44,534 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.90 vs. limit=15.0 2023-11-19 04:56:04,955 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.523e+01 9.283e+01 9.995e+01 1.354e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 04:56:06,530 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.50 vs. limit=15.0 2023-11-19 04:56:17,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.26 vs. limit=15.0 2023-11-19 04:56:23,215 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=575946.6666666666, ans=0.125 2023-11-19 04:56:25,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=575946.6666666666, ans=0.125 2023-11-19 04:56:27,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=576013.3333333334, ans=0.0 2023-11-19 04:56:33,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2023-11-19 04:56:39,879 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2250, loss[loss=0.1047, simple_loss=0.1311, pruned_loss=0.03074, audio_tagging_loss=0.00838, over 15977.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1081, pruned_loss=0.02572, audio_tagging_loss=0.01089, over 3050028.56 frames. ], batch size: 59, lr: 8.85e-03, grad_scale: 32.0 2023-11-19 04:56:42,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=576080.0, ans=0.05 2023-11-19 04:56:46,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=576080.0, ans=0.0 2023-11-19 04:56:49,146 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=576080.0, ans=0.0 2023-11-19 04:57:03,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=576213.3333333334, ans=0.2 2023-11-19 04:57:10,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=576213.3333333334, ans=0.125 2023-11-19 04:57:24,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=576346.6666666666, ans=0.0 2023-11-19 04:57:35,791 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2300, loss[loss=0.09067, simple_loss=0.1133, pruned_loss=0.02517, audio_tagging_loss=0.008867, over 15277.00 frames. ], tot_loss[loss=0.08997, simple_loss=0.107, pruned_loss=0.02548, audio_tagging_loss=0.01099, over 3051455.86 frames. ], batch size: 57, lr: 8.85e-03, grad_scale: 16.0 2023-11-19 04:57:42,695 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.76 vs. 
limit=10.0 2023-11-19 04:57:47,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=576480.0, ans=0.125 2023-11-19 04:57:55,421 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=576480.0, ans=0.125 2023-11-19 04:57:57,358 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.832e+01 8.335e+01 9.344e+01 1.048e+02 1.433e+02, threshold=1.869e+02, percent-clipped=0.0 2023-11-19 04:57:58,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=576546.6666666666, ans=0.0 2023-11-19 04:58:10,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=576613.3333333334, ans=0.5 2023-11-19 04:58:16,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.19 vs. limit=12.0 2023-11-19 04:58:21,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=576680.0, ans=0.0 2023-11-19 04:58:22,655 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-19 04:58:23,147 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 04:58:24,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=576680.0, ans=0.0 2023-11-19 04:58:30,965 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2350, loss[loss=0.121, simple_loss=0.139, pruned_loss=0.04082, audio_tagging_loss=0.01069, over 15858.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1081, pruned_loss=0.02575, audio_tagging_loss=0.01091, over 3053481.55 frames. ], batch size: 60, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 04:58:36,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.38 vs. limit=10.0 2023-11-19 04:58:45,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=576813.3333333334, ans=0.125 2023-11-19 04:58:55,745 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs. limit=22.5 2023-11-19 04:59:19,519 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.69 vs. limit=15.0 2023-11-19 04:59:23,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=577013.3333333334, ans=0.125 2023-11-19 04:59:26,755 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2400, loss[loss=0.08722, simple_loss=0.1039, pruned_loss=0.02501, audio_tagging_loss=0.01028, over 14539.00 frames. 
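], tot_loss[loss=0.09012, simple_loss=0.1074, pruned_loss=0.02544, audio_tagging_loss=0.01096, over 3041343.47 frames. ], batch size: 56, lr: 8.84e-03, grad_scale: 32.0

The recurring WARNING entries ("Exclude cut with ID unbalanced/... from training") are a length sanity check: AudioSet cuts carry a 24-token dummy transcript, but a 1-second cut has only 100 feature frames, which shrink to 23 after the encoder front-end's subsampling, fewer frames than tokens to align. A minimal sketch of such a filter, assuming lhotse-style cuts and a sentencepiece model sp (both assumptions); the subsampling formula below is our guess at the front-end, chosen because it reproduces the logged 100 -> 23:

import logging

def frames_after_subsampling(num_frames: int) -> int:
    # Convolutional subsampling as we understand this recipe's encoder_embed:
    # ((100 - 7) // 2 + 1) // 2 == 23, matching the numbers in the warnings.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(cut) -> bool:
    tokens = sp.encode(cut.supervisions[0].text, out_type=str)
    t = frames_after_subsampling(cut.num_frames)
    if t < len(tokens):  # too few frames left to align the token sequence
        logging.warning(f"Exclude cut with ID {cut.id} from training. ...")
        return False
    return True

train_cuts = train_cuts.filter(keep_cut)  # lhotse CutSet.filter is lazy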
2023-11-19 04:59:27,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.64 vs. limit=15.0 2023-11-19 04:59:28,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=577080.0, ans=0.125 2023-11-19 04:59:48,961 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.418e+01 8.521e+01 9.139e+01 1.013e+02 1.981e+02, threshold=1.828e+02, percent-clipped=1.0 2023-11-19 04:59:50,181 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=577213.3333333334, ans=0.125 2023-11-19 04:59:55,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=577213.3333333334, ans=0.125 2023-11-19 05:00:23,020 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2450, loss[loss=0.07612, simple_loss=0.08686, pruned_loss=0.01819, audio_tagging_loss=0.01451, over 13563.00 frames. ], tot_loss[loss=0.09065, simple_loss=0.1081, pruned_loss=0.02554, audio_tagging_loss=0.01106, over 3047486.47 frames. ], batch size: 52, lr: 8.84e-03, grad_scale: 32.0 2023-11-19 05:00:34,851 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:01:18,011 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2500, loss[loss=0.1068, simple_loss=0.1375, pruned_loss=0.03012, audio_tagging_loss=0.007874, over 15230.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1086, pruned_loss=0.02579, audio_tagging_loss=0.01101, over 3046680.11 frames. ], batch size: 56, lr: 8.84e-03, grad_scale: 16.0 2023-11-19 05:01:29,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=577813.3333333334, ans=0.95 2023-11-19 05:01:32,775 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.53 vs. limit=15.0 2023-11-19 05:01:41,714 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.392e+01 9.155e+01 9.880e+01 1.151e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 05:01:49,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=577880.0, ans=0.0 2023-11-19 05:02:13,212 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2550, loss[loss=0.08643, simple_loss=0.1037, pruned_loss=0.02422, audio_tagging_loss=0.01037, over 14778.00 frames. ], tot_loss[loss=0.09141, simple_loss=0.1092, pruned_loss=0.02596, audio_tagging_loss=0.01084, over 3051600.46 frames. ], batch size: 56, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:02:18,600 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.60 vs. limit=22.5 2023-11-19 05:02:27,624 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs.
limit=15.0 2023-11-19 05:02:36,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=578213.3333333334, ans=0.1 2023-11-19 05:02:37,163 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:02:58,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=578346.6666666666, ans=0.125 2023-11-19 05:03:00,938 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.02 vs. limit=15.0 2023-11-19 05:03:07,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=578346.6666666666, ans=0.0 2023-11-19 05:03:09,582 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2600, loss[loss=0.06448, simple_loss=0.07542, pruned_loss=0.0174, audio_tagging_loss=0.00938, over 14923.00 frames. ], tot_loss[loss=0.08967, simple_loss=0.1073, pruned_loss=0.02525, audio_tagging_loss=0.01079, over 3052955.51 frames. ], batch size: 56, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:03:32,524 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.956e+01 8.609e+01 9.579e+01 1.039e+02 1.650e+02, threshold=1.916e+02, percent-clipped=0.0 2023-11-19 05:03:50,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=578613.3333333334, ans=0.2 2023-11-19 05:03:59,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=578680.0, ans=0.125 2023-11-19 05:04:04,654 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:04:05,561 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2650, loss[loss=0.08572, simple_loss=0.1003, pruned_loss=0.02616, audio_tagging_loss=0.009426, over 14836.00 frames. ], tot_loss[loss=0.09003, simple_loss=0.1077, pruned_loss=0.02543, audio_tagging_loss=0.01073, over 3049591.82 frames. ], batch size: 58, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:04:16,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=578813.3333333334, ans=0.0 2023-11-19 05:04:18,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=578813.3333333334, ans=0.0 2023-11-19 05:04:24,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=578813.3333333334, ans=0.125 2023-11-19 05:04:28,281 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=578880.0, ans=0.125 2023-11-19 05:04:48,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.75 vs. limit=15.0 2023-11-19 05:05:00,347 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2700, loss[loss=0.08168, simple_loss=0.0923, pruned_loss=0.02451, audio_tagging_loss=0.01102, over 14358.00 frames. ], tot_loss[loss=0.08953, simple_loss=0.1071, pruned_loss=0.02527, audio_tagging_loss=0.0107, over 3045321.18 frames. 
], batch size: 54, lr: 8.83e-03, grad_scale: 16.0 2023-11-19 05:05:02,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=579080.0, ans=0.2 2023-11-19 05:05:06,326 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.22 vs. limit=15.0 2023-11-19 05:05:11,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579146.6666666666, ans=0.1 2023-11-19 05:05:11,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=579146.6666666666, ans=0.125 2023-11-19 05:05:12,740 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=579146.6666666666, ans=0.0 2023-11-19 05:05:24,702 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.403e+01 9.195e+01 1.002e+02 1.372e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 05:05:33,009 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.35 vs. limit=15.0 2023-11-19 05:05:34,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=579280.0, ans=0.015 2023-11-19 05:05:50,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-19 05:05:52,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=579346.6666666666, ans=0.125 2023-11-19 05:05:57,149 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2750, loss[loss=0.08207, simple_loss=0.1059, pruned_loss=0.01834, audio_tagging_loss=0.01079, over 15368.00 frames. ], tot_loss[loss=0.08906, simple_loss=0.1064, pruned_loss=0.02506, audio_tagging_loss=0.0108, over 3040882.98 frames. ], batch size: 55, lr: 8.82e-03, grad_scale: 16.0 2023-11-19 05:05:59,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=579413.3333333334, ans=0.1 2023-11-19 05:05:59,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=579413.3333333334, ans=0.125 2023-11-19 05:06:10,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=579480.0, ans=0.125 2023-11-19 05:06:11,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=579480.0, ans=0.09899494936611666 2023-11-19 05:06:44,672 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 05:06:50,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=579680.0, ans=0.125 2023-11-19 05:06:50,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=579680.0, ans=0.125 2023-11-19 05:06:52,973 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2800, loss[loss=0.0848, simple_loss=0.09073, pruned_loss=0.0244, audio_tagging_loss=0.01503, over 16391.00 frames. ], tot_loss[loss=0.08856, simple_loss=0.1055, pruned_loss=0.02487, audio_tagging_loss=0.01094, over 3045109.55 frames. ], batch size: 63, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:07:16,554 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.973e+01 8.959e+01 9.930e+01 1.123e+02 1.609e+02, threshold=1.986e+02, percent-clipped=0.0 2023-11-19 05:07:28,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=7.42 vs. limit=10.0 2023-11-19 05:07:28,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=19.89 vs. limit=22.5 2023-11-19 05:07:31,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=579946.6666666666, ans=0.2 2023-11-19 05:07:32,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.09 vs. limit=22.5 2023-11-19 05:07:43,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=580013.3333333334, ans=0.0 2023-11-19 05:07:48,252 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2850, loss[loss=0.08488, simple_loss=0.09955, pruned_loss=0.02269, audio_tagging_loss=0.01242, over 15276.00 frames. ], tot_loss[loss=0.0886, simple_loss=0.1056, pruned_loss=0.02487, audio_tagging_loss=0.01092, over 3030795.15 frames. ], batch size: 58, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:07:52,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=580080.0, ans=0.1 2023-11-19 05:07:56,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=580080.0, ans=0.125 2023-11-19 05:08:00,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=580146.6666666666, ans=0.125 2023-11-19 05:08:29,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=580280.0, ans=0.125 2023-11-19 05:08:30,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=580280.0, ans=0.125 2023-11-19 05:08:39,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=580346.6666666666, ans=0.0 2023-11-19 05:08:45,316 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2900, loss[loss=0.07268, simple_loss=0.08735, pruned_loss=0.01981, audio_tagging_loss=0.009206, over 16685.00 frames. ], tot_loss[loss=0.08925, simple_loss=0.1067, pruned_loss=0.0251, audio_tagging_loss=0.01082, over 3029325.35 frames. 
], batch size: 62, lr: 8.82e-03, grad_scale: 32.0 2023-11-19 05:08:52,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=580413.3333333334, ans=0.0 2023-11-19 05:09:08,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.767e+01 8.526e+01 9.240e+01 1.001e+02 1.332e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 05:09:41,483 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 2950, loss[loss=0.09268, simple_loss=0.114, pruned_loss=0.02721, audio_tagging_loss=0.008466, over 14404.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.108, pruned_loss=0.02548, audio_tagging_loss=0.01072, over 3036992.40 frames. ], batch size: 56, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:09:44,131 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.99 vs. limit=15.0 2023-11-19 05:10:11,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=580880.0, ans=0.125 2023-11-19 05:10:13,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.50 vs. limit=15.0 2023-11-19 05:10:22,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=580946.6666666666, ans=0.0 2023-11-19 05:10:36,739 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3000, loss[loss=0.09934, simple_loss=0.1259, pruned_loss=0.02785, audio_tagging_loss=0.008527, over 14065.00 frames. ], tot_loss[loss=0.09116, simple_loss=0.1091, pruned_loss=0.02585, audio_tagging_loss=0.01076, over 3043605.97 frames. ], batch size: 53, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:10:36,742 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 05:11:00,816 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.9004, 3.1827, 2.6883, 2.7214, 3.6848, 3.7296, 2.9332, 3.6638], device='cuda:0') 2023-11-19 05:11:08,554 INFO [train_asr.py:1147] (0/4) Epoch 8, validation: loss=0.06637, simple_loss=0.05694, pruned_loss=0.00724, audio_tagging_loss=0.03066, over 4681554.00 frames. 
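The validation entry just above decomposes exactly under the loss weighting this run appears to use: total = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss, since 0.5 * 0.05694 + 0.00724 + 0.03066 = 0.06637. A quick check of that reading (the 0.5/1.0 weights are inferred from the logged numbers, not quoted from train_asr.py):

simple_loss, pruned_loss, audio_tagging_loss = 0.05694, 0.00724, 0.03066
loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(f"loss={loss:.5f}")  # loss=0.06637, matching the logged validation total

The same weighting fits the running tot_loss lines as well, e.g. 0.5 * 0.1088 + 0.02583 + 0.0109 = 0.09113 for the epoch 8, batch 2100 entry, up to rounding in the printed values.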
2023-11-19 05:11:08,555 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 05:11:16,700 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:11:19,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=581146.6666666666, ans=0.2 2023-11-19 05:11:31,374 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.149e+01 8.419e+01 9.133e+01 9.790e+01 1.190e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:11:31,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=581213.3333333334, ans=0.125 2023-11-19 05:11:35,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=581213.3333333334, ans=0.125 2023-11-19 05:11:57,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=581346.6666666666, ans=0.1 2023-11-19 05:12:04,543 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3050, loss[loss=0.09248, simple_loss=0.1072, pruned_loss=0.02545, audio_tagging_loss=0.01342, over 15788.00 frames. ], tot_loss[loss=0.09111, simple_loss=0.1089, pruned_loss=0.02585, audio_tagging_loss=0.01079, over 3043614.73 frames. ], batch size: 60, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:12:07,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.16 vs. limit=15.0 2023-11-19 05:12:11,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=581413.3333333334, ans=0.0 2023-11-19 05:12:14,689 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.82 vs. limit=15.0 2023-11-19 05:12:26,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=581546.6666666666, ans=0.0 2023-11-19 05:12:36,521 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:12:54,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=581680.0, ans=0.1 2023-11-19 05:12:59,898 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3100, loss[loss=0.08247, simple_loss=0.103, pruned_loss=0.02164, audio_tagging_loss=0.009311, over 14671.00 frames. ], tot_loss[loss=0.09088, simple_loss=0.1088, pruned_loss=0.0257, audio_tagging_loss=0.01079, over 3044510.50 frames. ], batch size: 57, lr: 8.81e-03, grad_scale: 32.0 2023-11-19 05:13:00,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=581746.6666666666, ans=0.2 2023-11-19 05:13:10,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.45 vs. 
limit=12.0 2023-11-19 05:13:24,919 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.266e+01 8.767e+01 9.718e+01 1.090e+02 1.747e+02, threshold=1.944e+02, percent-clipped=0.0 2023-11-19 05:13:31,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.max_abs, batch_count=581880.0, ans=10.0 2023-11-19 05:13:54,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=582080.0, ans=0.1 2023-11-19 05:13:55,665 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3150, loss[loss=0.07635, simple_loss=0.08441, pruned_loss=0.02019, audio_tagging_loss=0.01395, over 15434.00 frames. ], tot_loss[loss=0.09152, simple_loss=0.1094, pruned_loss=0.02593, audio_tagging_loss=0.01088, over 3045145.56 frames. ], batch size: 59, lr: 8.80e-03, grad_scale: 16.0 2023-11-19 05:14:51,789 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3200, loss[loss=0.07861, simple_loss=0.09663, pruned_loss=0.01965, audio_tagging_loss=0.01065, over 15927.00 frames. ], tot_loss[loss=0.09156, simple_loss=0.1095, pruned_loss=0.02584, audio_tagging_loss=0.01097, over 3050395.39 frames. ], batch size: 59, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:15:15,773 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.106e+01 8.369e+01 9.165e+01 1.003e+02 1.359e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 05:15:20,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=582546.6666666666, ans=0.0 2023-11-19 05:15:22,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=582546.6666666666, ans=0.0 2023-11-19 05:15:27,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=582613.3333333334, ans=0.125 2023-11-19 05:15:39,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=582680.0, ans=0.0 2023-11-19 05:15:47,235 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3250, loss[loss=0.08074, simple_loss=0.1094, pruned_loss=0.01858, audio_tagging_loss=0.007457, over 16024.00 frames. ], tot_loss[loss=0.09114, simple_loss=0.109, pruned_loss=0.02558, audio_tagging_loss=0.01108, over 3045023.93 frames. ], batch size: 56, lr: 8.80e-03, grad_scale: 32.0 2023-11-19 05:15:59,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=582813.3333333334, ans=0.1 2023-11-19 05:16:25,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=582946.6666666666, ans=0.05 2023-11-19 05:16:27,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=582946.6666666666, ans=0.125 2023-11-19 05:16:37,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0 2023-11-19 05:16:42,947 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3300, loss[loss=0.1215, simple_loss=0.1377, pruned_loss=0.0373, audio_tagging_loss=0.0154, over 15950.00 frames. ], tot_loss[loss=0.09169, simple_loss=0.1095, pruned_loss=0.02578, audio_tagging_loss=0.01116, over 3044968.00 frames. 
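], batch size: 56, lr: 8.80e-03, grad_scale: 32.0

Each Clipping_scale line from optim.py prints five points of the recent gradient-norm distribution (min, 25%, 50%, 75%, max), the clipping threshold, and the fraction of recent batches that were clipped. The printed thresholds track 2.0 x the median, e.g. 2.0 * 9.165e+01 = 1.833e+02 in the entry above, so the sketch below assumes threshold = clipping_scale * running median; the sliding-window bookkeeping is our guess, not a quote of optim.py:

import torch

def clip_grads_(params, history: list, clipping_scale: float = 2.0, window: int = 128):
    grads = [p.grad for p in params if p.grad is not None]
    norm = torch.norm(torch.stack([g.norm() for g in grads]))  # total grad norm
    history.append(float(norm))
    del history[:-window]  # keep only the most recent norms
    q = torch.quantile(torch.tensor(history),
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    threshold = clipping_scale * float(q[2])  # 2.0 x median, as in the log
    clipped = float(norm) > threshold
    if clipped:
        for g in grads:
            g.mul_(threshold / float(norm))
    return q, threshold, clipped  # the quantities summarized in each log line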
2023-11-19 05:16:43,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=583080.0, ans=0.0 2023-11-19 05:16:59,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=583146.6666666666, ans=0.0 2023-11-19 05:16:59,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=583146.6666666666, ans=0.2 2023-11-19 05:17:04,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=583213.3333333334, ans=0.0 2023-11-19 05:17:08,705 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.018e+01 8.571e+01 9.137e+01 1.011e+02 1.807e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 05:17:31,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=583346.6666666666, ans=0.125 2023-11-19 05:17:38,950 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3350, loss[loss=0.09971, simple_loss=0.1126, pruned_loss=0.03451, audio_tagging_loss=0.008881, over 13892.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.1086, pruned_loss=0.02566, audio_tagging_loss=0.0111, over 3047148.57 frames. ], batch size: 53, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:18:10,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.71 vs. limit=22.5 2023-11-19 05:18:19,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=583613.3333333334, ans=0.0 2023-11-19 05:18:31,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=583680.0, ans=0.1 2023-11-19 05:18:34,537 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3400, loss[loss=0.09981, simple_loss=0.1188, pruned_loss=0.02933, audio_tagging_loss=0.01106, over 15439.00 frames. ], tot_loss[loss=0.09137, simple_loss=0.1092, pruned_loss=0.02583, audio_tagging_loss=0.01095, over 3046495.78 frames. ], batch size: 56, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:18:40,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.67 vs. limit=15.0 2023-11-19 05:18:42,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=583746.6666666666, ans=0.05 2023-11-19 05:18:42,164 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.70 vs. limit=15.0 2023-11-19 05:18:45,819 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.29 vs.
limit=22.5 2023-11-19 05:18:57,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=583880.0, ans=0.125 2023-11-19 05:19:00,610 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.104e+01 8.466e+01 9.111e+01 1.006e+02 1.690e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 05:19:05,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=583880.0, ans=0.0 2023-11-19 05:19:30,492 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3450, loss[loss=0.06892, simple_loss=0.07726, pruned_loss=0.0171, audio_tagging_loss=0.01319, over 16169.00 frames. ], tot_loss[loss=0.0923, simple_loss=0.1103, pruned_loss=0.02641, audio_tagging_loss=0.01076, over 3046700.21 frames. ], batch size: 62, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:19:38,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=584080.0, ans=0.0 2023-11-19 05:19:41,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=584146.6666666666, ans=0.0 2023-11-19 05:19:45,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.34 vs. limit=15.0 2023-11-19 05:19:48,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=584146.6666666666, ans=0.125 2023-11-19 05:19:53,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=584213.3333333334, ans=0.0 2023-11-19 05:20:02,685 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.42 vs. limit=15.0 2023-11-19 05:20:05,361 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:20:12,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=584280.0, ans=0.125 2023-11-19 05:20:27,037 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3500, loss[loss=0.09061, simple_loss=0.1181, pruned_loss=0.0233, audio_tagging_loss=0.008287, over 14893.00 frames. ], tot_loss[loss=0.09147, simple_loss=0.1094, pruned_loss=0.02604, audio_tagging_loss=0.01071, over 3046114.90 frames. ], batch size: 54, lr: 8.79e-03, grad_scale: 16.0 2023-11-19 05:20:29,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584413.3333333334, ans=0.1 2023-11-19 05:20:52,006 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.193e+01 8.420e+01 9.270e+01 9.989e+01 2.188e+02, threshold=1.854e+02, percent-clipped=1.0 2023-11-19 05:20:54,730 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 05:20:58,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=584546.6666666666, ans=0.2 2023-11-19 05:21:23,087 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3550, loss[loss=0.1184, simple_loss=0.1569, pruned_loss=0.0339, audio_tagging_loss=0.006055, over 15287.00 frames. ], tot_loss[loss=0.09151, simple_loss=0.1095, pruned_loss=0.02607, audio_tagging_loss=0.01067, over 3045104.00 frames. ], batch size: 55, lr: 8.78e-03, grad_scale: 16.0 2023-11-19 05:21:23,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=584746.6666666666, ans=0.1 2023-11-19 05:21:31,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=584746.6666666666, ans=0.1 2023-11-19 05:21:49,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=584880.0, ans=15.0 2023-11-19 05:22:16,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=585013.3333333334, ans=0.0 2023-11-19 05:22:19,042 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3600, loss[loss=0.07233, simple_loss=0.07642, pruned_loss=0.01753, audio_tagging_loss=0.01659, over 14357.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.1088, pruned_loss=0.02594, audio_tagging_loss=0.01064, over 3047359.13 frames. ], batch size: 54, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:22:35,608 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.47 vs. limit=22.5 2023-11-19 05:22:44,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.038e+01 8.417e+01 9.022e+01 1.001e+02 1.493e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 05:22:57,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=585280.0, ans=0.125 2023-11-19 05:23:15,473 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3650, loss[loss=0.1157, simple_loss=0.1358, pruned_loss=0.03841, audio_tagging_loss=0.009379, over 15612.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.1091, pruned_loss=0.02596, audio_tagging_loss=0.01058, over 3043630.97 frames. ], batch size: 58, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:23:34,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.27 vs. limit=15.0 2023-11-19 05:23:34,746 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.60 vs. 
limit=15.0 2023-11-19 05:23:38,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=585546.6666666666, ans=0.125 2023-11-19 05:23:39,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=585546.6666666666, ans=0.0 2023-11-19 05:23:41,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585546.6666666666, ans=0.1 2023-11-19 05:23:49,612 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.70 vs. limit=15.0 2023-11-19 05:23:55,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=585613.3333333334, ans=0.0 2023-11-19 05:24:07,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=585680.0, ans=0.1 2023-11-19 05:24:10,707 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3700, loss[loss=0.08574, simple_loss=0.09737, pruned_loss=0.02301, audio_tagging_loss=0.01405, over 14487.00 frames. ], tot_loss[loss=0.09187, simple_loss=0.1102, pruned_loss=0.02625, audio_tagging_loss=0.01054, over 3054342.08 frames. ], batch size: 56, lr: 8.78e-03, grad_scale: 32.0 2023-11-19 05:24:14,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=585746.6666666666, ans=0.125 2023-11-19 05:24:25,726 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.682e-01 2023-11-19 05:24:34,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=12.61 vs. limit=15.0 2023-11-19 05:24:37,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.128e+01 8.524e+01 9.099e+01 9.813e+01 1.282e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 05:24:40,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=585880.0, ans=0.0 2023-11-19 05:24:43,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=585946.6666666666, ans=0.125 2023-11-19 05:25:01,734 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.32 vs. limit=15.0 2023-11-19 05:25:04,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586013.3333333334, ans=0.1 2023-11-19 05:25:05,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=586080.0, ans=0.1 2023-11-19 05:25:06,477 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3750, loss[loss=0.1045, simple_loss=0.1381, pruned_loss=0.02732, audio_tagging_loss=0.008123, over 15133.00 frames. ], tot_loss[loss=0.09228, simple_loss=0.1109, pruned_loss=0.02635, audio_tagging_loss=0.01046, over 3057131.13 frames. ], batch size: 54, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:25:22,338 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.66 vs. 
limit=15.0 2023-11-19 05:25:25,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=586146.6666666666, ans=0.125 2023-11-19 05:25:44,653 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:26:03,054 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3800, loss[loss=0.09749, simple_loss=0.1196, pruned_loss=0.02609, audio_tagging_loss=0.01161, over 16018.00 frames. ], tot_loss[loss=0.09295, simple_loss=0.1115, pruned_loss=0.02661, audio_tagging_loss=0.01061, over 3060505.31 frames. ], batch size: 58, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:26:20,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=586480.0, ans=0.0 2023-11-19 05:26:26,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer_ff2.min_abs, batch_count=586546.6666666666, ans=0.1 2023-11-19 05:26:27,760 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.211e+01 8.964e+01 1.013e+02 1.295e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 05:26:44,534 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-88000.pt 2023-11-19 05:26:50,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=586680.0, ans=0.2 2023-11-19 05:27:00,382 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3850, loss[loss=0.08142, simple_loss=0.09236, pruned_loss=0.01983, audio_tagging_loss=0.01541, over 15695.00 frames. ], tot_loss[loss=0.09298, simple_loss=0.1117, pruned_loss=0.02654, audio_tagging_loss=0.01062, over 3061846.98 frames. ], batch size: 58, lr: 8.77e-03, grad_scale: 32.0 2023-11-19 05:27:10,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=586813.3333333334, ans=0.125 2023-11-19 05:27:10,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=586813.3333333334, ans=0.1 2023-11-19 05:27:15,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=586813.3333333334, ans=0.0 2023-11-19 05:27:33,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.91 vs. limit=15.0 2023-11-19 05:27:47,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=587013.3333333334, ans=0.125 2023-11-19 05:27:56,353 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3900, loss[loss=0.07991, simple_loss=0.1038, pruned_loss=0.02003, audio_tagging_loss=0.007987, over 14522.00 frames. ], tot_loss[loss=0.09179, simple_loss=0.1102, pruned_loss=0.02603, audio_tagging_loss=0.01065, over 3051490.74 frames. 
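], batch size: 57, lr: 8.77e-03, grad_scale: 32.0

The lr printed with every loss entry decays smoothly in both batch index and epoch. With this run's configured base_lr=0.045, lr_batches=7500 and lr_epochs=3.5, the values here are consistent with icefall's Eden schedule; the formula below is written from memory of optim.py and should be read as an assumption, but it reproduces the 8.77e-03 logged around checkpoint-88000.pt above:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Smooth power-law decay in batch count and in completed epochs.
    return (base_lr
            * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
            * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

# batch_idx_train is about 88000 here, with 7 epochs complete (we are in
# epoch 8); treating the epoch argument as completed epochs is an assumption.
print(f"{eden_lr(0.045, batch=88000, epoch=7):.2e}")  # 8.77e-03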
2023-11-19 05:28:00,669 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.17 vs. limit=12.0 2023-11-19 05:28:00,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=587080.0, ans=15.0 2023-11-19 05:28:14,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=587146.6666666666, ans=0.125 2023-11-19 05:28:16,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-19 05:28:22,403 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.532e+01 9.315e+01 9.987e+01 1.482e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 05:28:28,463 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=587213.3333333334, ans=15.0 2023-11-19 05:28:51,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=587413.3333333334, ans=0.125 2023-11-19 05:28:52,870 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 3950, loss[loss=0.1274, simple_loss=0.1621, pruned_loss=0.03972, audio_tagging_loss=0.006606, over 14670.00 frames. ], tot_loss[loss=0.09167, simple_loss=0.11, pruned_loss=0.02584, audio_tagging_loss=0.01081, over 3052157.29 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:06,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=587480.0, ans=0.125 2023-11-19 05:29:43,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=587680.0, ans=0.0 2023-11-19 05:29:48,577 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4000, loss[loss=0.08302, simple_loss=0.09701, pruned_loss=0.022, audio_tagging_loss=0.01252, over 15097.00 frames. ], tot_loss[loss=0.09132, simple_loss=0.1094, pruned_loss=0.02572, audio_tagging_loss=0.01092, over 3049661.87 frames. ], batch size: 56, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:29:52,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=587746.6666666666, ans=0.125 2023-11-19 05:29:58,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=587813.3333333334, ans=0.0 2023-11-19 05:30:01,962 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.13 vs.
limit=15.0 2023-11-19 05:30:02,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=587813.3333333334, ans=0.1 2023-11-19 05:30:07,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=587813.3333333334, ans=0.0 2023-11-19 05:30:14,663 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.556e+01 9.306e+01 1.024e+02 1.346e+02, threshold=1.861e+02, percent-clipped=0.0 2023-11-19 05:30:22,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=587946.6666666666, ans=0.0 2023-11-19 05:30:42,701 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.41 vs. limit=15.0 2023-11-19 05:30:44,078 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4050, loss[loss=0.09419, simple_loss=0.1124, pruned_loss=0.02689, audio_tagging_loss=0.0111, over 16913.00 frames. ], tot_loss[loss=0.09165, simple_loss=0.1099, pruned_loss=0.02577, audio_tagging_loss=0.01094, over 3050718.94 frames. ], batch size: 62, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:30:46,194 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:31:10,513 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=588213.3333333334, ans=0.015 2023-11-19 05:31:20,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=588280.0, ans=0.125 2023-11-19 05:31:23,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=588280.0, ans=0.0 2023-11-19 05:31:25,504 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=588280.0, ans=0.0 2023-11-19 05:31:39,958 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=588413.3333333334, ans=0.1 2023-11-19 05:31:40,827 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4100, loss[loss=0.08134, simple_loss=0.1033, pruned_loss=0.02286, audio_tagging_loss=0.006834, over 15605.00 frames. ], tot_loss[loss=0.09093, simple_loss=0.1087, pruned_loss=0.02562, audio_tagging_loss=0.01095, over 3043948.36 frames. ], batch size: 59, lr: 8.76e-03, grad_scale: 32.0 2023-11-19 05:32:00,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=588480.0, ans=0.125 2023-11-19 05:32:05,811 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.987e+01 8.777e+01 9.323e+01 1.010e+02 1.338e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:32:27,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.97 vs. 
limit=10.0 2023-11-19 05:32:36,885 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4150, loss[loss=0.08271, simple_loss=0.1034, pruned_loss=0.01957, audio_tagging_loss=0.01145, over 16588.00 frames. ], tot_loss[loss=0.09076, simple_loss=0.1089, pruned_loss=0.02551, audio_tagging_loss=0.01079, over 3041777.68 frames. ], batch size: 64, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:32:57,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=588880.0, ans=0.125 2023-11-19 05:33:04,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=588880.0, ans=0.0 2023-11-19 05:33:16,894 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:33:31,686 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4200, loss[loss=0.0764, simple_loss=0.08771, pruned_loss=0.02321, audio_tagging_loss=0.009341, over 15739.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.1081, pruned_loss=0.02536, audio_tagging_loss=0.01074, over 3043454.44 frames. ], batch size: 61, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:33:37,416 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.12 vs. limit=15.0 2023-11-19 05:33:53,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=589146.6666666666, ans=0.125 2023-11-19 05:33:53,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=589146.6666666666, ans=0.125 2023-11-19 05:33:58,305 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.836e+01 9.609e+01 1.061e+02 1.544e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 05:34:09,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589280.0, ans=0.0 2023-11-19 05:34:18,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=589346.6666666666, ans=0.0 2023-11-19 05:34:19,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.52 vs. limit=15.0 2023-11-19 05:34:28,209 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4250, loss[loss=0.07139, simple_loss=0.08478, pruned_loss=0.02087, audio_tagging_loss=0.008127, over 16056.00 frames. ], tot_loss[loss=0.09023, simple_loss=0.1085, pruned_loss=0.02535, audio_tagging_loss=0.0106, over 3045918.57 frames. 
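], batch size: 60, lr: 8.75e-03, grad_scale: 32.0

The Whitening lines from scaling.py fire when a module's activations drift away from being "white" (covariance proportional to the identity) by more than the stated limit. Our reading of the metric, offered as an assumption rather than a quote of the implementation: the ratio of the mean squared eigenvalue of the group-averaged feature covariance to the squared mean eigenvalue, which equals 1.0 for perfectly whitened features and grows as variance concentrates in a few directions:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    # x: (num_frames, num_channels); channels are split into num_groups groups
    # and per-group covariances averaged, mirroring the num_groups /
    # num_channels fields in the log lines.
    n, c = x.shape
    d = c // num_groups
    xg = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
    cov = (xg.transpose(1, 2) @ xg).mean(dim=0) / n    # (d, d) avg covariance
    mean_eig = torch.diagonal(cov).mean()              # trace(C)/d
    mean_eig_sq = (cov * cov).sum() / d                # trace(C @ C)/d
    return float(mean_eig_sq / (mean_eig ** 2 + 1e-20))

# Gaussian noise is already white, so the metric sits near its floor of 1.0
# (sampling noise adds roughly d/n, so about 1.26 here), far below limit=15.0:
print(whitening_metric(torch.randn(1000, 256)))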
2023-11-19 05:34:38,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=589413.3333333334, ans=0.0 2023-11-19 05:34:39,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=589480.0, ans=0.05 2023-11-19 05:34:39,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589480.0, ans=0.1 2023-11-19 05:34:46,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=589480.0, ans=0.125 2023-11-19 05:34:55,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=589546.6666666666, ans=0.0 2023-11-19 05:34:55,147 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=589546.6666666666, ans=0.05 2023-11-19 05:35:06,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=589613.3333333334, ans=0.125 2023-11-19 05:35:06,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=589613.3333333334, ans=0.125 2023-11-19 05:35:22,860 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2023-11-19 05:35:24,461 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4300, loss[loss=0.1052, simple_loss=0.1277, pruned_loss=0.0303, audio_tagging_loss=0.01107, over 15272.00 frames. ], tot_loss[loss=0.09164, simple_loss=0.1104, pruned_loss=0.02595, audio_tagging_loss=0.01047, over 3040469.32 frames. ], batch size: 55, lr: 8.75e-03, grad_scale: 32.0 2023-11-19 05:35:24,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=589746.6666666666, ans=0.0 2023-11-19 05:35:39,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=589813.3333333334, ans=0.2 2023-11-19 05:35:49,994 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.689e+01 9.452e+01 1.019e+02 1.393e+02, threshold=1.890e+02, percent-clipped=0.0 2023-11-19 05:35:57,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=589946.6666666666, ans=0.1 2023-11-19 05:36:02,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=589946.6666666666, ans=0.0 2023-11-19 05:36:05,719 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.24 vs. limit=15.0 2023-11-19 05:36:10,870 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:36:18,938 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4350, loss[loss=0.06139, simple_loss=0.06986, pruned_loss=0.01352, audio_tagging_loss=0.01294, over 15308.00 frames. ], tot_loss[loss=0.09178, simple_loss=0.1104, pruned_loss=0.02608, audio_tagging_loss=0.01047, over 3036214.92 frames.
], batch size: 59, lr: 8.74e-03, grad_scale: 16.0 2023-11-19 05:36:38,086 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:36:42,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=590213.3333333334, ans=0.1 2023-11-19 05:36:45,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=590213.3333333334, ans=0.125 2023-11-19 05:36:45,502 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=590213.3333333334, ans=0.125 2023-11-19 05:36:58,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.67 vs. limit=15.0 2023-11-19 05:37:05,751 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.53 vs. limit=22.5 2023-11-19 05:37:14,722 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4400, loss[loss=0.1002, simple_loss=0.1286, pruned_loss=0.02701, audio_tagging_loss=0.008932, over 14855.00 frames. ], tot_loss[loss=0.09141, simple_loss=0.1099, pruned_loss=0.02598, audio_tagging_loss=0.01048, over 3039050.26 frames. ], batch size: 55, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:37:21,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=590413.3333333334, ans=0.1 2023-11-19 05:37:41,048 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.028e+01 8.473e+01 9.223e+01 1.006e+02 1.257e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 05:37:51,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=590613.3333333334, ans=0.1 2023-11-19 05:37:53,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=590613.3333333334, ans=0.0 2023-11-19 05:38:00,879 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.25 vs. limit=15.0 2023-11-19 05:38:11,087 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4450, loss[loss=0.107, simple_loss=0.1383, pruned_loss=0.03159, audio_tagging_loss=0.006321, over 16283.00 frames. ], tot_loss[loss=0.09152, simple_loss=0.11, pruned_loss=0.02609, audio_tagging_loss=0.01043, over 3043413.11 frames. ], batch size: 59, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:38:21,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=590813.3333333334, ans=0.125 2023-11-19 05:39:03,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=591013.3333333334, ans=0.05 2023-11-19 05:39:04,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=591013.3333333334, ans=0.125 2023-11-19 05:39:06,196 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4500, loss[loss=0.07897, simple_loss=0.09835, pruned_loss=0.01905, audio_tagging_loss=0.01074, over 16444.00 frames. ], tot_loss[loss=0.09157, simple_loss=0.1102, pruned_loss=0.02604, audio_tagging_loss=0.01045, over 3044281.38 frames. 
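], batch size: 60, lr: 8.74e-03, grad_scale: 32.0

The ScheduledFloat entries from scaling.py print the current value (ans) of hyperparameters such as dropout probabilities, skip rates and bypass scale floors, which are annealed as a function of batch_count rather than held fixed. A minimal re-implementation of the idea, assuming piecewise-linear interpolation between (batch_count, value) breakpoints with the end values held constant; the breakpoints in the usage example are illustrative, not taken from the recipe:

class ScheduledFloat:
    def __init__(self, *points: tuple):
        # points: (batch_count, value) pairs, e.g. (0.0, 0.3), (20000.0, 0.1).
        self.points = sorted(points)
        self.batch_count = 0.0

    def __float__(self) -> float:
        pts = self.points
        if self.batch_count <= pts[0][0]:
            return float(pts[0][1])
        for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
            if self.batch_count <= x1:
                t = (self.batch_count - x0) / (x1 - x0)
                return float(y0 + t * (y1 - y0))
        return float(pts[-1][1])  # held constant past the last breakpoint

dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
dropout_p.batch_count = 591346.0  # a batch_count from the entries above
print(float(dropout_p))  # 0.1, plateaued like the late-training ans=0.1 lines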
], batch size: 60, lr: 8.74e-03, grad_scale: 32.0 2023-11-19 05:39:31,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=591213.3333333334, ans=0.0 2023-11-19 05:39:33,307 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.215e+01 8.438e+01 9.340e+01 1.045e+02 1.489e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:39:55,343 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=591346.6666666666, ans=0.125 2023-11-19 05:39:57,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=591346.6666666666, ans=0.125 2023-11-19 05:39:59,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=591346.6666666666, ans=0.2 2023-11-19 05:40:02,484 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4550, loss[loss=0.09142, simple_loss=0.1009, pruned_loss=0.03073, audio_tagging_loss=0.01026, over 14931.00 frames. ], tot_loss[loss=0.09115, simple_loss=0.1096, pruned_loss=0.02582, audio_tagging_loss=0.01054, over 3047391.33 frames. ], batch size: 56, lr: 8.73e-03, grad_scale: 32.0 2023-11-19 05:40:03,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.36 vs. limit=15.0 2023-11-19 05:40:03,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=591413.3333333334, ans=0.2 2023-11-19 05:40:18,165 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=591480.0, ans=0.07 2023-11-19 05:40:44,891 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 05:40:45,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=591613.3333333334, ans=0.0 2023-11-19 05:40:58,036 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4600, loss[loss=0.1055, simple_loss=0.1197, pruned_loss=0.03359, audio_tagging_loss=0.01204, over 14254.00 frames. ], tot_loss[loss=0.09123, simple_loss=0.1095, pruned_loss=0.0258, audio_tagging_loss=0.01067, over 3052306.95 frames. 
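The WARNING above documents the filter applied to degenerate cuts: this AudioSet clip carries placeholder text whose 24 BPE tokens cannot be aligned against only 23 post-subsampling frames, so the transducer loss has no valid path for it. A reconstruction of that test, consistent with the logged numbers; `keep_cut` is an illustrative name, not the trainer's own:

```python
# A cut is dropped when its token sequence is longer than its
# post-subsampling frame sequence (23 frames vs. 24 tokens above),
# leaving the transducer loss no valid alignment.
def keep_cut(frames_after_subsampling: int, num_tokens: int) -> bool:
    return frames_after_subsampling >= num_tokens

print(keep_cut(23, 24))  # False -> excluded, exactly as logged
```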
], batch size: 55, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:10,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=591813.3333333334, ans=0.125 2023-11-19 05:41:14,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=591813.3333333334, ans=0.125 2023-11-19 05:41:25,353 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.649e+01 8.473e+01 9.207e+01 1.048e+02 1.617e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 05:41:31,501 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=591946.6666666666, ans=0.125 2023-11-19 05:41:41,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.54 vs. limit=15.0 2023-11-19 05:41:53,875 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4650, loss[loss=0.08357, simple_loss=0.09923, pruned_loss=0.02349, audio_tagging_loss=0.01046, over 15315.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.109, pruned_loss=0.02572, audio_tagging_loss=0.01074, over 3044758.69 frames. ], batch size: 57, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:41:58,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=592080.0, ans=0.125 2023-11-19 05:42:05,823 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:42:15,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=592213.3333333334, ans=0.125 2023-11-19 05:42:35,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=592280.0, ans=0.1 2023-11-19 05:42:40,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:42,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=592346.6666666666, ans=0.0 2023-11-19 05:42:49,076 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4700, loss[loss=0.1263, simple_loss=0.1595, pruned_loss=0.03793, audio_tagging_loss=0.008629, over 15386.00 frames. ], tot_loss[loss=0.09154, simple_loss=0.11, pruned_loss=0.02583, audio_tagging_loss=0.01069, over 3038646.87 frames. ], batch size: 55, lr: 8.73e-03, grad_scale: 16.0 2023-11-19 05:42:54,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.30 vs. 
limit=15.0 2023-11-19 05:43:13,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=592546.6666666666, ans=0.125 2023-11-19 05:43:13,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=592546.6666666666, ans=0.0 2023-11-19 05:43:17,542 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.009e+01 8.535e+01 9.340e+01 1.049e+02 1.659e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 05:43:26,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=592613.3333333334, ans=0.1 2023-11-19 05:43:30,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.34 vs. limit=12.0 2023-11-19 05:43:37,861 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=592680.0, ans=0.0 2023-11-19 05:43:37,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=592680.0, ans=0.125 2023-11-19 05:43:45,574 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4750, loss[loss=0.1041, simple_loss=0.1314, pruned_loss=0.02687, audio_tagging_loss=0.0115, over 15276.00 frames. ], tot_loss[loss=0.09108, simple_loss=0.1094, pruned_loss=0.0256, audio_tagging_loss=0.01075, over 3037103.75 frames. ], batch size: 57, lr: 8.72e-03, grad_scale: 16.0 2023-11-19 05:43:52,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=592746.6666666666, ans=0.025 2023-11-19 05:44:22,738 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.97 vs. limit=22.5 2023-11-19 05:44:41,353 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4800, loss[loss=0.06054, simple_loss=0.06911, pruned_loss=0.01372, audio_tagging_loss=0.01226, over 15781.00 frames. ], tot_loss[loss=0.09073, simple_loss=0.1088, pruned_loss=0.02548, audio_tagging_loss=0.01085, over 3040167.22 frames. ], batch size: 62, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:44:52,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=593146.6666666666, ans=0.125 2023-11-19 05:44:54,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=593146.6666666666, ans=0.125 2023-11-19 05:45:08,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.082e+01 8.537e+01 9.216e+01 9.797e+01 1.751e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 05:45:33,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.70 vs. limit=15.0 2023-11-19 05:45:36,330 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4850, loss[loss=0.07992, simple_loss=0.1021, pruned_loss=0.02128, audio_tagging_loss=0.007606, over 16261.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1082, pruned_loss=0.02551, audio_tagging_loss=0.01105, over 3042353.17 frames. 
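The optim.py lines are internally consistent with a median-based clipping rule: the five values read as (min, 25%, 50%, 75%, max) of recent gradient norms, and the logged threshold equals Clipping_scale times the median, e.g. 2.0 x 9.340e+01 = 1.868e+02 just above. A sketch of that bookkeeping; the relationship is inferred from the logged numbers, not read from optim.py:

```python
import torch

def clipping_stats(recent_norms: torch.Tensor, clipping_scale: float = 2.0):
    """Quartiles of recent gradient norms plus a threshold at
    clipping_scale * median, matching the logged relationship."""
    q = torch.quantile(recent_norms,
                       torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
    return q, clipping_scale * q[2]

# Feeding the five logged values back in as a stand-in for the norm history:
norms = torch.tensor([70.09, 85.35, 93.40, 104.9, 165.9])
quartiles, threshold = clipping_stats(norms)
print(threshold)  # tensor(186.80) -> the logged threshold=1.868e+02
```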
], batch size: 62, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:45:40,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=593413.3333333334, ans=0.125 2023-11-19 05:45:56,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.60 vs. limit=15.0 2023-11-19 05:46:02,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=593546.6666666666, ans=0.125 2023-11-19 05:46:10,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=593613.3333333334, ans=0.0 2023-11-19 05:46:18,359 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.11 vs. limit=12.0 2023-11-19 05:46:27,987 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=593680.0, ans=0.1 2023-11-19 05:46:30,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=593746.6666666666, ans=0.2 2023-11-19 05:46:31,515 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4900, loss[loss=0.09373, simple_loss=0.1178, pruned_loss=0.02577, audio_tagging_loss=0.00906, over 15837.00 frames. ], tot_loss[loss=0.09035, simple_loss=0.1079, pruned_loss=0.02543, audio_tagging_loss=0.01097, over 3043461.57 frames. ], batch size: 58, lr: 8.72e-03, grad_scale: 32.0 2023-11-19 05:46:32,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=593746.6666666666, ans=0.02 2023-11-19 05:46:51,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=593813.3333333334, ans=0.2 2023-11-19 05:46:52,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=593880.0, ans=0.2 2023-11-19 05:46:58,716 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.397e+01 9.302e+01 1.001e+02 1.305e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 05:47:08,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=593946.6666666666, ans=0.0 2023-11-19 05:47:24,671 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=594013.3333333334, ans=0.125 2023-11-19 05:47:24,679 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=594013.3333333334, ans=0.0 2023-11-19 05:47:26,552 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 4950, loss[loss=0.107, simple_loss=0.1313, pruned_loss=0.03211, audio_tagging_loss=0.009232, over 15093.00 frames. ], tot_loss[loss=0.09098, simple_loss=0.1092, pruned_loss=0.02563, audio_tagging_loss=0.01077, over 3044729.84 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 32.0 2023-11-19 05:48:05,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=7.99 vs. 
limit=15.0 2023-11-19 05:48:12,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=594346.6666666666, ans=0.0 2023-11-19 05:48:22,027 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5000, loss[loss=0.1031, simple_loss=0.1226, pruned_loss=0.03192, audio_tagging_loss=0.009877, over 15172.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1088, pruned_loss=0.02566, audio_tagging_loss=0.01069, over 3044145.29 frames. ], batch size: 56, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:48:23,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=594413.3333333334, ans=0.125 2023-11-19 05:48:23,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=594413.3333333334, ans=0.125 2023-11-19 05:48:29,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=594413.3333333334, ans=0.025 2023-11-19 05:48:39,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.09 vs. limit=12.0 2023-11-19 05:48:51,118 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.368e+01 8.482e+01 9.400e+01 1.026e+02 1.313e+02, threshold=1.880e+02, percent-clipped=0.0 2023-11-19 05:49:18,524 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5050, loss[loss=0.0924, simple_loss=0.1082, pruned_loss=0.02827, audio_tagging_loss=0.01005, over 15339.00 frames. ], tot_loss[loss=0.09049, simple_loss=0.1087, pruned_loss=0.02555, audio_tagging_loss=0.01061, over 3047538.34 frames. ], batch size: 60, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:49:47,697 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.32 vs. limit=15.0 2023-11-19 05:50:01,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=594946.6666666666, ans=0.1 2023-11-19 05:50:01,691 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.73 vs. limit=10.0 2023-11-19 05:50:14,333 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5100, loss[loss=0.08157, simple_loss=0.09322, pruned_loss=0.02122, audio_tagging_loss=0.01374, over 16551.00 frames. ], tot_loss[loss=0.09155, simple_loss=0.1104, pruned_loss=0.02585, audio_tagging_loss=0.0105, over 3051270.38 frames. ], batch size: 63, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:50:18,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=595080.0, ans=0.0 2023-11-19 05:50:20,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=595080.0, ans=0.1 2023-11-19 05:50:24,077 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=595146.6666666666, ans=0.125 2023-11-19 05:50:25,658 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=9.65 vs. 
limit=15.0 2023-11-19 05:50:34,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=595146.6666666666, ans=0.04949747468305833 2023-11-19 05:50:43,360 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.142e+01 8.421e+01 9.048e+01 1.022e+02 1.450e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 05:50:46,735 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=6.15 vs. limit=15.0 2023-11-19 05:50:47,300 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=595280.0, ans=0.0 2023-11-19 05:50:59,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=595346.6666666666, ans=0.125 2023-11-19 05:51:06,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=595346.6666666666, ans=0.2 2023-11-19 05:51:09,264 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5150, loss[loss=0.07564, simple_loss=0.08181, pruned_loss=0.02514, audio_tagging_loss=0.009594, over 14169.00 frames. ], tot_loss[loss=0.09004, simple_loss=0.1082, pruned_loss=0.02536, audio_tagging_loss=0.01057, over 3045939.91 frames. ], batch size: 53, lr: 8.71e-03, grad_scale: 16.0 2023-11-19 05:51:10,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=595413.3333333334, ans=0.0 2023-11-19 05:51:18,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=595413.3333333334, ans=0.125 2023-11-19 05:51:23,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=595480.0, ans=0.125 2023-11-19 05:52:02,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=595680.0, ans=0.0 2023-11-19 05:52:05,714 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5200, loss[loss=0.1012, simple_loss=0.1267, pruned_loss=0.028, audio_tagging_loss=0.009887, over 16331.00 frames. ], tot_loss[loss=0.0917, simple_loss=0.1103, pruned_loss=0.02607, audio_tagging_loss=0.01048, over 3054761.11 frames. ], batch size: 59, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:52:14,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=595746.6666666666, ans=0.125 2023-11-19 05:52:19,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=595813.3333333334, ans=0.125 2023-11-19 05:52:25,792 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.43 vs. limit=15.0 2023-11-19 05:52:33,595 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.530e+01 8.322e+01 8.934e+01 9.832e+01 1.211e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 05:52:34,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.32 vs. limit=15.0 2023-11-19 05:52:46,299 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.65 vs. 
limit=6.0 2023-11-19 05:52:47,850 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.95 vs. limit=15.0 2023-11-19 05:52:54,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.73 vs. limit=5.0 2023-11-19 05:53:01,445 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5250, loss[loss=0.06365, simple_loss=0.07983, pruned_loss=0.01508, audio_tagging_loss=0.008658, over 15351.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.1095, pruned_loss=0.02583, audio_tagging_loss=0.01048, over 3056007.81 frames. ], batch size: 57, lr: 8.70e-03, grad_scale: 32.0 2023-11-19 05:53:01,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=596080.0, ans=0.125 2023-11-19 05:53:02,721 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:53:05,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer_ff3.min_abs, batch_count=596080.0, ans=0.2 2023-11-19 05:53:15,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=596146.6666666666, ans=0.2 2023-11-19 05:53:15,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=596146.6666666666, ans=0.2 2023-11-19 05:53:16,529 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:53:36,702 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2023-11-19 05:53:52,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=596346.6666666666, ans=0.125 2023-11-19 05:53:56,339 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5300, loss[loss=0.113, simple_loss=0.1473, pruned_loss=0.03371, audio_tagging_loss=0.005689, over 15392.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1089, pruned_loss=0.02565, audio_tagging_loss=0.01055, over 3046218.35 frames. ], batch size: 56, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:54:13,211 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.63 vs. limit=15.0 2023-11-19 05:54:26,891 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.349e+01 8.841e+01 9.894e+01 1.112e+02 1.416e+02, threshold=1.979e+02, percent-clipped=0.0 2023-11-19 05:54:30,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=596613.3333333334, ans=0.1 2023-11-19 05:54:33,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.min_positive, batch_count=596613.3333333334, ans=0.05 2023-11-19 05:54:49,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=596680.0, ans=0.0 2023-11-19 05:54:52,767 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5350, loss[loss=0.08065, simple_loss=0.08276, pruned_loss=0.02756, audio_tagging_loss=0.01171, over 15413.00 frames. 
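The loss fields in these records decompose consistently as total = 0.5 x simple_loss + pruned_loss + audio_tagging_loss; the batch 5350 entry above checks out exactly: 0.5 x 0.08276 + 0.02756 + 0.01171 = 0.08065. The 0.5 and 1.0 weights here are inferred from the logged numbers themselves, not read from the training script:

```python
# Weights inferred from the logged numbers, not from the training script.
def combined_loss(simple_loss: float, pruned_loss: float,
                  audio_tagging_loss: float,
                  simple_scale: float = 0.5, at_scale: float = 1.0) -> float:
    return simple_scale * simple_loss + pruned_loss + at_scale * audio_tagging_loss

# Batch 5350 above: reproduces loss=0.08065
print(combined_loss(0.08276, 0.02756, 0.01171))  # 0.08065
```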
], tot_loss[loss=0.09175, simple_loss=0.1102, pruned_loss=0.02616, audio_tagging_loss=0.01051, over 3045852.99 frames. ], batch size: 58, lr: 8.70e-03, grad_scale: 16.0 2023-11-19 05:54:57,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2023-11-19 05:55:13,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=596880.0, ans=0.0 2023-11-19 05:55:15,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=596880.0, ans=0.0 2023-11-19 05:55:20,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=596880.0, ans=0.2 2023-11-19 05:55:24,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=596946.6666666666, ans=0.125 2023-11-19 05:55:38,851 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.72 vs. limit=15.0 2023-11-19 05:55:48,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=22.45 vs. limit=22.5 2023-11-19 05:55:48,439 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5400, loss[loss=0.09592, simple_loss=0.1249, pruned_loss=0.02479, audio_tagging_loss=0.008659, over 14510.00 frames. ], tot_loss[loss=0.09127, simple_loss=0.1094, pruned_loss=0.02593, audio_tagging_loss=0.01066, over 3042514.03 frames. ], batch size: 53, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:55:52,945 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 05:55:54,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=597080.0, ans=0.125 2023-11-19 05:55:55,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=597080.0, ans=0.0 2023-11-19 05:55:58,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=597146.6666666666, ans=0.125 2023-11-19 05:56:03,883 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.41 vs. limit=22.5 2023-11-19 05:56:07,886 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. limit=15.0 2023-11-19 05:56:08,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.81 vs. limit=15.0 2023-11-19 05:56:17,853 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.569e+01 9.325e+01 1.031e+02 1.430e+02, threshold=1.865e+02, percent-clipped=0.0 2023-11-19 05:56:19,105 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.11 vs. 
limit=22.5 2023-11-19 05:56:21,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=597280.0, ans=0.1 2023-11-19 05:56:25,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=597280.0, ans=0.05 2023-11-19 05:56:33,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=597346.6666666666, ans=0.2 2023-11-19 05:56:37,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=597346.6666666666, ans=0.125 2023-11-19 05:56:39,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=597346.6666666666, ans=0.1 2023-11-19 05:56:42,034 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.19 vs. limit=15.0 2023-11-19 05:56:43,627 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5450, loss[loss=0.08352, simple_loss=0.09244, pruned_loss=0.02515, audio_tagging_loss=0.01214, over 14321.00 frames. ], tot_loss[loss=0.09103, simple_loss=0.1087, pruned_loss=0.02586, audio_tagging_loss=0.0108, over 3043861.66 frames. ], batch size: 54, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:57:00,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=597480.0, ans=0.2 2023-11-19 05:57:30,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=597680.0, ans=0.0 2023-11-19 05:57:35,372 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=597680.0, ans=0.95 2023-11-19 05:57:39,856 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5500, loss[loss=0.09177, simple_loss=0.09714, pruned_loss=0.03129, audio_tagging_loss=0.01192, over 15241.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.109, pruned_loss=0.02584, audio_tagging_loss=0.01077, over 3047660.35 frames. ], batch size: 56, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:57:45,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=597746.6666666666, ans=0.0 2023-11-19 05:58:00,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=597813.3333333334, ans=0.2 2023-11-19 05:58:02,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.98 vs. limit=22.5 2023-11-19 05:58:09,527 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.153e+01 8.477e+01 9.706e+01 1.076e+02 1.326e+02, threshold=1.941e+02, percent-clipped=0.0 2023-11-19 05:58:17,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=597946.6666666666, ans=0.0 2023-11-19 05:58:35,507 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5550, loss[loss=0.08599, simple_loss=0.0991, pruned_loss=0.02611, audio_tagging_loss=0.01033, over 15312.00 frames. ], tot_loss[loss=0.09172, simple_loss=0.11, pruned_loss=0.02594, audio_tagging_loss=0.01078, over 3050881.31 frames. 
], batch size: 55, lr: 8.69e-03, grad_scale: 16.0 2023-11-19 05:58:47,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=598146.6666666666, ans=0.125 2023-11-19 05:58:49,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=598146.6666666666, ans=0.125 2023-11-19 05:58:51,917 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.81 vs. limit=22.5 2023-11-19 05:59:25,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=598346.6666666666, ans=0.1 2023-11-19 05:59:30,945 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5600, loss[loss=0.1046, simple_loss=0.125, pruned_loss=0.0318, audio_tagging_loss=0.01031, over 15725.00 frames. ], tot_loss[loss=0.09178, simple_loss=0.11, pruned_loss=0.02589, audio_tagging_loss=0.01087, over 3043209.82 frames. ], batch size: 59, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 05:59:34,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=598413.3333333334, ans=0.2 2023-11-19 05:59:47,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=598480.0, ans=0.125 2023-11-19 06:00:00,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=598546.6666666666, ans=0.0 2023-11-19 06:00:02,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.552e+01 9.221e+01 1.020e+02 1.317e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:00:05,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=598613.3333333334, ans=0.0 2023-11-19 06:00:06,244 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.09 vs. limit=6.0 2023-11-19 06:00:10,759 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:00:19,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=598680.0, ans=0.0 2023-11-19 06:00:20,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=598680.0, ans=0.0 2023-11-19 06:00:26,094 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=598746.6666666666, ans=0.1 2023-11-19 06:00:27,014 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5650, loss[loss=0.0988, simple_loss=0.1087, pruned_loss=0.03394, audio_tagging_loss=0.01051, over 15020.00 frames. ], tot_loss[loss=0.09174, simple_loss=0.1095, pruned_loss=0.02598, audio_tagging_loss=0.01102, over 3056552.31 frames. 
], batch size: 57, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:00:37,432 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=598813.3333333334, ans=0.0 2023-11-19 06:01:05,437 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:01:08,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.prob, batch_count=598946.6666666666, ans=0.125 2023-11-19 06:01:11,013 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.26 vs. limit=15.0 2023-11-19 06:01:22,500 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5700, loss[loss=0.1131, simple_loss=0.1333, pruned_loss=0.03542, audio_tagging_loss=0.011, over 15060.00 frames. ], tot_loss[loss=0.0917, simple_loss=0.1093, pruned_loss=0.02602, audio_tagging_loss=0.01104, over 3062884.33 frames. ], batch size: 56, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:01:28,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=599080.0, ans=0.125 2023-11-19 06:01:29,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=599080.0, ans=0.125 2023-11-19 06:01:29,854 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.19 vs. limit=22.5 2023-11-19 06:01:41,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=599146.6666666666, ans=0.125 2023-11-19 06:01:53,408 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.633e+01 8.957e+01 9.901e+01 1.097e+02 1.583e+02, threshold=1.980e+02, percent-clipped=0.0 2023-11-19 06:01:58,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.05 vs. limit=15.0 2023-11-19 06:02:17,816 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5750, loss[loss=0.08808, simple_loss=0.09943, pruned_loss=0.0272, audio_tagging_loss=0.01117, over 15156.00 frames. ], tot_loss[loss=0.09184, simple_loss=0.1097, pruned_loss=0.02609, audio_tagging_loss=0.01092, over 3061743.83 frames. ], batch size: 60, lr: 8.68e-03, grad_scale: 16.0 2023-11-19 06:02:21,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=599413.3333333334, ans=0.0 2023-11-19 06:02:22,284 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=599413.3333333334, ans=0.09899494936611666 2023-11-19 06:02:30,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.82 vs. 
limit=5.0 2023-11-19 06:02:48,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=599546.6666666666, ans=0.035 2023-11-19 06:03:00,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=599613.3333333334, ans=0.125 2023-11-19 06:03:08,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=599680.0, ans=0.2 2023-11-19 06:03:13,150 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5800, loss[loss=0.1083, simple_loss=0.146, pruned_loss=0.02826, audio_tagging_loss=0.007044, over 15667.00 frames. ], tot_loss[loss=0.0912, simple_loss=0.1091, pruned_loss=0.02588, audio_tagging_loss=0.01079, over 3049012.94 frames. ], batch size: 56, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:03:24,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=599813.3333333334, ans=0.0 2023-11-19 06:03:44,338 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.060e+01 9.064e+01 9.956e+01 1.074e+02 1.617e+02, threshold=1.991e+02, percent-clipped=0.0 2023-11-19 06:03:51,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.13 vs. limit=15.0 2023-11-19 06:03:51,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=599946.6666666666, ans=0.125 2023-11-19 06:03:58,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.18 vs. limit=15.0 2023-11-19 06:04:09,091 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5850, loss[loss=0.07847, simple_loss=0.08323, pruned_loss=0.02395, audio_tagging_loss=0.01291, over 14552.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1078, pruned_loss=0.0257, audio_tagging_loss=0.01075, over 3047537.03 frames. ], batch size: 56, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:04:11,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.95 vs. limit=22.5 2023-11-19 06:04:15,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=600080.0, ans=0.2 2023-11-19 06:05:04,593 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5900, loss[loss=0.09874, simple_loss=0.1241, pruned_loss=0.0266, audio_tagging_loss=0.0101, over 14822.00 frames. ], tot_loss[loss=0.09008, simple_loss=0.1074, pruned_loss=0.02556, audio_tagging_loss=0.01083, over 3045459.73 frames. 
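grad_scale bouncing between 16.0 and 32.0 across these batches is the signature of dynamic fp16 loss scaling: the scale is halved whenever a step produces non-finite gradients and grown back after a stretch of clean steps. A generic PyTorch AMP loop with that behavior is sketched below; the actual trainer may manage its scaler differently:

```python
import torch

model = torch.nn.Linear(80, 500)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)
# Halves the scale on overflow, doubles it after `growth_interval` clean steps.
scaler = torch.cuda.amp.GradScaler(init_scale=32.0, growth_interval=2000)

def train_step(feats: torch.Tensor, targets: torch.Tensor):
    opt.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(feats), targets)
    scaler.scale(loss).backward()
    scaler.step(opt)    # silently skipped if this step overflowed
    scaler.update()
    return loss.detach(), scaler.get_scale()

loss, scale = train_step(torch.randn(8, 80), torch.randint(0, 500, (8,)))
```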
], batch size: 55, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:05:22,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=600480.0, ans=0.125 2023-11-19 06:05:27,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=600546.6666666666, ans=0.0 2023-11-19 06:05:35,553 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.084e+01 8.269e+01 8.982e+01 9.905e+01 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 06:05:42,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=600613.3333333334, ans=0.125 2023-11-19 06:05:50,537 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.14 vs. limit=10.0 2023-11-19 06:05:59,422 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 5950, loss[loss=0.08856, simple_loss=0.1041, pruned_loss=0.02612, audio_tagging_loss=0.01041, over 16385.00 frames. ], tot_loss[loss=0.08988, simple_loss=0.1077, pruned_loss=0.02537, audio_tagging_loss=0.01069, over 3046531.92 frames. ], batch size: 61, lr: 8.67e-03, grad_scale: 16.0 2023-11-19 06:06:15,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=600813.3333333334, ans=0.1 2023-11-19 06:06:18,544 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.78 vs. limit=12.0 2023-11-19 06:06:19,351 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=600813.3333333334, ans=0.125 2023-11-19 06:06:30,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=600880.0, ans=0.95 2023-11-19 06:06:31,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=600880.0, ans=0.125 2023-11-19 06:06:47,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=601013.3333333334, ans=0.125 2023-11-19 06:06:55,538 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6000, loss[loss=0.1107, simple_loss=0.138, pruned_loss=0.03263, audio_tagging_loss=0.009087, over 15298.00 frames. ], tot_loss[loss=0.08997, simple_loss=0.108, pruned_loss=0.02541, audio_tagging_loss=0.01057, over 3040775.74 frames. ], batch size: 57, lr: 8.66e-03, grad_scale: 32.0 2023-11-19 06:06:55,540 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 06:07:28,382 INFO [train_asr.py:1147] (0/4) Epoch 8, validation: loss=0.06748, simple_loss=0.0569, pruned_loss=0.007185, audio_tagging_loss=0.03185, over 4681554.00 frames. 2023-11-19 06:07:28,383 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 06:07:30,614 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=601080.0, ans=0.1 2023-11-19 06:07:32,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=601080.0, ans=0.125 2023-11-19 06:07:47,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=15.46 vs. 
limit=22.5 2023-11-19 06:07:56,169 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=601213.3333333334, ans=0.125 2023-11-19 06:07:58,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=21.43 vs. limit=22.5 2023-11-19 06:07:59,099 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.194e+01 8.682e+01 9.253e+01 9.954e+01 1.321e+02, threshold=1.851e+02, percent-clipped=0.0 2023-11-19 06:08:02,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=601280.0, ans=0.2 2023-11-19 06:08:07,514 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:08:23,825 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6050, loss[loss=0.08076, simple_loss=0.08853, pruned_loss=0.0245, audio_tagging_loss=0.012, over 15208.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.1078, pruned_loss=0.02543, audio_tagging_loss=0.01061, over 3035345.48 frames. ], batch size: 61, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:08:43,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=601480.0, ans=0.1 2023-11-19 06:08:52,698 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=601546.6666666666, ans=0.125 2023-11-19 06:08:57,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=601613.3333333334, ans=0.125 2023-11-19 06:09:01,477 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=601613.3333333334, ans=15.0 2023-11-19 06:09:18,658 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6100, loss[loss=0.07425, simple_loss=0.08773, pruned_loss=0.01825, audio_tagging_loss=0.01214, over 14577.00 frames. ], tot_loss[loss=0.09062, simple_loss=0.1087, pruned_loss=0.02568, audio_tagging_loss=0.01059, over 3033158.94 frames. 
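The validation records a little earlier (batch 6000) report component losses over a fixed 4,681,554-frame dev set, and they obey the same 0.5-weighted decomposition: 0.5 x 0.0569 + 0.007185 + 0.03185 = 0.06748. A sketch of the frame-weighted averaging such a summary implies; this is an assumption about the aggregation, not the trainer's code:

```python
# Frame-weighted aggregation: each dev batch contributes its mean losses
# weighted by its frame count; totals are normalized at the end.
def validation_summary(batches):
    keys = ("loss", "simple_loss", "pruned_loss", "audio_tagging_loss")
    tot = {k: 0.0 for k in keys}
    tot_frames = 0.0
    for losses, num_frames in batches:
        tot_frames += num_frames
        for k in keys:
            tot[k] += losses[k] * num_frames
    return {k: v / tot_frames for k, v in tot.items()}, tot_frames

batch = ({"loss": 0.06748, "simple_loss": 0.0569,
          "pruned_loss": 0.007185, "audio_tagging_loss": 0.03185}, 4800.0)
print(validation_summary([batch]))
```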
], batch size: 53, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:09:29,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=601813.3333333334, ans=0.125 2023-11-19 06:09:47,097 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:09:50,159 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.701e+01 8.581e+01 9.083e+01 1.023e+02 1.492e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 06:09:51,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=601946.6666666666, ans=0.125 2023-11-19 06:09:54,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=601946.6666666666, ans=0.125 2023-11-19 06:10:12,869 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6150, loss[loss=0.08001, simple_loss=0.09989, pruned_loss=0.02076, audio_tagging_loss=0.009311, over 14738.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1088, pruned_loss=0.02558, audio_tagging_loss=0.01063, over 3036862.38 frames. ], batch size: 56, lr: 8.66e-03, grad_scale: 16.0 2023-11-19 06:10:16,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=602080.0, ans=0.1 2023-11-19 06:10:21,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=602080.0, ans=0.125 2023-11-19 06:10:47,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=602280.0, ans=0.07 2023-11-19 06:10:51,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=12.92 vs. limit=15.0 2023-11-19 06:10:59,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=602346.6666666666, ans=0.125 2023-11-19 06:11:08,565 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6200, loss[loss=0.1114, simple_loss=0.1441, pruned_loss=0.0322, audio_tagging_loss=0.00713, over 15011.00 frames. ], tot_loss[loss=0.08946, simple_loss=0.107, pruned_loss=0.02522, audio_tagging_loss=0.01072, over 3023754.19 frames. ], batch size: 56, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:11:10,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=602413.3333333334, ans=0.2 2023-11-19 06:11:12,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=602413.3333333334, ans=0.125 2023-11-19 06:11:22,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:25,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=602480.0, ans=0.125 2023-11-19 06:11:39,727 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.753e+01 8.400e+01 8.989e+01 9.962e+01 1.274e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 06:12:03,528 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6250, loss[loss=0.08981, simple_loss=0.1134, pruned_loss=0.0217, audio_tagging_loss=0.0114, over 15054.00 frames. 
], tot_loss[loss=0.08981, simple_loss=0.1073, pruned_loss=0.02533, audio_tagging_loss=0.01085, over 3022223.83 frames. ], batch size: 55, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:12:06,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=602746.6666666666, ans=0.0 2023-11-19 06:12:21,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=602813.3333333334, ans=0.125 2023-11-19 06:12:50,075 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.61 vs. limit=15.0 2023-11-19 06:12:58,170 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6300, loss[loss=0.0954, simple_loss=0.111, pruned_loss=0.02773, audio_tagging_loss=0.01216, over 15158.00 frames. ], tot_loss[loss=0.09014, simple_loss=0.1077, pruned_loss=0.02535, audio_tagging_loss=0.01096, over 3030912.57 frames. ], batch size: 59, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:13:08,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=603146.6666666666, ans=0.0 2023-11-19 06:13:10,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=603146.6666666666, ans=0.95 2023-11-19 06:13:20,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=603213.3333333334, ans=0.0 2023-11-19 06:13:29,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=603213.3333333334, ans=0.125 2023-11-19 06:13:30,519 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.240e+01 8.462e+01 9.271e+01 1.035e+02 1.313e+02, threshold=1.854e+02, percent-clipped=0.0 2023-11-19 06:13:37,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=603280.0, ans=0.125 2023-11-19 06:13:42,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=603346.6666666666, ans=0.07 2023-11-19 06:13:52,816 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6350, loss[loss=0.08809, simple_loss=0.1095, pruned_loss=0.02369, audio_tagging_loss=0.00967, over 15311.00 frames. ], tot_loss[loss=0.09046, simple_loss=0.1079, pruned_loss=0.02551, audio_tagging_loss=0.01098, over 3039977.01 frames. 
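The learning rate drifts from 8.75e-03 at the top of this stretch to 8.65e-03 here: a slow, smooth decay within the epoch rather than a step schedule. A schedule family with that shape (inverse quarter-power decay in both batch index and epoch) is sketched below; the constants are placeholders, not this run's settings:

```python
def lr_sketch(base_lr: float, batch_idx: int, epoch: float,
              decay_batches: float = 10000.0, decay_epochs: float = 4.0) -> float:
    """Smooth inverse quarter-power decay in both batch and epoch; one
    family of schedules consistent with the slow drift logged here.
    decay_batches / decay_epochs are illustrative placeholders."""
    batch_factor = ((batch_idx ** 2 + decay_batches ** 2)
                    / decay_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + decay_epochs ** 2)
                    / decay_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

print(lr_sketch(0.045, 60000, 8.0))  # decays smoothly as batch_idx grows
```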
], batch size: 58, lr: 8.65e-03, grad_scale: 16.0 2023-11-19 06:13:53,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603413.3333333334, ans=0.1 2023-11-19 06:14:14,707 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=603546.6666666666, ans=0.0 2023-11-19 06:14:15,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=603546.6666666666, ans=0.0 2023-11-19 06:14:29,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=603613.3333333334, ans=0.1 2023-11-19 06:14:40,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=603680.0, ans=0.0 2023-11-19 06:14:41,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=603680.0, ans=0.125 2023-11-19 06:14:48,839 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6400, loss[loss=0.08987, simple_loss=0.113, pruned_loss=0.02058, audio_tagging_loss=0.01279, over 16037.00 frames. ], tot_loss[loss=0.08985, simple_loss=0.107, pruned_loss=0.0252, audio_tagging_loss=0.01114, over 3035773.66 frames. ], batch size: 59, lr: 8.65e-03, grad_scale: 32.0 2023-11-19 06:14:49,115 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=603746.6666666666, ans=0.125 2023-11-19 06:14:49,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.52 vs. limit=22.5 2023-11-19 06:15:00,253 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:15:08,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=603813.3333333334, ans=0.125 2023-11-19 06:15:11,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.09 vs. limit=15.0 2023-11-19 06:15:16,063 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=603880.0, ans=0.125 2023-11-19 06:15:20,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.417e+01 8.708e+01 9.476e+01 1.030e+02 1.332e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 06:15:26,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.25 vs. limit=15.0 2023-11-19 06:15:44,368 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6450, loss[loss=0.08968, simple_loss=0.1087, pruned_loss=0.02659, audio_tagging_loss=0.008744, over 14993.00 frames. ], tot_loss[loss=0.09049, simple_loss=0.1078, pruned_loss=0.02545, audio_tagging_loss=0.01113, over 3033313.73 frames. 
], batch size: 58, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:16:04,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=604146.6666666666, ans=0.125 2023-11-19 06:16:04,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=604146.6666666666, ans=0.95 2023-11-19 06:16:22,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=604280.0, ans=0.125 2023-11-19 06:16:26,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=604280.0, ans=0.0 2023-11-19 06:16:26,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=604280.0, ans=0.125 2023-11-19 06:16:39,332 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6500, loss[loss=0.06119, simple_loss=0.06802, pruned_loss=0.013, audio_tagging_loss=0.01418, over 14759.00 frames. ], tot_loss[loss=0.08979, simple_loss=0.1072, pruned_loss=0.02514, audio_tagging_loss=0.01103, over 3038856.21 frames. ], batch size: 57, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:16:49,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=604413.3333333334, ans=0.0 2023-11-19 06:16:55,882 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=604480.0, ans=0.0 2023-11-19 06:16:58,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=604480.0, ans=0.07 2023-11-19 06:17:02,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=604546.6666666666, ans=0.0 2023-11-19 06:17:11,944 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.719e+01 8.554e+01 9.296e+01 1.013e+02 1.424e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 06:17:29,381 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-19 06:17:31,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=604680.0, ans=0.125 2023-11-19 06:17:35,739 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6550, loss[loss=0.1051, simple_loss=0.1254, pruned_loss=0.0329, audio_tagging_loss=0.009547, over 15540.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.1085, pruned_loss=0.02556, audio_tagging_loss=0.01078, over 3035705.71 frames. ], batch size: 58, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:17:46,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.min_abs, batch_count=604813.3333333334, ans=0.5 2023-11-19 06:17:50,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=604813.3333333334, ans=0.2 2023-11-19 06:17:51,575 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.22 vs. 
limit=10.0 2023-11-19 06:18:01,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=604880.0, ans=0.025 2023-11-19 06:18:01,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=604880.0, ans=0.125 2023-11-19 06:18:17,238 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=604946.6666666666, ans=0.0 2023-11-19 06:18:19,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=605013.3333333334, ans=0.125 2023-11-19 06:18:23,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=605013.3333333334, ans=0.1 2023-11-19 06:18:31,328 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6600, loss[loss=0.08358, simple_loss=0.1039, pruned_loss=0.02002, audio_tagging_loss=0.01159, over 14225.00 frames. ], tot_loss[loss=0.08942, simple_loss=0.107, pruned_loss=0.02512, audio_tagging_loss=0.01078, over 3031214.27 frames. ], batch size: 53, lr: 8.64e-03, grad_scale: 32.0 2023-11-19 06:18:33,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=605080.0, ans=0.1 2023-11-19 06:19:03,867 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.408e+01 8.547e+01 9.371e+01 1.021e+02 1.350e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:19:26,467 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6650, loss[loss=0.1018, simple_loss=0.1198, pruned_loss=0.03185, audio_tagging_loss=0.01003, over 14966.00 frames. ], tot_loss[loss=0.08981, simple_loss=0.1076, pruned_loss=0.02521, audio_tagging_loss=0.01078, over 3033194.36 frames. ], batch size: 58, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:19:26,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=605413.3333333334, ans=0.125 2023-11-19 06:19:28,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=605413.3333333334, ans=0.2 2023-11-19 06:19:32,563 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=605413.3333333334, ans=0.125 2023-11-19 06:19:57,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=605546.6666666666, ans=0.0 2023-11-19 06:20:22,600 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6700, loss[loss=0.07583, simple_loss=0.09767, pruned_loss=0.01813, audio_tagging_loss=0.008863, over 15140.00 frames. ], tot_loss[loss=0.09013, simple_loss=0.108, pruned_loss=0.02547, audio_tagging_loss=0.01066, over 3034235.04 frames. ], batch size: 58, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:20:29,717 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=605746.6666666666, ans=0.09899494936611666 2023-11-19 06:20:36,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=605813.3333333334, ans=0.125 2023-11-19 06:20:50,582 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.75 vs. 
limit=8.0 2023-11-19 06:20:52,040 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:20:53,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=605880.0, ans=0.0 2023-11-19 06:20:53,954 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.078e+01 8.187e+01 8.900e+01 9.907e+01 1.762e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 06:20:55,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=605946.6666666666, ans=0.1 2023-11-19 06:21:03,761 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=605946.6666666666, ans=0.0 2023-11-19 06:21:14,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=606013.3333333334, ans=0.125 2023-11-19 06:21:18,315 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6750, loss[loss=0.05709, simple_loss=0.06296, pruned_loss=0.01344, audio_tagging_loss=0.01217, over 14990.00 frames. ], tot_loss[loss=0.08944, simple_loss=0.1069, pruned_loss=0.02526, audio_tagging_loss=0.01074, over 3032464.43 frames. ], batch size: 58, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:21:27,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=606146.6666666666, ans=0.125 2023-11-19 06:21:32,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=606146.6666666666, ans=0.09899494936611666 2023-11-19 06:21:32,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=606146.6666666666, ans=0.2 2023-11-19 06:21:43,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.whiten.whitening_limit, batch_count=606213.3333333334, ans=15.0 2023-11-19 06:22:11,052 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.68 vs. limit=22.5 2023-11-19 06:22:13,370 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6800, loss[loss=0.1176, simple_loss=0.1404, pruned_loss=0.03699, audio_tagging_loss=0.01039, over 15109.00 frames. ], tot_loss[loss=0.08933, simple_loss=0.1069, pruned_loss=0.02514, audio_tagging_loss=0.01072, over 3038741.71 frames. ], batch size: 55, lr: 8.63e-03, grad_scale: 32.0 2023-11-19 06:22:31,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=606480.0, ans=0.0 2023-11-19 06:22:32,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=606480.0, ans=0.2 2023-11-19 06:22:45,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=606546.6666666666, ans=0.2 2023-11-19 06:22:45,965 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.152e+01 8.455e+01 9.265e+01 1.067e+02 1.623e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 06:22:50,728 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.58 vs. 
limit=10.0 2023-11-19 06:22:51,562 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=606613.3333333334, ans=0.0 2023-11-19 06:22:57,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=606680.0, ans=0.1 2023-11-19 06:23:03,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-19 06:23:09,235 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6850, loss[loss=0.09494, simple_loss=0.1209, pruned_loss=0.02679, audio_tagging_loss=0.007716, over 15247.00 frames. ], tot_loss[loss=0.0897, simple_loss=0.1073, pruned_loss=0.02524, audio_tagging_loss=0.01081, over 3040748.18 frames. ], batch size: 57, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:23:17,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=606746.6666666666, ans=0.0 2023-11-19 06:23:48,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=606946.6666666666, ans=0.125 2023-11-19 06:23:56,005 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=607013.3333333334, ans=0.125 2023-11-19 06:23:56,489 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.18 vs. limit=15.0 2023-11-19 06:24:00,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=607013.3333333334, ans=0.125 2023-11-19 06:24:04,766 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6900, loss[loss=0.1019, simple_loss=0.1289, pruned_loss=0.02662, audio_tagging_loss=0.01084, over 14595.00 frames. ], tot_loss[loss=0.09012, simple_loss=0.1078, pruned_loss=0.02547, audio_tagging_loss=0.01076, over 3046736.83 frames. ], batch size: 55, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:24:08,025 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.76 vs. limit=22.5 2023-11-19 06:24:18,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.22 vs. limit=15.0 2023-11-19 06:24:34,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=607213.3333333334, ans=0.0 2023-11-19 06:24:34,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.07 vs. limit=15.0 2023-11-19 06:24:37,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.004e+01 8.334e+01 9.172e+01 9.913e+01 1.941e+02, threshold=1.834e+02, percent-clipped=1.0 2023-11-19 06:24:45,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=607280.0, ans=0.0 2023-11-19 06:24:45,564 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.60 vs. limit=15.0 2023-11-19 06:24:48,213 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. 
Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:24:48,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.27 vs. limit=6.0 2023-11-19 06:24:52,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.03 vs. limit=15.0 2023-11-19 06:24:54,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=607346.6666666666, ans=0.0 2023-11-19 06:24:55,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=607346.6666666666, ans=0.125 2023-11-19 06:25:00,439 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 6950, loss[loss=0.1145, simple_loss=0.1319, pruned_loss=0.03961, audio_tagging_loss=0.008973, over 14054.00 frames. ], tot_loss[loss=0.0902, simple_loss=0.1081, pruned_loss=0.02541, audio_tagging_loss=0.01076, over 3046265.17 frames. ], batch size: 55, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:25:02,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=607413.3333333334, ans=0.125 2023-11-19 06:25:06,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.50 vs. limit=15.0 2023-11-19 06:25:26,617 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=607546.6666666666, ans=0.125 2023-11-19 06:25:42,014 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=607613.3333333334, ans=0.125 2023-11-19 06:25:53,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=607680.0, ans=0.2 2023-11-19 06:25:56,742 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7000, loss[loss=0.09651, simple_loss=0.1101, pruned_loss=0.03041, audio_tagging_loss=0.01108, over 15356.00 frames. ], tot_loss[loss=0.08958, simple_loss=0.1075, pruned_loss=0.02508, audio_tagging_loss=0.01076, over 3047348.34 frames. ], batch size: 58, lr: 8.62e-03, grad_scale: 32.0 2023-11-19 06:26:05,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=12.0 2023-11-19 06:26:28,332 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.831e+01 8.487e+01 9.225e+01 1.011e+02 1.458e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 06:26:52,398 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7050, loss[loss=0.1015, simple_loss=0.1226, pruned_loss=0.03036, audio_tagging_loss=0.009822, over 16108.00 frames. ], tot_loss[loss=0.08955, simple_loss=0.1069, pruned_loss=0.02522, audio_tagging_loss=0.01086, over 3050038.73 frames. 
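The WARNING above rejects a 1-second AudioSet placeholder cut: its 100 feature frames shrink to 23 after the roughly 4x convolutional subsampling, which is fewer than its 24 BPE tokens, and a transducer cannot emit more tokens than it has frames. The arithmetic below reproduces the logged pair; both the subsampling formula and the keep/drop predicate are assumptions consistent with these numbers, not a quote of the training code:

# Sketch of the validity check implied by the WARNING records in this log.
def subsampled_frames(num_frames: int) -> int:
    # Assumed Conv2d subsampling arithmetic; matches the logged pair 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # A transducer needs at least one output frame per emitted token.
    return subsampled_frames(num_frames) >= num_tokens

assert subsampled_frames(100) == 23
assert not keep_cut(100, 24)  # the excluded placeholder cut: 23 frames, 24 tokens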
], batch size: 59, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:26:52,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=608080.0, ans=0.0 2023-11-19 06:27:05,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=608146.6666666666, ans=0.125 2023-11-19 06:27:26,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=608280.0, ans=0.125 2023-11-19 06:27:27,807 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.37 vs. limit=5.0 2023-11-19 06:27:28,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=608280.0, ans=0.125 2023-11-19 06:27:28,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.45 vs. limit=15.0 2023-11-19 06:27:32,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-19 06:27:39,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=608346.6666666666, ans=0.1 2023-11-19 06:27:48,167 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7100, loss[loss=0.09591, simple_loss=0.1208, pruned_loss=0.02611, audio_tagging_loss=0.009399, over 14592.00 frames. ], tot_loss[loss=0.09012, simple_loss=0.1079, pruned_loss=0.02535, audio_tagging_loss=0.01084, over 3049558.61 frames. ], batch size: 54, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:28:07,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=608480.0, ans=0.0 2023-11-19 06:28:19,656 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.211e+01 9.012e+01 9.917e+01 1.109e+02 1.355e+02, threshold=1.983e+02, percent-clipped=0.0 2023-11-19 06:28:32,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=608680.0, ans=0.125 2023-11-19 06:28:34,907 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.64 vs. limit=15.0 2023-11-19 06:28:43,568 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7150, loss[loss=0.09916, simple_loss=0.1141, pruned_loss=0.03331, audio_tagging_loss=0.008784, over 14358.00 frames. ], tot_loss[loss=0.09064, simple_loss=0.1083, pruned_loss=0.02554, audio_tagging_loss=0.01096, over 3051973.27 frames. 
], batch size: 55, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:29:06,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=608880.0, ans=0.125 2023-11-19 06:29:08,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=608880.0, ans=0.125 2023-11-19 06:29:11,673 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:29:17,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=608946.6666666666, ans=0.2 2023-11-19 06:29:25,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=608946.6666666666, ans=10.0 2023-11-19 06:29:38,973 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7200, loss[loss=0.1036, simple_loss=0.1202, pruned_loss=0.03249, audio_tagging_loss=0.01103, over 15667.00 frames. ], tot_loss[loss=0.09112, simple_loss=0.109, pruned_loss=0.02563, audio_tagging_loss=0.01099, over 3048317.67 frames. ], batch size: 59, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:29:48,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.55 vs. limit=22.5 2023-11-19 06:29:56,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=609146.6666666666, ans=0.125 2023-11-19 06:30:10,691 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.882e+01 8.596e+01 9.352e+01 1.022e+02 1.385e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 06:30:17,206 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=609280.0, ans=0.0 2023-11-19 06:30:22,327 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.19 vs. limit=15.0 2023-11-19 06:30:25,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=609346.6666666666, ans=0.0 2023-11-19 06:30:25,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609346.6666666666, ans=0.1 2023-11-19 06:30:26,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=609346.6666666666, ans=0.0 2023-11-19 06:30:28,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=609346.6666666666, ans=0.125 2023-11-19 06:30:33,337 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7250, loss[loss=0.0999, simple_loss=0.1238, pruned_loss=0.02819, audio_tagging_loss=0.009798, over 15366.00 frames. ], tot_loss[loss=0.09137, simple_loss=0.1093, pruned_loss=0.02568, audio_tagging_loss=0.01105, over 3048290.57 frames. 
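Most of the scaling.py:213 records track ScheduledFloat hyperparameters: dropout probabilities, skip rates and balancer probabilities whose current value (ans) is a function of the global batch_count. A minimal sketch of such a schedule, assuming piecewise-linear interpolation between (batch_count, value) breakpoints; the class name echoes the log, but the implementation and the example breakpoints are illustrative, not icefall's:

# Illustrative piecewise-linear schedule keyed on the global batch count.
from bisect import bisect_right

class ScheduledFloat:
    def __init__(self, *points: tuple[float, float]):
        # points: (batch_count, value) pairs, sorted by batch_count.
        self.xs = [p[0] for p in points]
        self.ys = [p[1] for p in points]

    def value(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect_right(self.xs, batch_count)
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# e.g. a dropout annealed from 0.3 to a floor of 0.1 over the first 20k batches
dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
assert dropout_p.value(603413.33) == 0.1

By this point in training (batch_count around 6e5) every schedule has long since reached its final breakpoint, which is why the logged ans fields (0.1, 0.125, 0.0, ...) no longer move.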
], batch size: 59, lr: 8.61e-03, grad_scale: 32.0 2023-11-19 06:30:41,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=609413.3333333334, ans=0.5 2023-11-19 06:30:45,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=609480.0, ans=0.125 2023-11-19 06:30:53,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=609480.0, ans=0.1 2023-11-19 06:31:07,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.81 vs. limit=15.0 2023-11-19 06:31:16,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=609680.0, ans=0.125 2023-11-19 06:31:23,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=609680.0, ans=0.2 2023-11-19 06:31:28,344 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7300, loss[loss=0.08406, simple_loss=0.09603, pruned_loss=0.02319, audio_tagging_loss=0.01286, over 14991.00 frames. ], tot_loss[loss=0.09105, simple_loss=0.1091, pruned_loss=0.02559, audio_tagging_loss=0.01092, over 3052147.81 frames. ], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:31:32,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.24 vs. limit=22.5 2023-11-19 06:31:36,427 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:31:54,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=609880.0, ans=0.125 2023-11-19 06:31:56,174 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.88 vs. limit=15.0 2023-11-19 06:31:59,848 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.529e+01 8.576e+01 9.670e+01 1.044e+02 1.433e+02, threshold=1.934e+02, percent-clipped=0.0 2023-11-19 06:32:01,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=609946.6666666666, ans=0.2 2023-11-19 06:32:12,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=610013.3333333334, ans=0.125 2023-11-19 06:32:15,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.35 vs. limit=6.0 2023-11-19 06:32:23,119 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7350, loss[loss=0.09174, simple_loss=0.1189, pruned_loss=0.02337, audio_tagging_loss=0.008913, over 16199.00 frames. ], tot_loss[loss=0.09133, simple_loss=0.1095, pruned_loss=0.02583, audio_tagging_loss=0.01076, over 3056596.04 frames. 
], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:32:23,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610080.0, ans=0.1 2023-11-19 06:32:58,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=610280.0, ans=0.1 2023-11-19 06:33:04,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.14 vs. limit=15.0 2023-11-19 06:33:18,536 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7400, loss[loss=0.08833, simple_loss=0.108, pruned_loss=0.02258, audio_tagging_loss=0.01175, over 15294.00 frames. ], tot_loss[loss=0.09042, simple_loss=0.1086, pruned_loss=0.02549, audio_tagging_loss=0.01062, over 3048449.65 frames. ], batch size: 58, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:33:38,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=610480.0, ans=0.1 2023-11-19 06:33:41,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=610546.6666666666, ans=0.0 2023-11-19 06:33:46,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=610546.6666666666, ans=0.2 2023-11-19 06:33:51,303 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.147e+01 8.574e+01 9.523e+01 1.112e+02 1.475e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 06:33:56,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=610613.3333333334, ans=0.125 2023-11-19 06:34:02,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=610680.0, ans=0.0 2023-11-19 06:34:09,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=610680.0, ans=0.0 2023-11-19 06:34:11,297 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.68 vs. limit=6.0 2023-11-19 06:34:14,355 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7450, loss[loss=0.08638, simple_loss=0.1023, pruned_loss=0.02155, audio_tagging_loss=0.01367, over 14855.00 frames. ], tot_loss[loss=0.09087, simple_loss=0.1094, pruned_loss=0.02562, audio_tagging_loss=0.01054, over 3053731.09 frames. ], batch size: 55, lr: 8.60e-03, grad_scale: 32.0 2023-11-19 06:34:22,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=610746.6666666666, ans=0.0 2023-11-19 06:34:33,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=610813.3333333334, ans=0.125 2023-11-19 06:34:46,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=610946.6666666666, ans=0.02 2023-11-19 06:35:10,318 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7500, loss[loss=0.09746, simple_loss=0.1193, pruned_loss=0.02854, audio_tagging_loss=0.009283, over 14251.00 frames. ], tot_loss[loss=0.09072, simple_loss=0.1093, pruned_loss=0.02561, audio_tagging_loss=0.01044, over 3046785.56 frames. 
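The scaling.py:1022 Whitening records each report a measured per-module statistic of the activations next to its allowed limit ("metric=X vs. limit=Y"); presumably only the excess above the limit contributes a penalty that pushes the features back toward a whiter covariance. A toy version of that check follows. The metric used here, the largest eigenvalue of the channel covariance divided by the mean eigenvalue, is an assumed stand-in for whatever scaling.py actually measures:

# Toy version of the "metric=X vs. limit=Y" whitening check (metric definition
# is an assumption for illustration, not necessarily scaling.py's).
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0, keepdim=True)
    cov = (x.T @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return eigs.max() / eigs.mean().clamp(min=1e-20)

def whitening_penalty(x: torch.Tensor, limit: float) -> torch.Tensor:
    metric = whitening_metric(x)
    print(f"Whitening: metric={metric.item():.2f} vs. limit={limit}")
    # Only the excess above the limit is penalized.
    return (metric - limit).clamp(min=0.0)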
], batch size: 55, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:35:10,458 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=611080.0, ans=0.125 2023-11-19 06:35:11,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=611080.0, ans=0.0 2023-11-19 06:35:12,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=611080.0, ans=0.125 2023-11-19 06:35:24,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=611146.6666666666, ans=0.0 2023-11-19 06:35:28,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=611146.6666666666, ans=0.09899494936611666 2023-11-19 06:35:33,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=611213.3333333334, ans=0.025 2023-11-19 06:35:38,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=611213.3333333334, ans=0.125 2023-11-19 06:35:42,594 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.328e+01 8.409e+01 9.431e+01 1.041e+02 1.502e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:36:05,187 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7550, loss[loss=0.09033, simple_loss=0.1157, pruned_loss=0.0253, audio_tagging_loss=0.00717, over 15711.00 frames. ], tot_loss[loss=0.09104, simple_loss=0.1097, pruned_loss=0.02575, audio_tagging_loss=0.01043, over 3049808.17 frames. ], batch size: 59, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:36:11,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=611413.3333333334, ans=0.2 2023-11-19 06:36:13,974 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.92 vs. limit=15.0 2023-11-19 06:36:18,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=611480.0, ans=0.0 2023-11-19 06:36:32,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=611546.6666666666, ans=0.125 2023-11-19 06:36:46,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=611613.3333333334, ans=0.125 2023-11-19 06:36:59,382 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7600, loss[loss=0.08602, simple_loss=0.1019, pruned_loss=0.02332, audio_tagging_loss=0.01174, over 14807.00 frames. ], tot_loss[loss=0.09039, simple_loss=0.1086, pruned_loss=0.02545, audio_tagging_loss=0.01063, over 3043148.25 frames. 
], batch size: 54, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:11,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=611813.3333333334, ans=0.125 2023-11-19 06:37:24,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=611880.0, ans=0.125 2023-11-19 06:37:30,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=611880.0, ans=0.1 2023-11-19 06:37:31,620 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.78 vs. limit=15.0 2023-11-19 06:37:31,855 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=10.48 vs. limit=15.0 2023-11-19 06:37:32,017 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.740e+01 8.282e+01 9.217e+01 9.907e+01 1.227e+02, threshold=1.843e+02, percent-clipped=0.0 2023-11-19 06:37:46,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.56 vs. limit=15.0 2023-11-19 06:37:53,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=612013.3333333334, ans=0.025 2023-11-19 06:37:56,176 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7650, loss[loss=0.1029, simple_loss=0.1234, pruned_loss=0.03456, audio_tagging_loss=0.006598, over 15919.00 frames. ], tot_loss[loss=0.08988, simple_loss=0.1077, pruned_loss=0.02528, audio_tagging_loss=0.01077, over 3046826.97 frames. ], batch size: 57, lr: 8.59e-03, grad_scale: 32.0 2023-11-19 06:37:59,435 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:38:06,783 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.48 vs. limit=15.0 2023-11-19 06:38:25,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=612213.3333333334, ans=0.0 2023-11-19 06:38:41,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=612346.6666666666, ans=0.0 2023-11-19 06:38:51,535 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7700, loss[loss=0.09444, simple_loss=0.1114, pruned_loss=0.02626, audio_tagging_loss=0.01246, over 15196.00 frames. ], tot_loss[loss=0.08993, simple_loss=0.108, pruned_loss=0.02526, audio_tagging_loss=0.01066, over 3051170.21 frames. 
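The optim.py:476 records summarize recent gradient norms as min/Q1/median/Q3/max quantiles. In every such record in this section the threshold equals Clipping_scale times the logged median, e.g. 2.0 * 9.217e+01 = 1.843e+02 just above, and percent-clipped stays at or near zero. A sketch of that bookkeeping, assuming a sliding window of recent norms; the window size, update cadence and the percent-clipped accounting are guesses:

# Sketch of median-relative gradient clipping consistent with the optim.py logs.
from collections import deque
import torch

class GradNormClipper:
    def __init__(self, clipping_scale: float = 2.0, window: int = 50):
        self.scale = clipping_scale
        self.norms = deque(maxlen=window)
        self.num_clipped = 0
        self.num_seen = 0

    def clip_(self, params: list) -> float:
        # max_norm=inf: measure the total grad norm without clipping yet.
        norm = torch.nn.utils.clip_grad_norm_(params, float("inf")).item()
        self.norms.append(norm)
        q = torch.quantile(torch.tensor(list(self.norms)),
                           torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
        threshold = self.scale * q[2].item()  # clipping_scale * median
        self.num_seen += 1
        if norm > threshold:
            self.num_clipped += 1
            for p in params:
                if p.grad is not None:
                    p.grad.mul_(threshold / norm)
        print(f"Clipping_scale={self.scale}, grad-norm quartiles "
              + " ".join(f"{v:.3e}" for v in q.tolist())
              + f", threshold={threshold:.3e}, "
              + f"percent-clipped={100 * self.num_clipped / self.num_seen:.1f}")
        return norm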
], batch size: 56, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:38:56,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=612413.3333333334, ans=0.0 2023-11-19 06:39:11,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=612480.0, ans=0.0 2023-11-19 06:39:23,617 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.345e+01 8.913e+01 9.609e+01 1.128e+02 1.741e+02, threshold=1.922e+02, percent-clipped=0.0 2023-11-19 06:39:28,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=612613.3333333334, ans=0.125 2023-11-19 06:39:32,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.43 vs. limit=12.0 2023-11-19 06:39:46,136 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7750, loss[loss=0.08475, simple_loss=0.09979, pruned_loss=0.02422, audio_tagging_loss=0.01064, over 14380.00 frames. ], tot_loss[loss=0.08955, simple_loss=0.1072, pruned_loss=0.02523, audio_tagging_loss=0.01074, over 3041670.15 frames. ], batch size: 54, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:25,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=612946.6666666666, ans=0.125 2023-11-19 06:40:31,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=613013.3333333334, ans=0.025 2023-11-19 06:40:42,188 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7800, loss[loss=0.1054, simple_loss=0.1284, pruned_loss=0.03314, audio_tagging_loss=0.008113, over 16047.00 frames. ], tot_loss[loss=0.09006, simple_loss=0.108, pruned_loss=0.02539, audio_tagging_loss=0.01067, over 3045079.95 frames. ], batch size: 59, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:40:47,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=10.57 vs. limit=15.0 2023-11-19 06:40:56,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613146.6666666666, ans=0.1 2023-11-19 06:41:04,836 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.50 vs. limit=15.0 2023-11-19 06:41:13,882 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.793e+01 8.392e+01 9.222e+01 1.047e+02 1.457e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 06:41:16,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=613280.0, ans=0.1 2023-11-19 06:41:17,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=13.05 vs. limit=15.0 2023-11-19 06:41:23,695 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-92000.pt 2023-11-19 06:41:38,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=22.32 vs. 
limit=22.5 2023-11-19 06:41:39,093 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.55 vs. limit=12.0 2023-11-19 06:41:40,455 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7850, loss[loss=0.09732, simple_loss=0.1074, pruned_loss=0.02972, audio_tagging_loss=0.01388, over 15721.00 frames. ], tot_loss[loss=0.09092, simple_loss=0.1089, pruned_loss=0.02568, audio_tagging_loss=0.01078, over 3049344.68 frames. ], batch size: 59, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:41:42,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=613413.3333333334, ans=0.125 2023-11-19 06:41:56,402 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=613480.0, ans=0.125 2023-11-19 06:42:06,258 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.30 vs. limit=12.0 2023-11-19 06:42:30,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=613680.0, ans=0.125 2023-11-19 06:42:33,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=613680.0, ans=0.125 2023-11-19 06:42:35,052 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7900, loss[loss=0.08571, simple_loss=0.09976, pruned_loss=0.02492, audio_tagging_loss=0.01091, over 14816.00 frames. ], tot_loss[loss=0.09077, simple_loss=0.1088, pruned_loss=0.02547, audio_tagging_loss=0.01089, over 3054848.46 frames. ], batch size: 58, lr: 8.58e-03, grad_scale: 32.0 2023-11-19 06:43:05,259 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=10.38 vs. limit=15.0 2023-11-19 06:43:07,836 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.644e+01 9.372e+01 1.085e+02 1.414e+02, threshold=1.874e+02, percent-clipped=0.0 2023-11-19 06:43:16,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=613946.6666666666, ans=0.2 2023-11-19 06:43:17,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=613946.6666666666, ans=0.1 2023-11-19 06:43:17,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=613946.6666666666, ans=0.125 2023-11-19 06:43:20,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=614013.3333333334, ans=0.125 2023-11-19 06:43:31,072 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 7950, loss[loss=0.07451, simple_loss=0.08541, pruned_loss=0.02023, audio_tagging_loss=0.01157, over 14701.00 frames. ], tot_loss[loss=0.09125, simple_loss=0.1089, pruned_loss=0.02576, audio_tagging_loss=0.01104, over 3046372.37 frames. ], batch size: 58, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:43:35,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=614080.0, ans=0.0 2023-11-19 06:43:44,726 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:43:48,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=614146.6666666666, ans=0.0 2023-11-19 06:43:55,939 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=5.84 vs. limit=15.0 2023-11-19 06:44:01,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=614213.3333333334, ans=0.125 2023-11-19 06:44:10,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.95 vs. limit=6.0 2023-11-19 06:44:23,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=614346.6666666666, ans=0.125 2023-11-19 06:44:26,294 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8000, loss[loss=0.07896, simple_loss=0.09029, pruned_loss=0.01923, audio_tagging_loss=0.01458, over 15207.00 frames. ], tot_loss[loss=0.09081, simple_loss=0.1084, pruned_loss=0.0255, audio_tagging_loss=0.01111, over 3053088.03 frames. ], batch size: 58, lr: 8.57e-03, grad_scale: 32.0 2023-11-19 06:44:28,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=614413.3333333334, ans=0.07 2023-11-19 06:44:37,557 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.57 vs. limit=10.0 2023-11-19 06:44:52,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=614546.6666666666, ans=0.125 2023-11-19 06:44:57,964 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.414e+01 8.896e+01 9.629e+01 1.081e+02 1.400e+02, threshold=1.926e+02, percent-clipped=0.0 2023-11-19 06:45:11,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.74 vs. limit=15.0 2023-11-19 06:45:15,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=614680.0, ans=0.2 2023-11-19 06:45:19,726 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=614680.0, ans=0.2 2023-11-19 06:45:21,588 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8050, loss[loss=0.08616, simple_loss=0.1085, pruned_loss=0.02067, audio_tagging_loss=0.01125, over 16022.00 frames. ], tot_loss[loss=0.09068, simple_loss=0.1081, pruned_loss=0.02546, audio_tagging_loss=0.01117, over 3054829.91 frames. 
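The checkpoint.py:75 record a few entries back writes checkpoint-92000.pt into the experiment directory: besides end-of-epoch checkpoints, the trainer saves a batch-indexed one whenever the global batch counter crosses a fixed interval (92000 is consistent with a save-every-4000-batches cadence). A minimal sketch of that trigger; the names maybe_save_checkpoint and save_every_n are placeholders, not icefall's API:

# Illustrative save-every-N trigger matching the checkpoint-92000.pt record.
from pathlib import Path
import torch

def maybe_save_checkpoint(model, exp_dir: Path, batch_idx_train: int,
                          save_every_n: int = 4000) -> None:
    if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
        path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
        print(f"Saving checkpoint to {path}")
        torch.save({"model": model.state_dict(),
                    "batch_idx_train": batch_idx_train}, path)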
], batch size: 59, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:45:24,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=614746.6666666666, ans=0.1 2023-11-19 06:45:52,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=614880.0, ans=0.0 2023-11-19 06:45:56,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=614946.6666666666, ans=0.2 2023-11-19 06:45:57,303 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.75 vs. limit=22.5 2023-11-19 06:45:59,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=614946.6666666666, ans=0.0 2023-11-19 06:46:17,959 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8100, loss[loss=0.1028, simple_loss=0.1245, pruned_loss=0.02881, audio_tagging_loss=0.01176, over 14709.00 frames. ], tot_loss[loss=0.09109, simple_loss=0.109, pruned_loss=0.02566, audio_tagging_loss=0.01092, over 3055196.12 frames. ], batch size: 55, lr: 8.57e-03, grad_scale: 64.0 2023-11-19 06:46:28,571 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.99 vs. limit=10.0 2023-11-19 06:46:33,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=615146.6666666666, ans=0.0 2023-11-19 06:46:49,784 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.125e+01 8.425e+01 9.284e+01 1.022e+02 1.413e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 06:47:01,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=615346.6666666666, ans=0.5 2023-11-19 06:47:13,504 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8150, loss[loss=0.0621, simple_loss=0.07237, pruned_loss=0.0166, audio_tagging_loss=0.00931, over 14854.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1086, pruned_loss=0.02551, audio_tagging_loss=0.01078, over 3049523.40 frames. ], batch size: 56, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:47:18,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=615413.3333333334, ans=10.0 2023-11-19 06:47:27,050 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-11-19 06:47:36,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.43 vs. limit=6.0 2023-11-19 06:47:45,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=11.57 vs. 
limit=15.0 2023-11-19 06:47:45,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=615613.3333333334, ans=0.1 2023-11-19 06:47:59,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=615680.0, ans=0.0 2023-11-19 06:48:05,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=615680.0, ans=0.2 2023-11-19 06:48:07,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=615680.0, ans=0.125 2023-11-19 06:48:08,939 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8200, loss[loss=0.07838, simple_loss=0.09649, pruned_loss=0.01838, audio_tagging_loss=0.01176, over 14838.00 frames. ], tot_loss[loss=0.09108, simple_loss=0.1094, pruned_loss=0.02566, audio_tagging_loss=0.01069, over 3050489.83 frames. ], batch size: 55, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:48:08,981 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 06:48:10,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=615746.6666666666, ans=0.125 2023-11-19 06:48:17,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=615746.6666666666, ans=0.0 2023-11-19 06:48:27,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=615813.3333333334, ans=0.125 2023-11-19 06:48:30,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=615880.0, ans=0.0 2023-11-19 06:48:40,375 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=615880.0, ans=0.0 2023-11-19 06:48:41,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.380e+01 9.339e+01 1.044e+02 1.327e+02, threshold=1.868e+02, percent-clipped=0.0 2023-11-19 06:48:42,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=615946.6666666666, ans=0.05 2023-11-19 06:48:44,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=615946.6666666666, ans=0.125 2023-11-19 06:48:47,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=615946.6666666666, ans=0.125 2023-11-19 06:49:02,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=616013.3333333334, ans=0.0 2023-11-19 06:49:05,243 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8250, loss[loss=0.08636, simple_loss=0.1087, pruned_loss=0.02318, audio_tagging_loss=0.008849, over 15315.00 frames. ], tot_loss[loss=0.09099, simple_loss=0.1096, pruned_loss=0.02555, audio_tagging_loss=0.01063, over 3047331.59 frames. 
], batch size: 56, lr: 8.56e-03, grad_scale: 64.0 2023-11-19 06:49:12,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=616080.0, ans=0.1 2023-11-19 06:49:15,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=616146.6666666666, ans=0.0 2023-11-19 06:49:40,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=616280.0, ans=0.2 2023-11-19 06:49:42,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.22 vs. limit=15.0 2023-11-19 06:49:50,911 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=616346.6666666666, ans=0.0 2023-11-19 06:50:00,758 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8300, loss[loss=0.07956, simple_loss=0.08517, pruned_loss=0.02227, audio_tagging_loss=0.0147, over 15800.00 frames. ], tot_loss[loss=0.09053, simple_loss=0.1088, pruned_loss=0.02546, audio_tagging_loss=0.01069, over 3046171.12 frames. ], batch size: 60, lr: 8.56e-03, grad_scale: 32.0 2023-11-19 06:50:07,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=616413.3333333334, ans=0.1 2023-11-19 06:50:10,188 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=616413.3333333334, ans=0.125 2023-11-19 06:50:14,720 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=13.52 vs. limit=15.0 2023-11-19 06:50:34,215 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.747e+01 9.487e+01 1.050e+02 1.506e+02, threshold=1.897e+02, percent-clipped=0.0 2023-11-19 06:50:41,320 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.38 vs. limit=12.0 2023-11-19 06:50:43,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=616613.3333333334, ans=15.0 2023-11-19 06:50:56,412 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8350, loss[loss=0.09807, simple_loss=0.1299, pruned_loss=0.02532, audio_tagging_loss=0.007803, over 16069.00 frames. ], tot_loss[loss=0.09041, simple_loss=0.1088, pruned_loss=0.02542, audio_tagging_loss=0.01059, over 3052187.21 frames. ], batch size: 56, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:51:09,317 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.151e-02 2023-11-19 06:51:12,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=616813.3333333334, ans=0.2 2023-11-19 06:51:15,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=616813.3333333334, ans=0.07 2023-11-19 06:51:20,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.24 vs. 
limit=6.0 2023-11-19 06:51:32,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=616946.6666666666, ans=0.0 2023-11-19 06:51:51,344 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8400, loss[loss=0.08233, simple_loss=0.09648, pruned_loss=0.02221, audio_tagging_loss=0.01188, over 15247.00 frames. ], tot_loss[loss=0.08922, simple_loss=0.1069, pruned_loss=0.02501, audio_tagging_loss=0.01077, over 3046327.37 frames. ], batch size: 56, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:52:25,115 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.224e+01 8.578e+01 9.429e+01 1.025e+02 1.342e+02, threshold=1.886e+02, percent-clipped=0.0 2023-11-19 06:52:39,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.65 vs. limit=15.0 2023-11-19 06:52:47,688 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8450, loss[loss=0.0984, simple_loss=0.1177, pruned_loss=0.02991, audio_tagging_loss=0.009658, over 14693.00 frames. ], tot_loss[loss=0.08936, simple_loss=0.1073, pruned_loss=0.02509, audio_tagging_loss=0.01063, over 3044994.46 frames. ], batch size: 57, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:53:01,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=617480.0, ans=0.125 2023-11-19 06:53:12,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=617546.6666666666, ans=0.125 2023-11-19 06:53:14,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.12 vs. limit=6.0 2023-11-19 06:53:30,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=617613.3333333334, ans=0.0 2023-11-19 06:53:43,103 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8500, loss[loss=0.0706, simple_loss=0.08538, pruned_loss=0.01674, audio_tagging_loss=0.01117, over 14799.00 frames. ], tot_loss[loss=0.08962, simple_loss=0.1075, pruned_loss=0.02525, audio_tagging_loss=0.01062, over 3046313.16 frames. ], batch size: 58, lr: 8.55e-03, grad_scale: 32.0 2023-11-19 06:54:00,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=617813.3333333334, ans=0.125 2023-11-19 06:54:08,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=617880.0, ans=0.2 2023-11-19 06:54:16,824 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.808e+01 8.976e+01 1.039e+02 1.170e+02 1.800e+02, threshold=2.077e+02, percent-clipped=0.0 2023-11-19 06:54:18,074 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=617946.6666666666, ans=0.1 2023-11-19 06:54:21,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=617946.6666666666, ans=0.09899494936611666 2023-11-19 06:54:38,530 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8550, loss[loss=0.102, simple_loss=0.1293, pruned_loss=0.02902, audio_tagging_loss=0.008306, over 15617.00 frames. ], tot_loss[loss=0.09001, simple_loss=0.1079, pruned_loss=0.02539, audio_tagging_loss=0.01067, over 3051578.40 frames. 
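The grad_scale field in the batch records is the fp16 loss scale: it doubles after a long run of overflow-free steps and is halved when gradients overflow, which is why it moves through 16.0, 32.0 and 64.0 and back again across this section. A minimal AMP training step showing where such a value comes from, using PyTorch's stock GradScaler; model, optimizer and loss_fn are placeholders, and icefall's actual step differs in detail:

# Minimal AMP step: scaler.get_scale() is the quantity logged as grad_scale.
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0,
                                   growth_factor=2.0,   # doubles after stable steps
                                   backoff_factor=0.5,  # halves on inf/nan grads
                                   growth_interval=2000)

def train_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model(batch))
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # skips the update if grads overflowed
    scaler.update()         # grows or backs off the scale
    return loss.detach(), scaler.get_scale()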
], batch size: 56, lr: 8.55e-03, grad_scale: 16.0 2023-11-19 06:54:58,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=618146.6666666666, ans=0.0 2023-11-19 06:55:16,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=618280.0, ans=0.125 2023-11-19 06:55:34,202 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8600, loss[loss=0.1156, simple_loss=0.1385, pruned_loss=0.03708, audio_tagging_loss=0.009258, over 15625.00 frames. ], tot_loss[loss=0.09036, simple_loss=0.1084, pruned_loss=0.02543, audio_tagging_loss=0.01073, over 3057632.16 frames. ], batch size: 57, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:55:44,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=618480.0, ans=0.0 2023-11-19 06:56:09,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.491e+01 8.546e+01 9.397e+01 1.054e+02 1.390e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 06:56:13,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=618613.3333333334, ans=0.1 2023-11-19 06:56:16,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=618613.3333333334, ans=0.2 2023-11-19 06:56:25,058 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=618680.0, ans=0.125 2023-11-19 06:56:28,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=618680.0, ans=0.0 2023-11-19 06:56:30,047 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8650, loss[loss=0.1006, simple_loss=0.1249, pruned_loss=0.02939, audio_tagging_loss=0.008718, over 15409.00 frames. ], tot_loss[loss=0.09063, simple_loss=0.1087, pruned_loss=0.02552, audio_tagging_loss=0.01075, over 3064790.86 frames. ], batch size: 58, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:56:59,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=618880.0, ans=0.07 2023-11-19 06:57:01,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.54 vs. limit=6.0 2023-11-19 06:57:15,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=619013.3333333334, ans=0.09899494936611666 2023-11-19 06:57:24,082 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=619080.0, ans=0.0 2023-11-19 06:57:24,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=619080.0, ans=0.125 2023-11-19 06:57:24,840 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8700, loss[loss=0.0599, simple_loss=0.06646, pruned_loss=0.01533, audio_tagging_loss=0.01134, over 16541.00 frames. ], tot_loss[loss=0.09096, simple_loss=0.1089, pruned_loss=0.02564, audio_tagging_loss=0.01088, over 3058766.78 frames. 
], batch size: 65, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:57:46,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=619146.6666666666, ans=0.0 2023-11-19 06:58:00,016 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.397e+01 9.176e+01 9.937e+01 1.111e+02 1.947e+02, threshold=1.987e+02, percent-clipped=1.0 2023-11-19 06:58:13,017 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=619346.6666666666, ans=0.125 2023-11-19 06:58:21,805 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8750, loss[loss=0.07905, simple_loss=0.09015, pruned_loss=0.02081, audio_tagging_loss=0.01317, over 16013.00 frames. ], tot_loss[loss=0.09101, simple_loss=0.109, pruned_loss=0.02559, audio_tagging_loss=0.01092, over 3068984.57 frames. ], batch size: 60, lr: 8.54e-03, grad_scale: 16.0 2023-11-19 06:58:27,699 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.20 vs. limit=15.0 2023-11-19 06:58:58,294 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 06:59:16,514 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8800, loss[loss=0.122, simple_loss=0.1475, pruned_loss=0.03937, audio_tagging_loss=0.008878, over 15413.00 frames. ], tot_loss[loss=0.09254, simple_loss=0.1109, pruned_loss=0.02615, audio_tagging_loss=0.01094, over 3064818.59 frames. ], batch size: 56, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 06:59:19,033 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.88 vs. limit=15.0 2023-11-19 06:59:23,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=619746.6666666666, ans=0.125 2023-11-19 06:59:45,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=619880.0, ans=0.2 2023-11-19 06:59:50,698 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.071e+01 8.376e+01 8.906e+01 9.929e+01 1.528e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 07:00:00,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=620013.3333333334, ans=0.2 2023-11-19 07:00:11,507 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8850, loss[loss=0.09995, simple_loss=0.1194, pruned_loss=0.03127, audio_tagging_loss=0.008968, over 14700.00 frames. ], tot_loss[loss=0.09268, simple_loss=0.1112, pruned_loss=0.02626, audio_tagging_loss=0.01081, over 3063482.85 frames. ], batch size: 54, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:00:15,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=620080.0, ans=0.125 2023-11-19 07:00:16,940 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:00:23,173 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:00:24,407 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=620146.6666666666, ans=0.0 2023-11-19 07:00:25,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=620146.6666666666, ans=0.2 2023-11-19 07:00:35,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=620213.3333333334, ans=0.0 2023-11-19 07:01:07,560 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8900, loss[loss=0.08416, simple_loss=0.1034, pruned_loss=0.02389, audio_tagging_loss=0.008598, over 14229.00 frames. ], tot_loss[loss=0.09267, simple_loss=0.1116, pruned_loss=0.02629, audio_tagging_loss=0.01059, over 3061861.96 frames. ], batch size: 57, lr: 8.53e-03, grad_scale: 32.0 2023-11-19 07:01:07,758 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=620413.3333333334, ans=0.07 2023-11-19 07:01:08,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=620413.3333333334, ans=0.125 2023-11-19 07:01:34,709 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=620546.6666666666, ans=0.0 2023-11-19 07:01:42,268 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.959e+01 8.332e+01 9.247e+01 1.033e+02 1.504e+02, threshold=1.849e+02, percent-clipped=0.0 2023-11-19 07:01:54,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=620680.0, ans=0.0 2023-11-19 07:01:54,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=620680.0, ans=0.125 2023-11-19 07:02:02,942 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 8950, loss[loss=0.09187, simple_loss=0.1083, pruned_loss=0.02583, audio_tagging_loss=0.0119, over 14924.00 frames. ], tot_loss[loss=0.09156, simple_loss=0.1102, pruned_loss=0.02593, audio_tagging_loss=0.01052, over 3058698.01 frames. ], batch size: 57, lr: 8.53e-03, grad_scale: 16.0 2023-11-19 07:02:05,243 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=620746.6666666666, ans=0.0 2023-11-19 07:02:26,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=620880.0, ans=0.0 2023-11-19 07:02:28,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=620880.0, ans=0.125 2023-11-19 07:02:30,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=620880.0, ans=0.125 2023-11-19 07:02:49,585 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:02:57,856 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9000, loss[loss=0.09175, simple_loss=0.1082, pruned_loss=0.02671, audio_tagging_loss=0.01092, over 14681.00 frames. ], tot_loss[loss=0.09067, simple_loss=0.1093, pruned_loss=0.0256, audio_tagging_loss=0.01044, over 3060925.88 frames. 
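The WARNING above drops an AudioSet placeholder cut because it is too short for its transcript: 100 input frames shrink to 23 after the convolutional front-end, while the dummy text tokenizes to 24 symbols, and a transducer cannot emit more symbols than it has frames. A filter in that spirit, using the usual icefall subsampling arithmetic, which reproduces the logged 100 → 23 (the function name is illustrative):

```python
def keep_cut(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose subsampled length cannot cover their token count."""
    # Conv2d subsampling (~4x): ((100 - 7) // 2 + 1) // 2 == 23,
    # matching the "after subsampling" count in the warning above.
    frames_after_subsampling = ((num_frames - 7) // 2 + 1) // 2
    return frames_after_subsampling >= num_tokens

print(keep_cut(100, 24))  # False -> the cut is excluded from training
```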
], batch size: 55, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:02:57,859 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 07:03:30,610 INFO [train_asr.py:1147] (0/4) Epoch 8, validation: loss=0.06719, simple_loss=0.05665, pruned_loss=0.006997, audio_tagging_loss=0.03186, over 4681554.00 frames. 2023-11-19 07:03:30,611 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 07:04:05,132 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.930e+01 8.568e+01 9.180e+01 1.028e+02 1.650e+02, threshold=1.836e+02, percent-clipped=0.0 2023-11-19 07:04:16,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=621346.6666666666, ans=0.2 2023-11-19 07:04:20,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.max_abs, batch_count=621346.6666666666, ans=10.0 2023-11-19 07:04:26,000 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9050, loss[loss=0.09778, simple_loss=0.1026, pruned_loss=0.0343, audio_tagging_loss=0.01216, over 16027.00 frames. ], tot_loss[loss=0.0913, simple_loss=0.1102, pruned_loss=0.02575, audio_tagging_loss=0.01045, over 3055958.03 frames. ], batch size: 61, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:04:27,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=621413.3333333334, ans=0.2 2023-11-19 07:04:36,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.18 vs. limit=22.5 2023-11-19 07:04:45,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=621480.0, ans=0.125 2023-11-19 07:04:47,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=621546.6666666666, ans=0.125 2023-11-19 07:04:51,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=621546.6666666666, ans=0.125 2023-11-19 07:04:51,359 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=621546.6666666666, ans=0.125 2023-11-19 07:05:20,480 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9100, loss[loss=0.109, simple_loss=0.1286, pruned_loss=0.03492, audio_tagging_loss=0.009837, over 14644.00 frames. ], tot_loss[loss=0.09091, simple_loss=0.1098, pruned_loss=0.02569, audio_tagging_loss=0.01033, over 3055423.07 frames. ], batch size: 54, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:05:30,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=621813.3333333334, ans=0.125 2023-11-19 07:05:36,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.66 vs. 
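The validation records always report the same 4681554-frame dev total, so the per-frame losses are directly comparable across epochs; note that by epoch 8 the audio-tagging term (0.03186) accounts for nearly half of the dev loss (0.06719). A sketch of that frame-weighted bookkeeping with a stand-in model and loss (everything named here is illustrative):

```python
import torch

def validate_sketch(model: torch.nn.Module, dev_loader) -> float:
    """Frame-weighted validation loss accumulated over a whole dev set."""
    model.eval()
    loss_sum, frames = 0.0, 0
    with torch.no_grad():
        for feats in dev_loader:
            out = model(feats)                       # stand-in forward pass
            batch_frames = feats.shape[0] * feats.shape[1]
            loss_sum += out.pow(2).mean().item() * batch_frames
            frames += batch_frames
    model.train()
    return loss_sum / frames

model = torch.nn.Linear(80, 500)                     # stand-in acoustic model
dev_loader = [torch.randn(8, 100, 80) for _ in range(3)]
print(validate_sketch(model, dev_loader))
```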
limit=15.0 2023-11-19 07:05:52,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=621880.0, ans=0.1 2023-11-19 07:05:56,149 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.582e+01 8.531e+01 9.413e+01 1.050e+02 2.515e+02, threshold=1.883e+02, percent-clipped=1.0 2023-11-19 07:06:16,309 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9150, loss[loss=0.08124, simple_loss=0.09178, pruned_loss=0.02417, audio_tagging_loss=0.01118, over 15237.00 frames. ], tot_loss[loss=0.09028, simple_loss=0.1093, pruned_loss=0.02536, audio_tagging_loss=0.01029, over 3054566.41 frames. ], batch size: 57, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:06:23,468 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=622080.0, ans=0.125 2023-11-19 07:06:37,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=622213.3333333334, ans=0.125 2023-11-19 07:06:47,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=622213.3333333334, ans=0.125 2023-11-19 07:06:47,107 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=622213.3333333334, ans=0.125 2023-11-19 07:07:10,211 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:07:12,193 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9200, loss[loss=0.07592, simple_loss=0.08207, pruned_loss=0.02105, audio_tagging_loss=0.01383, over 15390.00 frames. ], tot_loss[loss=0.08996, simple_loss=0.1086, pruned_loss=0.02531, audio_tagging_loss=0.01036, over 3046302.79 frames. ], batch size: 59, lr: 8.52e-03, grad_scale: 16.0 2023-11-19 07:07:24,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=622480.0, ans=0.125 2023-11-19 07:07:29,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=622480.0, ans=0.125 2023-11-19 07:07:31,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=622480.0, ans=0.5 2023-11-19 07:07:43,025 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=622546.6666666666, ans=0.1 2023-11-19 07:07:48,136 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.097e+01 8.490e+01 9.151e+01 1.009e+02 3.492e+02, threshold=1.830e+02, percent-clipped=1.0 2023-11-19 07:08:03,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=622680.0, ans=0.125 2023-11-19 07:08:06,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=622746.6666666666, ans=0.125 2023-11-19 07:08:06,982 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9250, loss[loss=0.1313, simple_loss=0.1608, pruned_loss=0.04418, audio_tagging_loss=0.006687, over 15111.00 frames. ], tot_loss[loss=0.0905, simple_loss=0.1089, pruned_loss=0.02563, audio_tagging_loss=0.0104, over 3048356.61 frames. 
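The `optim.py:476` lines print the min/25%/median/75%/max of recent gradient norms together with a clipping threshold; in every record here the threshold is exactly `Clipping_scale` (2.0) times the median, e.g. 1.883e+02 against a median of 9.413e+01 just above. A sketch of that rule (the window size and class are assumptions; the real ScaledAdam bookkeeping differs in detail):

```python
from collections import deque

import torch

class QuartileClipperSketch:
    """Clip gradients at clipping_scale x the median of recent grad norms."""

    def __init__(self, clipping_scale: float = 2.0, window: int = 400):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)

    def clip_(self, params) -> float:
        params = [p for p in params if p.grad is not None]
        norm = torch.norm(torch.stack([p.grad.norm() for p in params])).item()
        self.norms.append(norm)
        # The five numbers a "grad-norm quartiles ..." record would print:
        quartiles = torch.quantile(
            torch.tensor(list(self.norms)),
            torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]),
        )
        threshold = self.clipping_scale * quartiles[2].item()  # 2 x median
        if norm > threshold:
            for p in params:
                p.grad.mul_(threshold / norm)
        return threshold

model = torch.nn.Linear(10, 10)
model(torch.randn(4, 10)).sum().backward()
clipper = QuartileClipperSketch()
print(clipper.clip_(model.parameters()))
```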
], batch size: 56, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:08:17,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=622813.3333333334, ans=0.125 2023-11-19 07:08:19,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=622813.3333333334, ans=0.0 2023-11-19 07:08:42,158 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=622946.6666666666, ans=0.125 2023-11-19 07:08:43,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=622946.6666666666, ans=0.1 2023-11-19 07:08:58,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=623013.3333333334, ans=0.125 2023-11-19 07:09:02,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=623080.0, ans=0.1 2023-11-19 07:09:03,094 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9300, loss[loss=0.1196, simple_loss=0.145, pruned_loss=0.03722, audio_tagging_loss=0.009853, over 15167.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1083, pruned_loss=0.02523, audio_tagging_loss=0.01044, over 3055592.86 frames. ], batch size: 56, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:09:05,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=623080.0, ans=0.0 2023-11-19 07:09:08,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=623080.0, ans=0.0 2023-11-19 07:09:39,349 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.343e+01 8.509e+01 9.283e+01 1.013e+02 1.341e+02, threshold=1.857e+02, percent-clipped=0.0 2023-11-19 07:09:55,739 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.53 vs. limit=15.0 2023-11-19 07:09:58,417 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9350, loss[loss=0.1247, simple_loss=0.145, pruned_loss=0.04259, audio_tagging_loss=0.009628, over 16400.00 frames. ], tot_loss[loss=0.08973, simple_loss=0.108, pruned_loss=0.02513, audio_tagging_loss=0.01062, over 3053085.19 frames. ], batch size: 60, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:09:58,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=623413.3333333334, ans=0.0 2023-11-19 07:10:05,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=623413.3333333334, ans=0.0 2023-11-19 07:10:10,958 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:10:10,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=623480.0, ans=0.125 2023-11-19 07:10:17,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=623480.0, ans=0.0 2023-11-19 07:10:25,438 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.63 vs. 
limit=15.0 2023-11-19 07:10:42,956 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0 2023-11-19 07:10:54,092 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9400, loss[loss=0.06851, simple_loss=0.08029, pruned_loss=0.01699, audio_tagging_loss=0.01137, over 15045.00 frames. ], tot_loss[loss=0.08914, simple_loss=0.1068, pruned_loss=0.02494, audio_tagging_loss=0.01077, over 3048893.95 frames. ], batch size: 58, lr: 8.51e-03, grad_scale: 16.0 2023-11-19 07:10:56,975 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.87 vs. limit=15.0 2023-11-19 07:11:05,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=623813.3333333334, ans=0.0 2023-11-19 07:11:16,978 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.00 vs. limit=15.0 2023-11-19 07:11:28,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.43 vs. limit=15.0 2023-11-19 07:11:31,161 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.566e+01 8.638e+01 9.433e+01 1.071e+02 1.581e+02, threshold=1.887e+02, percent-clipped=0.0 2023-11-19 07:11:48,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.36 vs. limit=15.0 2023-11-19 07:11:48,784 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:11:49,851 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9450, loss[loss=0.06518, simple_loss=0.06594, pruned_loss=0.0159, audio_tagging_loss=0.01631, over 14320.00 frames. ], tot_loss[loss=0.08875, simple_loss=0.1062, pruned_loss=0.0248, audio_tagging_loss=0.01085, over 3047374.02 frames. ], batch size: 56, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:11:56,453 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=13.40 vs. limit=15.0 2023-11-19 07:11:57,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=624080.0, ans=0.1 2023-11-19 07:12:38,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=624346.6666666666, ans=0.1 2023-11-19 07:12:45,980 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9500, loss[loss=0.08746, simple_loss=0.1037, pruned_loss=0.02438, audio_tagging_loss=0.01124, over 14919.00 frames. ], tot_loss[loss=0.09034, simple_loss=0.1083, pruned_loss=0.02548, audio_tagging_loss=0.01073, over 3045550.79 frames. 
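The `Whitening:` records compare a per-module metric against a limit (e.g. metric=8.99 vs. limit=15.0). Judging from these records, the metric equals 1.0 when the module's output covariance is isotropic and grows as activations collapse into a few directions, which is when the whitening penalty would kick in. A sketch of a metric with that behavior, assuming it mirrors the intent of scaling.py's Whiten module but not its exact code:

```python
import torch

def whitening_metric_sketch(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns 1.0 iff covariance ~ c * I."""
    x = x - x.mean(dim=0)
    cov = x.t() @ x / x.shape[0]        # (C, C) channel covariance
    c = cov.shape[0]
    # Mean squared eigenvalue over squared mean eigenvalue:
    # trace(cov @ cov) * C / trace(cov) ** 2, via the Frobenius norm.
    return ((cov * cov).sum() * c / cov.trace() ** 2).item()

white = torch.randn(1000, 256)
print(whitening_metric_sketch(white))       # close to 1.0: already white
collapsed = white * torch.linspace(0.01, 1.0, 256)
print(whitening_metric_sketch(collapsed))   # >> 1: would exceed a limit
```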
], batch size: 56, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:12:56,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=624480.0, ans=0.125 2023-11-19 07:13:03,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=624480.0, ans=0.125 2023-11-19 07:13:05,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.31 vs. limit=12.0 2023-11-19 07:13:12,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=624546.6666666666, ans=0.0 2023-11-19 07:13:15,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.13 vs. limit=15.0 2023-11-19 07:13:20,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=624613.3333333334, ans=0.2 2023-11-19 07:13:22,387 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.807e+01 8.457e+01 9.140e+01 9.820e+01 1.196e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:13:27,472 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=624613.3333333334, ans=0.125 2023-11-19 07:13:37,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=624680.0, ans=0.025 2023-11-19 07:13:41,567 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9550, loss[loss=0.07416, simple_loss=0.08237, pruned_loss=0.01831, audio_tagging_loss=0.01467, over 15134.00 frames. ], tot_loss[loss=0.0906, simple_loss=0.1085, pruned_loss=0.0255, audio_tagging_loss=0.01083, over 3050168.25 frames. ], batch size: 59, lr: 8.50e-03, grad_scale: 16.0 2023-11-19 07:13:44,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=624746.6666666666, ans=0.0 2023-11-19 07:13:58,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=624813.3333333334, ans=0.1 2023-11-19 07:14:05,387 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.26 vs. limit=15.0 2023-11-19 07:14:07,461 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.77 vs. limit=15.0 2023-11-19 07:14:11,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.42 vs. limit=15.0 2023-11-19 07:14:17,126 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=624946.6666666666, ans=0.125 2023-11-19 07:14:28,230 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:14:37,085 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9600, loss[loss=0.08222, simple_loss=0.09364, pruned_loss=0.02124, audio_tagging_loss=0.01416, over 15353.00 frames. ], tot_loss[loss=0.09032, simple_loss=0.1081, pruned_loss=0.02528, audio_tagging_loss=0.011, over 3045289.86 frames. 
], batch size: 59, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:14:42,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=625080.0, ans=0.0 2023-11-19 07:15:10,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=625280.0, ans=0.035 2023-11-19 07:15:13,356 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.874e+01 8.461e+01 9.298e+01 1.020e+02 1.547e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 07:15:18,680 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=625280.0, ans=0.0 2023-11-19 07:15:27,896 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.92 vs. limit=15.0 2023-11-19 07:15:33,197 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9650, loss[loss=0.1036, simple_loss=0.1287, pruned_loss=0.03044, audio_tagging_loss=0.008779, over 14643.00 frames. ], tot_loss[loss=0.09061, simple_loss=0.1082, pruned_loss=0.02548, audio_tagging_loss=0.01101, over 3040932.16 frames. ], batch size: 53, lr: 8.50e-03, grad_scale: 32.0 2023-11-19 07:15:33,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=625413.3333333334, ans=0.125 2023-11-19 07:15:39,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=625413.3333333334, ans=0.0 2023-11-19 07:15:42,939 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=625480.0, ans=0.1 2023-11-19 07:15:58,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=625546.6666666666, ans=0.2 2023-11-19 07:15:58,380 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.17 vs. limit=15.0 2023-11-19 07:16:12,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=625613.3333333334, ans=0.125 2023-11-19 07:16:13,347 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.89 vs. limit=15.0 2023-11-19 07:16:16,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.24 vs. limit=12.0 2023-11-19 07:16:28,192 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9700, loss[loss=0.1061, simple_loss=0.1238, pruned_loss=0.03808, audio_tagging_loss=0.006133, over 15053.00 frames. ], tot_loss[loss=0.09018, simple_loss=0.1082, pruned_loss=0.02531, audio_tagging_loss=0.01075, over 3043297.11 frames. 
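`grad_scale` in the batch records flips between 32.0 and 16.0, the signature of dynamic mixed-precision loss scaling: halve the scale when a step overflows to inf/nan, grow it back after a run of clean steps. PyTorch's own GradScaler implements exactly this policy; the constructor numbers below just mirror what the log shows and are otherwise illustrative:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
use_amp = device.type == "cuda"

scaler = torch.cuda.amp.GradScaler(
    init_scale=32.0,       # matches the grad_scale values logged here
    backoff_factor=0.5,    # halve on inf/nan gradients: 32.0 -> 16.0
    growth_factor=2.0,     # double back after growth_interval clean steps
    growth_interval=2000,
    enabled=use_amp,
)

model = torch.nn.Linear(80, 500).to(device)
opt = torch.optim.SGD(model.parameters(), lr=8.5e-3)

for _ in range(3):
    opt.zero_grad()
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = model(torch.randn(16, 80, device=device)).pow(2).mean()
    scaler.scale(loss).backward()
    scaler.step(opt)   # silently skips the update if grads overflowed
    scaler.update()    # this is where grad_scale changes between batches
print(scaler.get_scale())
```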
], batch size: 56, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:16:34,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=625746.6666666666, ans=0.125 2023-11-19 07:16:39,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=625813.3333333334, ans=0.125 2023-11-19 07:16:42,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=625813.3333333334, ans=0.125 2023-11-19 07:16:51,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=625880.0, ans=0.1 2023-11-19 07:16:56,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=625880.0, ans=0.04949747468305833 2023-11-19 07:17:03,651 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.32 vs. limit=15.0 2023-11-19 07:17:04,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=625946.6666666666, ans=0.1 2023-11-19 07:17:05,236 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.958e+01 8.422e+01 9.037e+01 9.716e+01 1.315e+02, threshold=1.807e+02, percent-clipped=0.0 2023-11-19 07:17:06,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=625946.6666666666, ans=0.125 2023-11-19 07:17:13,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=626013.3333333334, ans=0.2 2023-11-19 07:17:23,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.51 vs. limit=22.5 2023-11-19 07:17:24,197 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9750, loss[loss=0.07885, simple_loss=0.09997, pruned_loss=0.01902, audio_tagging_loss=0.009843, over 14593.00 frames. ], tot_loss[loss=0.09024, simple_loss=0.1084, pruned_loss=0.02539, audio_tagging_loss=0.01067, over 3041344.46 frames. ], batch size: 57, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:17:26,784 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.52 vs. limit=22.5 2023-11-19 07:17:42,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=626146.6666666666, ans=0.5 2023-11-19 07:17:53,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=626213.3333333334, ans=0.09899494936611666 2023-11-19 07:18:00,114 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0 2023-11-19 07:18:01,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.03 vs. 
limit=15.0 2023-11-19 07:18:06,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=626280.0, ans=0.125 2023-11-19 07:18:19,689 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9800, loss[loss=0.1009, simple_loss=0.1302, pruned_loss=0.02611, audio_tagging_loss=0.009708, over 15746.00 frames. ], tot_loss[loss=0.08898, simple_loss=0.1071, pruned_loss=0.02486, audio_tagging_loss=0.01056, over 3038862.84 frames. ], batch size: 56, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:18:21,073 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=626413.3333333334, ans=0.125 2023-11-19 07:18:27,809 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=626413.3333333334, ans=0.125 2023-11-19 07:18:39,433 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=626480.0, ans=0.1 2023-11-19 07:18:56,499 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.157e+01 8.941e+01 9.770e+01 1.328e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 07:19:10,046 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:19:14,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=626746.6666666666, ans=0.125 2023-11-19 07:19:15,286 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9850, loss[loss=0.08161, simple_loss=0.09168, pruned_loss=0.02438, audio_tagging_loss=0.01139, over 14424.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1076, pruned_loss=0.02491, audio_tagging_loss=0.01047, over 3038434.15 frames. ], batch size: 53, lr: 8.49e-03, grad_scale: 32.0 2023-11-19 07:19:30,645 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.15 vs. limit=12.0 2023-11-19 07:19:37,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=626880.0, ans=0.2 2023-11-19 07:20:05,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=627013.3333333334, ans=0.125 2023-11-19 07:20:10,742 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9900, loss[loss=0.06661, simple_loss=0.07755, pruned_loss=0.01669, audio_tagging_loss=0.01115, over 15858.00 frames. ], tot_loss[loss=0.08968, simple_loss=0.1082, pruned_loss=0.02513, audio_tagging_loss=0.01043, over 3041788.03 frames. ], batch size: 59, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:20:33,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.60 vs. limit=10.0 2023-11-19 07:20:33,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.82 vs. 
limit=12.0 2023-11-19 07:20:47,071 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.338e+01 8.674e+01 9.203e+01 1.082e+02 1.582e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 07:20:57,169 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.08 vs. limit=22.5 2023-11-19 07:21:03,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=627346.6666666666, ans=0.05 2023-11-19 07:21:05,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=627413.3333333334, ans=0.125 2023-11-19 07:21:06,725 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 9950, loss[loss=0.08024, simple_loss=0.09498, pruned_loss=0.01877, audio_tagging_loss=0.01398, over 15989.00 frames. ], tot_loss[loss=0.08965, simple_loss=0.1081, pruned_loss=0.02515, audio_tagging_loss=0.01047, over 3043393.37 frames. ], batch size: 60, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:21:56,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=15.46 vs. limit=15.0 2023-11-19 07:22:00,166 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:22:02,094 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10000, loss[loss=0.08911, simple_loss=0.09648, pruned_loss=0.02975, audio_tagging_loss=0.01112, over 15294.00 frames. ], tot_loss[loss=0.08961, simple_loss=0.1082, pruned_loss=0.02513, audio_tagging_loss=0.01038, over 3043213.11 frames. ], batch size: 57, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:22:11,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=627813.3333333334, ans=0.0 2023-11-19 07:22:20,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=627813.3333333334, ans=0.125 2023-11-19 07:22:38,986 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.402e+01 8.658e+01 9.575e+01 1.064e+02 1.480e+02, threshold=1.915e+02, percent-clipped=0.0 2023-11-19 07:22:41,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=627946.6666666666, ans=0.0 2023-11-19 07:22:57,038 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10050, loss[loss=0.08894, simple_loss=0.1136, pruned_loss=0.01858, audio_tagging_loss=0.01356, over 15252.00 frames. ], tot_loss[loss=0.08994, simple_loss=0.1084, pruned_loss=0.02528, audio_tagging_loss=0.01047, over 3039476.99 frames. ], batch size: 55, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:23:08,554 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. 
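The learning rate decays very slowly across these records (8.55e-03 down to 8.48e-03 over a few thousand batches), consistent with an Eden-style schedule that discounts a base LR by inverse fourth roots in both batch and epoch counts. A sketch, assuming a base LR of 0.045 and the common Eden constants; the batch index passed in is a guess, since the `batch_count` fields in the scaling records track a different counter:

```python
def eden_lr_sketch(base_lr: float, batch: int, epoch: float,
                   lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style LR: smooth inverse-fourth-root decay in batch and epoch."""
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around epoch 8 this lands near the logged 8.5e-03 for base_lr=0.045.
print(eden_lr_sketch(0.045, batch=90000, epoch=8.0))
```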
limit=22.5 2023-11-19 07:23:18,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628146.6666666666, ans=0.1 2023-11-19 07:23:26,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=628213.3333333334, ans=0.04949747468305833 2023-11-19 07:23:27,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=628213.3333333334, ans=0.1 2023-11-19 07:23:29,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=628280.0, ans=0.0 2023-11-19 07:23:36,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=628280.0, ans=0.2 2023-11-19 07:23:43,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=628346.6666666666, ans=0.5 2023-11-19 07:23:45,214 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=628346.6666666666, ans=0.125 2023-11-19 07:23:47,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=628346.6666666666, ans=0.125 2023-11-19 07:23:50,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628346.6666666666, ans=0.125 2023-11-19 07:23:53,592 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10100, loss[loss=0.07332, simple_loss=0.08057, pruned_loss=0.02282, audio_tagging_loss=0.01022, over 15426.00 frames. ], tot_loss[loss=0.09, simple_loss=0.1085, pruned_loss=0.02526, audio_tagging_loss=0.0105, over 3047738.05 frames. ], batch size: 59, lr: 8.48e-03, grad_scale: 32.0 2023-11-19 07:24:05,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=628480.0, ans=0.125 2023-11-19 07:24:18,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=628546.6666666666, ans=0.07 2023-11-19 07:24:29,378 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.227e+01 8.846e+01 9.478e+01 1.084e+02 1.850e+02, threshold=1.896e+02, percent-clipped=0.0 2023-11-19 07:24:32,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=628613.3333333334, ans=0.125 2023-11-19 07:24:32,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=628613.3333333334, ans=0.2 2023-11-19 07:24:37,885 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:24:48,924 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10150, loss[loss=0.1036, simple_loss=0.1255, pruned_loss=0.03221, audio_tagging_loss=0.008639, over 14744.00 frames. 
], tot_loss[loss=0.08973, simple_loss=0.108, pruned_loss=0.02514, audio_tagging_loss=0.01058, over 3046779.25 frames. ], batch size: 54, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:25:06,261 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=21.33 vs. limit=22.5 2023-11-19 07:25:11,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=628880.0, ans=0.1 2023-11-19 07:25:15,292 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:25:19,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=628880.0, ans=0.0 2023-11-19 07:25:23,568 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=628946.6666666666, ans=0.2 2023-11-19 07:25:25,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=628946.6666666666, ans=0.125 2023-11-19 07:25:26,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=628946.6666666666, ans=0.125 2023-11-19 07:25:27,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=628946.6666666666, ans=0.125 2023-11-19 07:25:43,826 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10200, loss[loss=0.1186, simple_loss=0.1317, pruned_loss=0.0384, audio_tagging_loss=0.01429, over 13245.00 frames. ], tot_loss[loss=0.08852, simple_loss=0.1061, pruned_loss=0.02478, audio_tagging_loss=0.0107, over 3041927.18 frames. ], batch size: 51, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:25:45,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=629080.0, ans=0.125 2023-11-19 07:25:56,139 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.76 vs. limit=15.0 2023-11-19 07:26:05,552 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:26:20,963 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.198e+01 8.630e+01 9.623e+01 1.074e+02 1.731e+02, threshold=1.925e+02, percent-clipped=0.0 2023-11-19 07:26:40,089 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10250, loss[loss=0.07387, simple_loss=0.08701, pruned_loss=0.01902, audio_tagging_loss=0.01134, over 17410.00 frames. 
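`tot_loss[...]` is not a per-batch value: its frame total hovers around 3.0e6 while individual batches carry roughly 15k frames, so it behaves like a leaky frame-weighted sum that settles at about 200 batches' worth of history. A sketch of such a tracker (the decay constant and names are assumptions, not icefall's code):

```python
class LeakyLossTrackerSketch:
    """Accumulate (loss_sum, frame_count) with a slow leak.

    Each batch contributes fully while the running totals decay by
    (1 - 1/reset_interval), so they settle at ~reset_interval batches'
    worth of frames -- matching the ~3.0e6-frame windows in this log
    (about 200 batches of ~15k frames).
    """

    def __init__(self, reset_interval: int = 200):
        self.keep = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> None:
        self.loss_sum = self.loss_sum * self.keep + batch_loss_sum
        self.frames = self.frames * self.keep + batch_frames

    @property
    def tot_loss(self) -> float:
        return self.loss_sum / max(self.frames, 1.0)

tracker = LeakyLossTrackerSketch()
for _ in range(1000):
    tracker.update(batch_loss_sum=0.09 * 15000, batch_frames=15000)
print(tracker.frames)    # ~3.0e6, like the "over ... frames" totals above
print(tracker.tot_loss)  # ~0.09, a frame-weighted average of recent batches
```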
], tot_loss[loss=0.0879, simple_loss=0.1053, pruned_loss=0.02448, audio_tagging_loss=0.01075, over 3040863.46 frames. ], batch size: 67, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:26:41,251 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=629413.3333333334, ans=0.125 2023-11-19 07:26:50,171 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-19 07:27:02,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=629546.6666666666, ans=0.0 2023-11-19 07:27:07,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.55 vs. limit=6.0 2023-11-19 07:27:13,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=629613.3333333334, ans=0.07 2023-11-19 07:27:34,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.27 vs. limit=22.5 2023-11-19 07:27:36,291 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10300, loss[loss=0.09395, simple_loss=0.1187, pruned_loss=0.022, audio_tagging_loss=0.0126, over 15104.00 frames. ], tot_loss[loss=0.08859, simple_loss=0.1059, pruned_loss=0.02486, audio_tagging_loss=0.0108, over 3041808.40 frames. ], batch size: 56, lr: 8.47e-03, grad_scale: 32.0 2023-11-19 07:27:46,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=629813.3333333334, ans=0.0 2023-11-19 07:27:49,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=10.48 vs. limit=12.0 2023-11-19 07:28:00,627 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.49 vs. limit=15.0 2023-11-19 07:28:12,260 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.771e+01 9.491e+01 1.012e+02 1.579e+02, threshold=1.898e+02, percent-clipped=0.0 2023-11-19 07:28:15,309 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=629946.6666666666, ans=0.2 2023-11-19 07:28:27,071 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=4.31 vs. limit=15.0 2023-11-19 07:28:29,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=630080.0, ans=0.2 2023-11-19 07:28:30,766 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10350, loss[loss=0.08588, simple_loss=0.1047, pruned_loss=0.02291, audio_tagging_loss=0.01064, over 16084.00 frames. ], tot_loss[loss=0.08896, simple_loss=0.1064, pruned_loss=0.02482, audio_tagging_loss=0.01096, over 3043112.53 frames. ], batch size: 60, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:28:39,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=630080.0, ans=0.5 2023-11-19 07:28:52,611 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=9.06 vs. 
limit=10.0 2023-11-19 07:28:56,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=11.44 vs. limit=12.0 2023-11-19 07:29:26,687 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10400, loss[loss=0.1191, simple_loss=0.143, pruned_loss=0.03842, audio_tagging_loss=0.009171, over 15626.00 frames. ], tot_loss[loss=0.08972, simple_loss=0.107, pruned_loss=0.02515, audio_tagging_loss=0.01105, over 3042589.31 frames. ], batch size: 58, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:29:57,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=630546.6666666666, ans=0.0 2023-11-19 07:30:03,013 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.887e+01 8.595e+01 9.141e+01 1.035e+02 1.375e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 07:30:06,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.78 vs. limit=15.0 2023-11-19 07:30:21,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=630746.6666666666, ans=0.2 2023-11-19 07:30:22,399 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10450, loss[loss=0.072, simple_loss=0.08282, pruned_loss=0.02004, audio_tagging_loss=0.01054, over 14995.00 frames. ], tot_loss[loss=0.08889, simple_loss=0.1061, pruned_loss=0.02481, audio_tagging_loss=0.011, over 3041017.89 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:30:37,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=630813.3333333334, ans=0.125 2023-11-19 07:30:37,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=630813.3333333334, ans=0.0 2023-11-19 07:30:39,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=630813.3333333334, ans=0.0 2023-11-19 07:30:46,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=630880.0, ans=0.125 2023-11-19 07:30:47,204 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.99 vs. limit=15.0 2023-11-19 07:30:48,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=630880.0, ans=0.02 2023-11-19 07:30:56,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=630946.6666666666, ans=0.0 2023-11-19 07:30:59,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=630946.6666666666, ans=0.0 2023-11-19 07:31:04,830 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=630946.6666666666, ans=0.05 2023-11-19 07:31:04,996 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.08 vs. limit=10.0 2023-11-19 07:31:17,682 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10500, loss[loss=0.1005, simple_loss=0.1222, pruned_loss=0.03081, audio_tagging_loss=0.008585, over 15330.00 frames. 
], tot_loss[loss=0.0884, simple_loss=0.1057, pruned_loss=0.0247, audio_tagging_loss=0.01086, over 3039965.84 frames. ], batch size: 58, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:31:44,964 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:31:45,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=14.95 vs. limit=22.5 2023-11-19 07:31:54,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.423e+01 8.554e+01 9.380e+01 1.032e+02 1.223e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 07:31:56,350 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.95 vs. limit=22.5 2023-11-19 07:32:13,150 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10550, loss[loss=0.07513, simple_loss=0.08971, pruned_loss=0.01923, audio_tagging_loss=0.01104, over 15485.00 frames. ], tot_loss[loss=0.08847, simple_loss=0.1058, pruned_loss=0.02479, audio_tagging_loss=0.01078, over 3041944.89 frames. ], batch size: 56, lr: 8.46e-03, grad_scale: 32.0 2023-11-19 07:32:29,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=631480.0, ans=0.125 2023-11-19 07:32:49,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=631613.3333333334, ans=0.125 2023-11-19 07:32:53,542 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=631613.3333333334, ans=0.125 2023-11-19 07:32:53,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=631613.3333333334, ans=0.0 2023-11-19 07:32:54,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=631613.3333333334, ans=0.125 2023-11-19 07:33:09,244 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10600, loss[loss=0.08129, simple_loss=0.09356, pruned_loss=0.02337, audio_tagging_loss=0.01115, over 15611.00 frames. ], tot_loss[loss=0.08832, simple_loss=0.1057, pruned_loss=0.02466, audio_tagging_loss=0.0108, over 3043933.66 frames. ], batch size: 60, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:33:29,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.min_positive, batch_count=631880.0, ans=0.05 2023-11-19 07:33:45,593 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.034e+01 8.589e+01 9.112e+01 9.990e+01 1.319e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 07:33:48,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=631946.6666666666, ans=0.125 2023-11-19 07:33:54,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=632013.3333333334, ans=0.07 2023-11-19 07:34:04,984 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10650, loss[loss=0.07826, simple_loss=0.09886, pruned_loss=0.02009, audio_tagging_loss=0.008739, over 15642.00 frames. ], tot_loss[loss=0.08901, simple_loss=0.1066, pruned_loss=0.02497, audio_tagging_loss=0.01074, over 3039180.64 frames. 
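The `scaling.py:1118` WithLoss records attach an auxiliary loss to attention weights and almost always print loss-sum=0.000e+00, with only an occasional small positive value (a loss-sum=3.658e-01 appears further down). A penalty that is identically zero inside a comfort zone and quadratic outside behaves exactly that way; this is a guess at the intent, not the scaling.py implementation:

```python
import torch

def attn_aux_loss_sketch(attn: torch.Tensor, max_abs: float = 25.0) -> torch.Tensor:
    """Zero while attention scores stay within [-max_abs, max_abs];
    quadratic in the excess once any score drifts out of bounds."""
    excess = (attn.abs() - max_abs).clamp(min=0.0)
    return (excess ** 2).sum()

print(attn_aux_loss_sketch(torch.randn(4, 8, 50, 50)))      # tensor(0.)
print(attn_aux_loss_sketch(30.0 * torch.ones(1, 1, 2, 2)))  # positive
```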
], batch size: 58, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:34:12,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=632080.0, ans=0.07 2023-11-19 07:34:19,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=632146.6666666666, ans=0.125 2023-11-19 07:34:22,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=632146.6666666666, ans=0.0 2023-11-19 07:34:35,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632213.3333333334, ans=0.1 2023-11-19 07:34:36,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=632213.3333333334, ans=0.0 2023-11-19 07:35:00,539 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10700, loss[loss=0.06469, simple_loss=0.07411, pruned_loss=0.01635, audio_tagging_loss=0.01128, over 14038.00 frames. ], tot_loss[loss=0.08832, simple_loss=0.1059, pruned_loss=0.02468, audio_tagging_loss=0.01071, over 3036415.49 frames. ], batch size: 56, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:35:23,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=632546.6666666666, ans=0.2 2023-11-19 07:35:37,032 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.376e+01 8.469e+01 9.057e+01 9.771e+01 1.264e+02, threshold=1.811e+02, percent-clipped=0.0 2023-11-19 07:35:46,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=632680.0, ans=0.125 2023-11-19 07:35:47,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.85 vs. limit=15.0 2023-11-19 07:35:50,187 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=632680.0, ans=0.0 2023-11-19 07:35:52,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=632680.0, ans=0.1 2023-11-19 07:35:56,136 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10750, loss[loss=0.08972, simple_loss=0.1109, pruned_loss=0.02817, audio_tagging_loss=0.006087, over 15044.00 frames. ], tot_loss[loss=0.08862, simple_loss=0.1065, pruned_loss=0.02481, audio_tagging_loss=0.01056, over 3040312.13 frames. ], batch size: 55, lr: 8.45e-03, grad_scale: 32.0 2023-11-19 07:36:02,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=632746.6666666666, ans=0.125 2023-11-19 07:36:10,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=632813.3333333334, ans=0.125 2023-11-19 07:36:16,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=632813.3333333334, ans=0.0 2023-11-19 07:36:51,458 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10800, loss[loss=0.08745, simple_loss=0.1067, pruned_loss=0.02394, audio_tagging_loss=0.01014, over 15162.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1066, pruned_loss=0.02465, audio_tagging_loss=0.01053, over 3043072.21 frames. 
], batch size: 57, lr: 8.44e-03, grad_scale: 32.0 2023-11-19 07:36:51,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633080.0, ans=0.1 2023-11-19 07:37:16,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=633213.3333333334, ans=0.125 2023-11-19 07:37:27,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=633280.0, ans=0.125 2023-11-19 07:37:28,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633280.0, ans=0.1 2023-11-19 07:37:29,645 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.352e+01 8.684e+01 9.473e+01 1.039e+02 1.467e+02, threshold=1.895e+02, percent-clipped=0.0 2023-11-19 07:37:33,364 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=633280.0, ans=0.125 2023-11-19 07:37:35,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.82 vs. limit=22.5 2023-11-19 07:37:48,137 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10850, loss[loss=0.06539, simple_loss=0.07306, pruned_loss=0.01742, audio_tagging_loss=0.01144, over 15356.00 frames. ], tot_loss[loss=0.0884, simple_loss=0.1065, pruned_loss=0.02459, audio_tagging_loss=0.01059, over 3044615.29 frames. ], batch size: 58, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:37:49,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=633413.3333333334, ans=0.0 2023-11-19 07:37:51,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=633413.3333333334, ans=0.1 2023-11-19 07:37:55,390 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-19 07:38:06,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=633480.0, ans=0.125 2023-11-19 07:38:06,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=633480.0, ans=0.125 2023-11-19 07:38:07,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=633480.0, ans=0.125 2023-11-19 07:38:17,281 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.55 vs. 
limit=15.0 2023-11-19 07:38:19,977 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=633613.3333333334, ans=0.1 2023-11-19 07:38:33,925 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=3.658e-01 2023-11-19 07:38:37,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_positive, batch_count=633680.0, ans=0.05 2023-11-19 07:38:37,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=633680.0, ans=0.0 2023-11-19 07:38:40,561 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:38:43,098 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.09 vs. limit=15.0 2023-11-19 07:38:43,645 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10900, loss[loss=0.1131, simple_loss=0.1267, pruned_loss=0.03938, audio_tagging_loss=0.01033, over 15266.00 frames. ], tot_loss[loss=0.08931, simple_loss=0.1075, pruned_loss=0.02494, audio_tagging_loss=0.01062, over 3048908.26 frames. ], batch size: 58, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:38:56,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=633813.3333333334, ans=0.2 2023-11-19 07:38:59,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-11-19 07:39:15,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.48 vs. limit=15.0 2023-11-19 07:39:21,879 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.513e+01 8.390e+01 9.440e+01 1.044e+02 1.572e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 07:39:24,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.81 vs. limit=22.5 2023-11-19 07:39:27,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=634013.3333333334, ans=0.125 2023-11-19 07:39:30,060 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:39:39,383 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 10950, loss[loss=0.0886, simple_loss=0.1032, pruned_loss=0.02224, audio_tagging_loss=0.01475, over 15071.00 frames. ], tot_loss[loss=0.08909, simple_loss=0.1071, pruned_loss=0.02483, audio_tagging_loss=0.01069, over 3048659.18 frames. 
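The WARNING above (and the identical ones later in this log) drops 1-second AudioSet cuts whose placeholder transcript is longer than the encoder output: 100 input frames survive 4x subsampling as 23 frames, fewer than the 24 BPE tokens, so the transducer loss cannot align them. A sketch of the apparent rule; the subsampling arithmetic is an assumption (a common icefall convention that reproduces the logged 100 -> 23), not a quote from the filtering code:

```python
# Sketch of the cut-exclusion rule suggested by the WARNING records.
# Assumption: the subsampled length follows ((T - 7) // 2 + 1) // 2, which
# reproduces the logged 100 -> 23; a cut is kept only if it has at least
# as many subsampled frames as target tokens.
def frames_after_subsampling(num_frames: int) -> int:
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    return frames_after_subsampling(num_frames) >= num_tokens

print(frames_after_subsampling(100))  # 23, matching the log
print(keep_cut(100, 24))              # False -> "Exclude cut ..." is emitted
```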
], batch size: 54, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:40:05,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=634213.3333333334, ans=0.2 2023-11-19 07:40:18,312 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.90 vs. limit=6.0 2023-11-19 07:40:34,758 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11000, loss[loss=0.0814, simple_loss=0.1023, pruned_loss=0.02129, audio_tagging_loss=0.008982, over 15007.00 frames. ], tot_loss[loss=0.08882, simple_loss=0.107, pruned_loss=0.02465, audio_tagging_loss=0.01066, over 3045061.00 frames. ], batch size: 54, lr: 8.44e-03, grad_scale: 16.0 2023-11-19 07:40:44,803 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:40:48,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.15 vs. limit=15.0 2023-11-19 07:41:12,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=634613.3333333334, ans=0.2 2023-11-19 07:41:12,768 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.925e+01 8.297e+01 9.076e+01 1.002e+02 1.429e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 07:41:13,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=634613.3333333334, ans=0.125 2023-11-19 07:41:13,979 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634613.3333333334, ans=0.1 2023-11-19 07:41:20,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=634680.0, ans=0.2 2023-11-19 07:41:30,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=634746.6666666666, ans=0.0 2023-11-19 07:41:31,720 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11050, loss[loss=0.1094, simple_loss=0.1314, pruned_loss=0.0294, audio_tagging_loss=0.01427, over 16268.00 frames. ], tot_loss[loss=0.08937, simple_loss=0.1074, pruned_loss=0.02495, audio_tagging_loss=0.01073, over 3042702.92 frames. 
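The scaling.py:213 lines that dominate this log print ScheduledFloat hyperparameters: dropout probabilities, skip rates, and balancer limits whose current value is a deterministic function of batch_count. A minimal sketch of that mechanism with made-up breakpoints; the run's actual schedules are defined per-module in the model code:

```python
# Minimal sketch of a ScheduledFloat-style value: piecewise-linear in
# batch_count, clamped at both ends. The breakpoints here are illustrative
# only; the real schedules are set in the model definition.
from bisect import bisect_right

def scheduled_float(batch_count: float, schedule):
    """schedule: (batch_count, value) pairs sorted by batch_count."""
    keys = [b for b, _ in schedule]
    if batch_count <= keys[0]:
        return schedule[0][1]
    if batch_count >= keys[-1]:
        return schedule[-1][1]
    i = bisect_right(keys, batch_count)
    (b0, v0), (b1, v1) = schedule[i - 1], schedule[i]
    return v0 + (v1 - v0) * (batch_count - b0) / (b1 - b0)

# e.g. a skip rate annealed 0.5 -> 0.0 over the first 4000 batches:
print(scheduled_float(2000.0, [(0.0, 0.5), (4000.0, 0.0)]))    # 0.25
print(scheduled_float(634746.0, [(0.0, 0.5), (4000.0, 0.0)]))  # 0.0
```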
], batch size: 59, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:41:35,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=634746.6666666666, ans=0.0 2023-11-19 07:41:36,186 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=634746.6666666666, ans=0.1 2023-11-19 07:41:37,169 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:41:44,149 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=634813.3333333334, ans=0.1 2023-11-19 07:41:45,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.61 vs. limit=15.0 2023-11-19 07:41:56,861 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:42:04,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=634946.6666666666, ans=0.125 2023-11-19 07:42:13,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=634946.6666666666, ans=0.125 2023-11-19 07:42:22,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=635013.3333333334, ans=0.2 2023-11-19 07:42:23,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=635013.3333333334, ans=0.05 2023-11-19 07:42:27,263 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11100, loss[loss=0.08817, simple_loss=0.1102, pruned_loss=0.02236, audio_tagging_loss=0.01071, over 15590.00 frames. ], tot_loss[loss=0.08916, simple_loss=0.1071, pruned_loss=0.02471, audio_tagging_loss=0.01092, over 3046088.52 frames. ], batch size: 57, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:42:48,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.41 vs. limit=15.0 2023-11-19 07:42:53,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=635213.3333333334, ans=0.125 2023-11-19 07:43:00,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=635280.0, ans=0.125 2023-11-19 07:43:01,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=635280.0, ans=0.0 2023-11-19 07:43:05,511 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.101e+01 8.720e+01 9.689e+01 1.049e+02 1.321e+02, threshold=1.938e+02, percent-clipped=0.0 2023-11-19 07:43:16,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=635346.6666666666, ans=0.0 2023-11-19 07:43:22,344 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11150, loss[loss=0.09651, simple_loss=0.1091, pruned_loss=0.02894, audio_tagging_loss=0.01303, over 14985.00 frames. ], tot_loss[loss=0.08866, simple_loss=0.1062, pruned_loss=0.02458, audio_tagging_loss=0.01097, over 3040445.15 frames. 
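grad_scale in the batch records is the fp16 loss scale: it sits at 32.0, halves to 16.0 after an overflow step (visible around batch 10850 above), and doubles back to 32.0 a few hundred batches later (batch 11200 below). This is ordinary dynamic loss scaling; a sketch with torch.cuda.amp, where the training-loop details are illustrative rather than copied from train_asr.py:

```python
# Sketch of the dynamic fp16 loss scaling behind the grad_scale values
# (32.0 -> 16.0 after an inf/nan step, growing back later). Standard
# torch.cuda.amp usage; the loop details are illustrative.
import torch

scaler = torch.cuda.amp.GradScaler(
    growth_factor=2.0,    # doubles the scale after enough clean steps
    backoff_factor=0.5,   # halves it when a step yields inf/nan grads
    growth_interval=2000,
)

def training_step(model, optimizer, inputs, targets, criterion):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = criterion(model(inputs), targets)
    scaler.scale(loss).backward()
    scaler.step(optimizer)  # silently skipped if the grads overflowed
    scaler.update()         # current value is what gets logged as grad_scale
    return scaler.get_scale()
```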
], batch size: 57, lr: 8.43e-03, grad_scale: 16.0 2023-11-19 07:43:32,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=635413.3333333334, ans=0.1 2023-11-19 07:43:52,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=635546.6666666666, ans=0.125 2023-11-19 07:43:57,490 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=635613.3333333334, ans=0.035 2023-11-19 07:44:13,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer1.prob, batch_count=635680.0, ans=0.125 2023-11-19 07:44:18,684 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11200, loss[loss=0.06045, simple_loss=0.06321, pruned_loss=0.01462, audio_tagging_loss=0.01422, over 14971.00 frames. ], tot_loss[loss=0.08919, simple_loss=0.1069, pruned_loss=0.02473, audio_tagging_loss=0.01102, over 3035779.00 frames. ], batch size: 62, lr: 8.43e-03, grad_scale: 32.0 2023-11-19 07:44:18,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer_na.min_abs, batch_count=635746.6666666666, ans=0.02 2023-11-19 07:44:45,615 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:44:56,021 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.574e+01 8.472e+01 9.235e+01 9.971e+01 1.338e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 07:45:10,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=636013.3333333334, ans=0.125 2023-11-19 07:45:14,413 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11250, loss[loss=0.05111, simple_loss=0.05339, pruned_loss=0.01196, audio_tagging_loss=0.01245, over 15471.00 frames. ], tot_loss[loss=0.08959, simple_loss=0.1072, pruned_loss=0.02504, audio_tagging_loss=0.01095, over 3033277.56 frames. ], batch size: 60, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:45:28,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=636146.6666666666, ans=0.1 2023-11-19 07:45:48,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=636280.0, ans=0.125 2023-11-19 07:45:48,956 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=636280.0, ans=0.125 2023-11-19 07:46:09,193 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11300, loss[loss=0.08718, simple_loss=0.09617, pruned_loss=0.02391, audio_tagging_loss=0.01519, over 14962.00 frames. ], tot_loss[loss=0.08855, simple_loss=0.1061, pruned_loss=0.02467, audio_tagging_loss=0.01085, over 3038592.50 frames. 
], batch size: 59, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:46:15,201 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=636413.3333333334, ans=0.125 2023-11-19 07:46:37,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=636546.6666666666, ans=0.035 2023-11-19 07:46:47,209 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.708e+01 8.767e+01 9.641e+01 1.057e+02 1.574e+02, threshold=1.928e+02, percent-clipped=0.0 2023-11-19 07:46:48,547 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:46:50,724 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=13.03 vs. limit=15.0 2023-11-19 07:47:05,219 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11350, loss[loss=0.08253, simple_loss=0.1067, pruned_loss=0.01782, audio_tagging_loss=0.01134, over 15870.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1061, pruned_loss=0.02453, audio_tagging_loss=0.01079, over 3046884.67 frames. ], batch size: 58, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:47:05,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=636746.6666666666, ans=0.2 2023-11-19 07:47:25,166 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=11.10 vs. limit=15.0 2023-11-19 07:47:37,144 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=636946.6666666666, ans=0.2 2023-11-19 07:48:01,154 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11400, loss[loss=0.07333, simple_loss=0.07905, pruned_loss=0.02401, audio_tagging_loss=0.009797, over 15426.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.106, pruned_loss=0.02465, audio_tagging_loss=0.01069, over 3054340.92 frames. ], batch size: 59, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:48:05,788 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=5.60 vs. limit=10.0 2023-11-19 07:48:38,434 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.888e+01 8.501e+01 9.316e+01 1.025e+02 1.797e+02, threshold=1.863e+02, percent-clipped=0.0 2023-11-19 07:48:47,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=637346.6666666666, ans=0.125 2023-11-19 07:48:51,301 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=637346.6666666666, ans=0.05 2023-11-19 07:48:54,700 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.01 vs. limit=12.0 2023-11-19 07:48:56,275 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11450, loss[loss=0.06674, simple_loss=0.07975, pruned_loss=0.01541, audio_tagging_loss=0.01146, over 14693.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1071, pruned_loss=0.025, audio_tagging_loss=0.01066, over 3061390.84 frames. 
], batch size: 57, lr: 8.42e-03, grad_scale: 32.0 2023-11-19 07:49:03,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=637413.3333333334, ans=0.125 2023-11-19 07:49:03,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=637413.3333333334, ans=0.125 2023-11-19 07:49:15,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=637480.0, ans=0.1 2023-11-19 07:49:28,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=637546.6666666666, ans=0.0 2023-11-19 07:49:35,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=637613.3333333334, ans=0.2 2023-11-19 07:49:53,048 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11500, loss[loss=0.07727, simple_loss=0.1002, pruned_loss=0.01813, audio_tagging_loss=0.009061, over 15787.00 frames. ], tot_loss[loss=0.08977, simple_loss=0.1082, pruned_loss=0.02518, audio_tagging_loss=0.01051, over 3053320.67 frames. ], batch size: 61, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:50:02,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.max_abs, batch_count=637813.3333333334, ans=10.0 2023-11-19 07:50:15,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.83 vs. limit=15.0 2023-11-19 07:50:16,946 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.81 vs. limit=22.5 2023-11-19 07:50:18,701 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=637880.0, ans=0.0 2023-11-19 07:50:19,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=637880.0, ans=0.0 2023-11-19 07:50:19,873 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer1.prob, batch_count=637880.0, ans=0.125 2023-11-19 07:50:30,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.770e+01 8.483e+01 9.238e+01 9.842e+01 1.262e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 07:50:33,179 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=637946.6666666666, ans=0.05 2023-11-19 07:50:37,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=638013.3333333334, ans=0.125 2023-11-19 07:50:44,107 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:50:49,248 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11550, loss[loss=0.08178, simple_loss=0.1061, pruned_loss=0.01964, audio_tagging_loss=0.009083, over 14906.00 frames. ], tot_loss[loss=0.08982, simple_loss=0.1083, pruned_loss=0.02517, audio_tagging_loss=0.0105, over 3054203.13 frames. 
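The Whitening lines compare a per-module statistic against a limit and only push back on the activations when the metric exceeds it (12.83 vs. limit=15.0 above leaves that module alone). One plausible formulation consistent with the logged behavior, offered as an assumption rather than a quote of scaling.py: the ratio mean(λ²)/mean(λ)² over the eigenvalues λ of the feature covariance, which is exactly 1.0 for perfectly white features and grows as the spectrum becomes lopsided:

```python
# One plausible whitening metric (an assumption; the exact definition is
# in scaling.py): mean(eig^2) / mean(eig)^2 of the feature covariance.
# Equals 1.0 for white features; larger means a more lopsided spectrum.
import torch

def whitening_metric(x: torch.Tensor) -> torch.Tensor:
    # x: (num_frames, num_channels)
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    eigs = torch.linalg.eigvalsh(cov)
    return (eigs ** 2).mean() / eigs.mean() ** 2

print(whitening_metric(torch.randn(10000, 256)))  # ~1.0: white
print(whitening_metric(torch.randn(10000, 256)
                       * torch.arange(1.0, 257.0)))  # noticeably > 1
```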
], batch size: 54, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:51:06,241 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=638146.6666666666, ans=0.125 2023-11-19 07:51:07,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=638146.6666666666, ans=0.125 2023-11-19 07:51:12,020 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=638213.3333333334, ans=0.2 2023-11-19 07:51:17,060 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2023-11-19 07:51:22,838 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 07:51:25,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.36 vs. limit=15.0 2023-11-19 07:51:36,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.17 vs. limit=12.0 2023-11-19 07:51:43,891 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11600, loss[loss=0.08008, simple_loss=0.09624, pruned_loss=0.02105, audio_tagging_loss=0.01091, over 14557.00 frames. ], tot_loss[loss=0.08974, simple_loss=0.1083, pruned_loss=0.02516, audio_tagging_loss=0.01042, over 3048755.94 frames. ], batch size: 57, lr: 8.41e-03, grad_scale: 32.0 2023-11-19 07:51:59,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=638480.0, ans=0.0 2023-11-19 07:52:06,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=638546.6666666666, ans=0.125 2023-11-19 07:52:21,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.115e+01 8.734e+01 9.337e+01 1.014e+02 1.440e+02, threshold=1.867e+02, percent-clipped=0.0 2023-11-19 07:52:24,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=638613.3333333334, ans=0.0 2023-11-19 07:52:39,904 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11650, loss[loss=0.1066, simple_loss=0.1351, pruned_loss=0.03286, audio_tagging_loss=0.006195, over 15137.00 frames. ], tot_loss[loss=0.09072, simple_loss=0.1097, pruned_loss=0.02549, audio_tagging_loss=0.0104, over 3044199.49 frames. ], batch size: 57, lr: 8.41e-03, grad_scale: 16.0 2023-11-19 07:52:44,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=638746.6666666666, ans=0.2 2023-11-19 07:52:54,430 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=638813.3333333334, ans=0.2 2023-11-19 07:52:58,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.59 vs. 
limit=10.0 2023-11-19 07:53:17,600 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=638946.6666666666, ans=0.125 2023-11-19 07:53:25,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=639013.3333333334, ans=0.2 2023-11-19 07:53:34,702 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11700, loss[loss=0.1094, simple_loss=0.1334, pruned_loss=0.03414, audio_tagging_loss=0.008531, over 14919.00 frames. ], tot_loss[loss=0.09021, simple_loss=0.109, pruned_loss=0.02523, audio_tagging_loss=0.01047, over 3046792.30 frames. ], batch size: 53, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:53:48,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=639146.6666666666, ans=0.2 2023-11-19 07:53:50,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=3.54 vs. limit=15.0 2023-11-19 07:53:51,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=639146.6666666666, ans=0.125 2023-11-19 07:53:51,883 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=639146.6666666666, ans=0.0 2023-11-19 07:54:13,831 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.036e+01 8.154e+01 8.844e+01 9.528e+01 1.167e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 07:54:30,858 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11750, loss[loss=0.1089, simple_loss=0.1351, pruned_loss=0.03003, audio_tagging_loss=0.01132, over 15704.00 frames. ], tot_loss[loss=0.09015, simple_loss=0.1088, pruned_loss=0.02517, audio_tagging_loss=0.01059, over 3052831.22 frames. ], batch size: 57, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:54:33,551 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.30 vs. limit=15.0 2023-11-19 07:54:41,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=639480.0, ans=0.1 2023-11-19 07:55:20,476 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=639680.0, ans=0.0 2023-11-19 07:55:26,015 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11800, loss[loss=0.1116, simple_loss=0.1367, pruned_loss=0.03451, audio_tagging_loss=0.008814, over 16066.00 frames. ], tot_loss[loss=0.08937, simple_loss=0.1076, pruned_loss=0.0249, audio_tagging_loss=0.01068, over 3049975.67 frames. ], batch size: 57, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:55:51,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=10.25 vs. limit=15.0 2023-11-19 07:55:53,438 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 07:55:56,943 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.18 vs. 
limit=10.0 2023-11-19 07:56:05,555 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.407e+01 8.609e+01 9.500e+01 1.074e+02 1.455e+02, threshold=1.900e+02, percent-clipped=0.0 2023-11-19 07:56:07,975 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-96000.pt 2023-11-19 07:56:11,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.09 vs. limit=15.0 2023-11-19 07:56:13,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=640013.3333333334, ans=0.125 2023-11-19 07:56:24,333 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11850, loss[loss=0.09593, simple_loss=0.1216, pruned_loss=0.0263, audio_tagging_loss=0.008815, over 15971.00 frames. ], tot_loss[loss=0.08973, simple_loss=0.1078, pruned_loss=0.02504, audio_tagging_loss=0.01078, over 3046464.74 frames. ], batch size: 58, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:56:24,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=640080.0, ans=0.2 2023-11-19 07:56:25,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=640080.0, ans=0.125 2023-11-19 07:56:29,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=640080.0, ans=0.0 2023-11-19 07:56:30,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=640080.0, ans=0.04949747468305833 2023-11-19 07:56:55,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=640213.3333333334, ans=0.09899494936611666 2023-11-19 07:57:06,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=640280.0, ans=0.125 2023-11-19 07:57:20,090 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11900, loss[loss=0.1275, simple_loss=0.1429, pruned_loss=0.04399, audio_tagging_loss=0.01202, over 14932.00 frames. ], tot_loss[loss=0.0896, simple_loss=0.1074, pruned_loss=0.025, audio_tagging_loss=0.01089, over 3039350.36 frames. ], batch size: 55, lr: 8.40e-03, grad_scale: 16.0 2023-11-19 07:57:42,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=640546.6666666666, ans=0.0 2023-11-19 07:57:59,326 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.550e+01 8.414e+01 8.951e+01 9.837e+01 1.464e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 07:58:12,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=640680.0, ans=0.125 2023-11-19 07:58:16,218 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 11950, loss[loss=0.1074, simple_loss=0.1242, pruned_loss=0.03473, audio_tagging_loss=0.01059, over 15444.00 frames. ], tot_loss[loss=0.08948, simple_loss=0.1072, pruned_loss=0.02495, audio_tagging_loss=0.01094, over 3045385.41 frames. 
], batch size: 57, lr: 8.39e-03, grad_scale: 16.0 2023-11-19 07:58:39,820 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=640880.0, ans=0.125 2023-11-19 07:58:42,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=640880.0, ans=0.125 2023-11-19 07:58:46,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer_na.min_abs, batch_count=640880.0, ans=0.02 2023-11-19 07:58:51,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=640946.6666666666, ans=0.125 2023-11-19 07:59:10,348 INFO [train_asr.py:1115] (0/4) Epoch 8, batch 12000, loss[loss=0.07087, simple_loss=0.076, pruned_loss=0.01958, audio_tagging_loss=0.01329, over 17388.00 frames. ], tot_loss[loss=0.08966, simple_loss=0.1073, pruned_loss=0.02497, audio_tagging_loss=0.01106, over 3051628.86 frames. ], batch size: 67, lr: 8.39e-03, grad_scale: 32.0 2023-11-19 07:59:10,350 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 07:59:35,677 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.2.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([4.5820, 3.7826, 4.3294, 3.2071], device='cuda:0') 2023-11-19 07:59:42,995 INFO [train_asr.py:1147] (0/4) Epoch 8, validation: loss=0.06649, simple_loss=0.05653, pruned_loss=0.006961, audio_tagging_loss=0.03127, over 4681554.00 frames. 2023-11-19 07:59:42,996 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 07:59:54,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=3.82 vs. limit=15.0 2023-11-19 08:00:08,468 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-8.pt 2023-11-19 08:00:44,329 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 0, loss[loss=0.1011, simple_loss=0.1119, pruned_loss=0.02119, audio_tagging_loss=0.02398, over 14686.00 frames. ], tot_loss[loss=0.1011, simple_loss=0.1119, pruned_loss=0.02119, audio_tagging_loss=0.02398, over 14686.00 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:00:44,331 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 08:01:16,089 INFO [train_asr.py:1147] (0/4) Epoch 9, validation: loss=0.06566, simple_loss=0.05652, pruned_loss=0.006966, audio_tagging_loss=0.03043, over 4681554.00 frames. 2023-11-19 08:01:16,089 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 08:01:28,796 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.618e+01 8.783e+01 9.637e+01 1.099e+02 1.400e+02, threshold=1.927e+02, percent-clipped=0.0 2023-11-19 08:01:51,511 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.71 vs. 
limit=22.5 2023-11-19 08:01:55,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=641440.0, ans=0.05 2023-11-19 08:02:05,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=641506.6666666666, ans=0.0 2023-11-19 08:02:10,475 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=641506.6666666666, ans=0.125 2023-11-19 08:02:12,380 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 50, loss[loss=0.09273, simple_loss=0.09979, pruned_loss=0.02099, audio_tagging_loss=0.02184, over 15490.00 frames. ], tot_loss[loss=0.09471, simple_loss=0.1014, pruned_loss=0.02267, audio_tagging_loss=0.02135, over 688927.75 frames. ], batch size: 58, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:02:16,084 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=18.38 vs. limit=15.0 2023-11-19 08:02:24,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=641640.0, ans=0.125 2023-11-19 08:02:26,999 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=641640.0, ans=0.125 2023-11-19 08:02:27,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=641640.0, ans=0.0 2023-11-19 08:02:34,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=641706.6666666666, ans=0.125 2023-11-19 08:02:43,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=641706.6666666666, ans=0.125 2023-11-19 08:02:43,969 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=641706.6666666666, ans=0.125 2023-11-19 08:02:46,225 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.07 vs. limit=22.5 2023-11-19 08:02:48,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=641773.3333333334, ans=0.0 2023-11-19 08:02:49,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=641773.3333333334, ans=0.0 2023-11-19 08:02:58,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=641840.0, ans=0.0 2023-11-19 08:03:02,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=641840.0, ans=0.0 2023-11-19 08:03:07,991 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 100, loss[loss=0.09888, simple_loss=0.1067, pruned_loss=0.02631, audio_tagging_loss=0.01922, over 16031.00 frames. ], tot_loss[loss=0.09807, simple_loss=0.1079, pruned_loss=0.02436, audio_tagging_loss=0.01976, over 1217122.20 frames. 
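Note the learning rate in these records: it decays gently within an epoch (8.46e-03 -> 8.39e-03 across epoch 8) and then steps down at the epoch boundary (7.94e-03 from epoch 9, batch 0). That is the signature of an Eden-style schedule with both a batch factor and an epoch factor. A sketch of the standard form, using base_lr=0.045, lr_batches=7500 and lr_epochs=3.5 from the run config; the exact batch/epoch counting for this run (warmup, epoch offset) is an assumption:

```python
# Sketch of an Eden-style LR schedule (inverse-fourth-root decay in both
# batch and epoch), matching the pattern in these records: slow decay
# within an epoch plus a step at each epoch boundary. The exact batch and
# epoch counting for this run is an assumption.
def eden_lr(batch: float, epoch: float,
            base_lr: float = 0.045,
            lr_batches: float = 7500.0,
            lr_epochs: float = 3.5) -> float:
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# ~96k optimizer steps into training (checkpoint-96000.pt was just saved):
print(eden_lr(batch=96_000, epoch=9))  # ~7.6e-03, near the logged 7.94e-03
```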
], batch size: 63, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:03:19,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.444e+01 8.656e+01 9.404e+01 1.019e+02 1.351e+02, threshold=1.881e+02, percent-clipped=0.0 2023-11-19 08:03:23,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=641973.3333333334, ans=0.125 2023-11-19 08:03:29,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=642040.0, ans=0.1 2023-11-19 08:03:35,302 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.68 vs. limit=10.0 2023-11-19 08:03:39,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=642040.0, ans=0.125 2023-11-19 08:03:45,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642106.6666666666, ans=0.1 2023-11-19 08:03:54,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=642173.3333333334, ans=0.125 2023-11-19 08:04:03,509 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 150, loss[loss=0.09483, simple_loss=0.1114, pruned_loss=0.02519, audio_tagging_loss=0.01393, over 14496.00 frames. ], tot_loss[loss=0.09577, simple_loss=0.1076, pruned_loss=0.02431, audio_tagging_loss=0.01766, over 1623487.19 frames. ], batch size: 55, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:04:13,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=642306.6666666666, ans=0.125 2023-11-19 08:04:34,247 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.07 vs. limit=15.0 2023-11-19 08:04:38,398 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=642440.0, ans=0.125 2023-11-19 08:04:40,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=642440.0, ans=0.2 2023-11-19 08:04:51,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=642506.6666666666, ans=0.0 2023-11-19 08:04:53,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=642506.6666666666, ans=0.125 2023-11-19 08:04:59,896 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 200, loss[loss=0.1029, simple_loss=0.1148, pruned_loss=0.03382, audio_tagging_loss=0.01167, over 15187.00 frames. ], tot_loss[loss=0.09346, simple_loss=0.1065, pruned_loss=0.02437, audio_tagging_loss=0.01584, over 1939320.62 frames. ], batch size: 57, lr: 7.94e-03, grad_scale: 32.0 2023-11-19 08:05:06,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.54 vs. 
limit=15.0 2023-11-19 08:05:07,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=642573.3333333334, ans=15.0 2023-11-19 08:05:13,046 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.507e+01 8.518e+01 9.330e+01 1.026e+02 1.321e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 08:05:14,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=642640.0, ans=0.035 2023-11-19 08:05:46,015 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=642840.0, ans=0.0 2023-11-19 08:05:49,676 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.48 vs. limit=15.0 2023-11-19 08:05:50,294 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:05:50,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=642840.0, ans=0.1 2023-11-19 08:05:55,846 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 250, loss[loss=0.06485, simple_loss=0.07728, pruned_loss=0.01501, audio_tagging_loss=0.01121, over 15389.00 frames. ], tot_loss[loss=0.09354, simple_loss=0.1086, pruned_loss=0.02505, audio_tagging_loss=0.01421, over 2186021.45 frames. ], batch size: 59, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:06:05,993 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.41 vs. limit=15.0 2023-11-19 08:06:07,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=642973.3333333334, ans=0.07 2023-11-19 08:06:23,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=643040.0, ans=0.2 2023-11-19 08:06:36,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=643106.6666666666, ans=0.125 2023-11-19 08:06:45,997 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=643173.3333333334, ans=0.125 2023-11-19 08:06:51,158 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 300, loss[loss=0.1003, simple_loss=0.1146, pruned_loss=0.03226, audio_tagging_loss=0.01076, over 15421.00 frames. ], tot_loss[loss=0.09243, simple_loss=0.1084, pruned_loss=0.02507, audio_tagging_loss=0.01318, over 2380646.95 frames. 
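The frame counts in the tot_loss records are worth decoding: along epoch 9 they climb (688927.75 at batch 50, 1217122.20 at batch 100, 1623487.19 at batch 150, 1939320.62 at batch 200, ...) and saturate near ~3.0e6, which is the behavior of an exponentially decayed running sum. Under that reading, tot_loss is a smoothed average over roughly the last reset_interval=200 batches rather than a whole-epoch mean. A numeric sanity check under that assumption, with ~15,500 frames per batch eyeballed from the per-batch records:

```python
# Sanity check: model the 'over N frames' counts in tot_loss as an
# exponentially decayed running sum. reset_interval=200 is from the run
# config; frames_per_batch is eyeballed from the per-batch records.
RESET_INTERVAL = 200
FRAMES_PER_BATCH = 15_500.0

tot_frames = 0.0
for batch_idx in range(1, 201):
    tot_frames = tot_frames * (1.0 - 1.0 / RESET_INTERVAL) + FRAMES_PER_BATCH
    if batch_idx in (50, 100, 150, 200):
        print(batch_idx, round(tot_frames))
# -> ~687k, ~1.22M, ~1.64M, ~1.96M, tracking the logged counts; the same
#    decayed sum of loss*frames divided by this count gives the tot_loss values.
```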
], batch size: 60, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:06:53,536 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=643240.0, ans=0.125 2023-11-19 08:07:04,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=643306.6666666666, ans=0.95 2023-11-19 08:07:05,328 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.625e+01 9.241e+01 1.032e+02 1.343e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 08:07:05,960 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=643306.6666666666, ans=15.0 2023-11-19 08:07:15,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=643373.3333333334, ans=0.125 2023-11-19 08:07:21,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=643373.3333333334, ans=0.2 2023-11-19 08:07:22,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643373.3333333334, ans=0.1 2023-11-19 08:07:23,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=643440.0, ans=0.0 2023-11-19 08:07:23,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=643440.0, ans=0.125 2023-11-19 08:07:27,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=643440.0, ans=0.125 2023-11-19 08:07:28,857 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.65 vs. limit=5.0 2023-11-19 08:07:33,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.55 vs. limit=22.5 2023-11-19 08:07:34,498 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=643506.6666666666, ans=0.1 2023-11-19 08:07:47,521 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 350, loss[loss=0.08332, simple_loss=0.09817, pruned_loss=0.02098, audio_tagging_loss=0.01326, over 14940.00 frames. ], tot_loss[loss=0.09081, simple_loss=0.1073, pruned_loss=0.02461, audio_tagging_loss=0.01254, over 2534273.83 frames. ], batch size: 56, lr: 7.93e-03, grad_scale: 16.0 2023-11-19 08:07:57,841 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.78 vs. limit=6.0 2023-11-19 08:07:58,488 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=643640.0, ans=0.125 2023-11-19 08:08:19,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=643773.3333333334, ans=0.1 2023-11-19 08:08:29,550 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=15.0 2023-11-19 08:08:32,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=643840.0, ans=0.125 2023-11-19 08:08:42,509 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=643906.6666666666, ans=0.0 2023-11-19 08:08:43,343 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 400, loss[loss=0.07042, simple_loss=0.08115, pruned_loss=0.01802, audio_tagging_loss=0.01182, over 14641.00 frames. ], tot_loss[loss=0.0902, simple_loss=0.1071, pruned_loss=0.02462, audio_tagging_loss=0.01202, over 2647141.25 frames. ], batch size: 56, lr: 7.93e-03, grad_scale: 32.0 2023-11-19 08:08:55,932 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.016e+01 8.363e+01 9.025e+01 9.871e+01 1.227e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:09:10,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=644040.0, ans=0.0 2023-11-19 08:09:26,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=644106.6666666666, ans=0.125 2023-11-19 08:09:27,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0 2023-11-19 08:09:35,707 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:09:36,819 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=644173.3333333334, ans=0.125 2023-11-19 08:09:38,847 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 450, loss[loss=0.06973, simple_loss=0.07786, pruned_loss=0.02002, audio_tagging_loss=0.01077, over 16261.00 frames. ], tot_loss[loss=0.08923, simple_loss=0.1066, pruned_loss=0.02429, audio_tagging_loss=0.01162, over 2741311.04 frames. ], batch size: 62, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:09:41,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=644240.0, ans=0.125 2023-11-19 08:09:42,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=644240.0, ans=0.125 2023-11-19 08:09:42,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.17 vs. limit=15.0 2023-11-19 08:09:58,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644306.6666666666, ans=0.0 2023-11-19 08:09:59,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=644306.6666666666, ans=0.125 2023-11-19 08:10:04,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=644373.3333333334, ans=0.0 2023-11-19 08:10:13,603 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn1.whiten, num_groups=1, num_channels=512, metric=22.06 vs. 
limit=22.5 2023-11-19 08:10:22,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=644506.6666666666, ans=0.1 2023-11-19 08:10:24,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer_na.min_abs, batch_count=644506.6666666666, ans=0.02 2023-11-19 08:10:27,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=644506.6666666666, ans=0.2 2023-11-19 08:10:31,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=644506.6666666666, ans=0.0 2023-11-19 08:10:33,237 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=15.79 vs. limit=22.5 2023-11-19 08:10:35,250 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 500, loss[loss=0.1273, simple_loss=0.1499, pruned_loss=0.04354, audio_tagging_loss=0.008804, over 14670.00 frames. ], tot_loss[loss=0.08854, simple_loss=0.1057, pruned_loss=0.02426, audio_tagging_loss=0.01142, over 2806022.37 frames. ], batch size: 55, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:10:41,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=644573.3333333334, ans=0.2 2023-11-19 08:10:45,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=644640.0, ans=0.0 2023-11-19 08:10:48,619 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.872e+01 8.533e+01 9.443e+01 1.042e+02 1.372e+02, threshold=1.889e+02, percent-clipped=0.0 2023-11-19 08:11:00,912 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=644706.6666666666, ans=0.125 2023-11-19 08:11:12,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=644773.3333333334, ans=0.0 2023-11-19 08:11:25,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=644840.0, ans=0.07 2023-11-19 08:11:31,022 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 550, loss[loss=0.08851, simple_loss=0.09766, pruned_loss=0.02718, audio_tagging_loss=0.01251, over 15408.00 frames. ], tot_loss[loss=0.08853, simple_loss=0.1061, pruned_loss=0.02434, audio_tagging_loss=0.01116, over 2860895.00 frames. 
], batch size: 61, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:11:43,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=644973.3333333334, ans=0.125 2023-11-19 08:11:58,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=645040.0, ans=0.125 2023-11-19 08:12:05,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=645106.6666666666, ans=0.2 2023-11-19 08:12:06,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=645106.6666666666, ans=0.125 2023-11-19 08:12:24,903 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.min_positive, batch_count=645173.3333333334, ans=0.025 2023-11-19 08:12:26,855 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 600, loss[loss=0.103, simple_loss=0.122, pruned_loss=0.03197, audio_tagging_loss=0.01001, over 14327.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1056, pruned_loss=0.02411, audio_tagging_loss=0.01108, over 2897952.02 frames. ], batch size: 55, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:12:38,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=645306.6666666666, ans=0.125 2023-11-19 08:12:38,450 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.24 vs. limit=22.5 2023-11-19 08:12:39,528 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.77 vs. limit=15.0 2023-11-19 08:12:40,044 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.132e+01 8.342e+01 9.026e+01 9.768e+01 1.504e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 08:12:52,200 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=645373.3333333334, ans=0.125 2023-11-19 08:12:56,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=645373.3333333334, ans=0.125 2023-11-19 08:12:58,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=645373.3333333334, ans=0.0 2023-11-19 08:13:20,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=645506.6666666666, ans=0.0 2023-11-19 08:13:22,987 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 650, loss[loss=0.0923, simple_loss=0.1094, pruned_loss=0.02686, audio_tagging_loss=0.01074, over 15112.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.106, pruned_loss=0.02431, audio_tagging_loss=0.011, over 2928210.01 frames. ], batch size: 57, lr: 7.92e-03, grad_scale: 32.0 2023-11-19 08:13:35,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-19 08:13:51,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.58 vs. 
limit=15.0 2023-11-19 08:14:16,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=645840.0, ans=0.07 2023-11-19 08:14:19,616 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 700, loss[loss=0.09648, simple_loss=0.1197, pruned_loss=0.02798, audio_tagging_loss=0.008669, over 15503.00 frames. ], tot_loss[loss=0.08789, simple_loss=0.1056, pruned_loss=0.02415, audio_tagging_loss=0.01093, over 2956597.03 frames. ], batch size: 58, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:14:33,922 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.588e+01 8.287e+01 8.978e+01 1.006e+02 1.254e+02, threshold=1.796e+02, percent-clipped=0.0 2023-11-19 08:14:34,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=645973.3333333334, ans=0.0 2023-11-19 08:14:40,466 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=646040.0, ans=0.0 2023-11-19 08:15:02,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=646106.6666666666, ans=0.125 2023-11-19 08:15:10,262 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=646173.3333333334, ans=0.125 2023-11-19 08:15:13,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=646173.3333333334, ans=0.125 2023-11-19 08:15:15,509 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 750, loss[loss=0.07689, simple_loss=0.08877, pruned_loss=0.01991, audio_tagging_loss=0.01259, over 14975.00 frames. ], tot_loss[loss=0.08892, simple_loss=0.107, pruned_loss=0.02453, audio_tagging_loss=0.0109, over 2974787.34 frames. ], batch size: 57, lr: 7.91e-03, grad_scale: 16.0 2023-11-19 08:15:21,552 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.23 vs. limit=15.0 2023-11-19 08:15:36,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=646373.3333333334, ans=0.125 2023-11-19 08:15:38,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=646373.3333333334, ans=0.125 2023-11-19 08:15:40,448 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=12.0 2023-11-19 08:16:11,324 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 800, loss[loss=0.0998, simple_loss=0.1188, pruned_loss=0.02911, audio_tagging_loss=0.0113, over 15358.00 frames. ], tot_loss[loss=0.08905, simple_loss=0.1071, pruned_loss=0.02461, audio_tagging_loss=0.01087, over 2992391.03 frames. 
2023-11-19 08:16:19,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=646573.3333333334, ans=0.09899494936611666
2023-11-19 08:16:24,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=646640.0, ans=0.0
2023-11-19 08:16:25,538 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.051e+01 8.571e+01 9.363e+01 1.050e+02 1.472e+02, threshold=1.873e+02, percent-clipped=0.0
2023-11-19 08:16:33,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=646706.6666666666, ans=0.125
2023-11-19 08:16:50,373 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.31 vs. limit=12.0
2023-11-19 08:16:54,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.65 vs. limit=22.5
2023-11-19 08:17:07,122 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 850, loss[loss=0.08977, simple_loss=0.1125, pruned_loss=0.02194, audio_tagging_loss=0.01155, over 16228.00 frames. ], tot_loss[loss=0.08957, simple_loss=0.1079, pruned_loss=0.02477, audio_tagging_loss=0.01087, over 3010405.28 frames. ], batch size: 62, lr: 7.91e-03, grad_scale: 32.0
2023-11-19 08:17:27,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=647040.0, ans=0.125
2023-11-19 08:17:50,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.33 vs. limit=10.0
2023-11-19 08:17:53,093 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=647173.3333333334, ans=0.125
2023-11-19 08:17:56,678 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.73 vs. limit=22.5
2023-11-19 08:18:02,489 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 900, loss[loss=0.1237, simple_loss=0.1457, pruned_loss=0.04208, audio_tagging_loss=0.008811, over 15950.00 frames. ], tot_loss[loss=0.0895, simple_loss=0.1077, pruned_loss=0.02475, audio_tagging_loss=0.01088, over 3023075.83 frames. ], batch size: 59, lr: 7.91e-03, grad_scale: 32.0
2023-11-19 08:18:16,858 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.327e+01 8.077e+01 9.345e+01 1.007e+02 1.276e+02, threshold=1.869e+02, percent-clipped=0.0
2023-11-19 08:18:33,026 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.51 vs. limit=10.0
2023-11-19 08:18:48,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=647506.6666666666, ans=0.0
2023-11-19 08:18:58,292 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 950, loss[loss=0.1331, simple_loss=0.1657, pruned_loss=0.04243, audio_tagging_loss=0.007886, over 16250.00 frames. ], tot_loss[loss=0.08912, simple_loss=0.1074, pruned_loss=0.02457, audio_tagging_loss=0.01086, over 3034244.34 frames. ], batch size: 59, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:19:07,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=647573.3333333334, ans=0.125
2023-11-19 08:19:10,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=647640.0, ans=0.125
2023-11-19 08:19:10,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=647640.0, ans=0.125
2023-11-19 08:19:20,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.75 vs. limit=6.0
2023-11-19 08:19:21,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.08 vs. limit=15.0
2023-11-19 08:19:26,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=647706.6666666666, ans=0.0
2023-11-19 08:19:42,974 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=647840.0, ans=0.125
2023-11-19 08:19:47,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647840.0, ans=0.1
2023-11-19 08:19:48,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.64 vs. limit=22.5
2023-11-19 08:19:53,966 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1000, loss[loss=0.1102, simple_loss=0.1401, pruned_loss=0.03035, audio_tagging_loss=0.009796, over 14583.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1077, pruned_loss=0.02462, audio_tagging_loss=0.0107, over 3027543.85 frames. ], batch size: 53, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:20:04,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=647973.3333333334, ans=0.1
2023-11-19 08:20:08,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=647973.3333333334, ans=0.0
2023-11-19 08:20:08,926 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.042e+01 8.007e+01 8.931e+01 9.562e+01 1.265e+02, threshold=1.786e+02, percent-clipped=0.0
2023-11-19 08:20:17,762 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:20:47,447 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=15.0
2023-11-19 08:20:50,071 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1050, loss[loss=0.07917, simple_loss=0.08923, pruned_loss=0.02477, audio_tagging_loss=0.009793, over 15933.00 frames. ], tot_loss[loss=0.08895, simple_loss=0.1075, pruned_loss=0.02458, audio_tagging_loss=0.01059, over 3027868.10 frames. ], batch size: 61, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:21:09,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=648306.6666666666, ans=0.04949747468305833
2023-11-19 08:21:14,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648373.3333333334, ans=0.125
2023-11-19 08:21:16,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648373.3333333334, ans=0.125
2023-11-19 08:21:18,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=648373.3333333334, ans=0.0
2023-11-19 08:21:22,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=648440.0, ans=0.125
2023-11-19 08:21:39,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=648506.6666666666, ans=0.125
2023-11-19 08:21:41,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=648506.6666666666, ans=0.2
2023-11-19 08:21:43,741 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=648506.6666666666, ans=0.125
2023-11-19 08:21:46,131 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1100, loss[loss=0.1107, simple_loss=0.1294, pruned_loss=0.03389, audio_tagging_loss=0.01215, over 15698.00 frames. ], tot_loss[loss=0.08842, simple_loss=0.1069, pruned_loss=0.02446, audio_tagging_loss=0.01052, over 3028788.32 frames. ], batch size: 57, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:21:48,231 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:21:53,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=648573.3333333334, ans=0.125
2023-11-19 08:21:53,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.prob, batch_count=648573.3333333334, ans=0.125
2023-11-19 08:21:59,844 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0
2023-11-19 08:22:00,250 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.479e+01 9.483e+01 1.065e+02 1.916e+02, threshold=1.897e+02, percent-clipped=1.0
2023-11-19 08:22:07,249 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.33 vs. limit=10.0
2023-11-19 08:22:12,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=648706.6666666666, ans=0.2
2023-11-19 08:22:25,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.out_whiten, num_groups=1, num_channels=192, metric=6.65 vs. limit=8.0
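[Editor's note: the Whitening lines compare a per-module whiteness metric against a limit; a penalty is applied only when the metric exceeds the limit, hence the "metric=... vs. limit=..." phrasing. One common proxy, assumed here for illustration (the exact statistic icefall uses may differ), is d * tr(C^2) / tr(C)^2 over the feature covariance C, which equals 1 for a perfectly white covariance and grows as the eigenvalue spectrum becomes lopsided.]

    import torch

    # Hedged sketch of a whiteness metric like the one logged by scaling.py.
    # Computes d * tr(C^2) / tr(C)^2 per channel group; 1.0 iff the
    # covariance is proportional to the identity.
    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
        # x: (num_frames, num_channels); channels are split into groups
        n, c = x.shape
        d = c // num_groups
        x = x.reshape(n, num_groups, d).transpose(0, 1)   # (groups, n, d)
        x = x - x.mean(dim=1, keepdim=True)
        cov = torch.matmul(x.transpose(1, 2), x) / n      # (groups, d, d)
        tr_c = cov.diagonal(dim1=1, dim2=2).sum(dim=1)
        tr_c2 = (cov * cov.transpose(1, 2)).sum(dim=(1, 2))
        return (d * tr_c2 / tr_c.pow(2)).mean().item()

    # e.g. metric=6.65 vs. limit=8.0 above means no penalty fired there.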
2023-11-19 08:22:41,991 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1150, loss[loss=0.07429, simple_loss=0.08714, pruned_loss=0.01851, audio_tagging_loss=0.01221, over 14320.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1073, pruned_loss=0.02451, audio_tagging_loss=0.01046, over 3044244.01 frames. ], batch size: 54, lr: 7.90e-03, grad_scale: 32.0
2023-11-19 08:22:51,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.89 vs. limit=22.5
2023-11-19 08:22:51,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.63 vs. limit=15.0
2023-11-19 08:22:58,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=648973.3333333334, ans=0.0
2023-11-19 08:22:59,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=648973.3333333334, ans=0.125
2023-11-19 08:23:08,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=649040.0, ans=0.125
2023-11-19 08:23:15,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=649106.6666666666, ans=0.07
2023-11-19 08:23:37,845 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1200, loss[loss=0.0767, simple_loss=0.09862, pruned_loss=0.01885, audio_tagging_loss=0.008538, over 14450.00 frames. ], tot_loss[loss=0.08869, simple_loss=0.1074, pruned_loss=0.02463, audio_tagging_loss=0.01038, over 3038872.09 frames. ], batch size: 55, lr: 7.89e-03, grad_scale: 32.0
2023-11-19 08:23:43,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=649240.0, ans=0.125
2023-11-19 08:23:51,319 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=649306.6666666666, ans=0.125
2023-11-19 08:23:52,033 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.405e+01 8.142e+01 8.954e+01 1.003e+02 1.503e+02, threshold=1.791e+02, percent-clipped=0.0
2023-11-19 08:23:53,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=649306.6666666666, ans=0.125
2023-11-19 08:23:59,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=649373.3333333334, ans=0.04949747468305833
2023-11-19 08:24:02,787 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=649373.3333333334, ans=0.0
2023-11-19 08:24:21,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=649506.6666666666, ans=0.125
2023-11-19 08:24:32,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=649573.3333333334, ans=0.125
2023-11-19 08:24:33,448 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1250, loss[loss=0.0637, simple_loss=0.07877, pruned_loss=0.01356, audio_tagging_loss=0.01075, over 14665.00 frames. ], tot_loss[loss=0.08815, simple_loss=0.1062, pruned_loss=0.02456, audio_tagging_loss=0.01051, over 3039395.54 frames. ], batch size: 57, lr: 7.89e-03, grad_scale: 32.0
2023-11-19 08:24:49,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=649640.0, ans=0.0
2023-11-19 08:24:53,083 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.79 vs. limit=15.0
2023-11-19 08:25:11,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.82 vs. limit=10.0
2023-11-19 08:25:29,558 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1300, loss[loss=0.09743, simple_loss=0.118, pruned_loss=0.02875, audio_tagging_loss=0.009675, over 13777.00 frames. ], tot_loss[loss=0.08828, simple_loss=0.1064, pruned_loss=0.02454, audio_tagging_loss=0.01053, over 3036764.07 frames. ], batch size: 52, lr: 7.89e-03, grad_scale: 16.0
2023-11-19 08:25:36,313 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.80 vs. limit=15.0
2023-11-19 08:25:42,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=649973.3333333334, ans=0.125
2023-11-19 08:25:44,818 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.781e+01 8.314e+01 8.988e+01 1.010e+02 1.259e+02, threshold=1.798e+02, percent-clipped=0.0
2023-11-19 08:25:48,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.79 vs. limit=10.0
2023-11-19 08:25:50,872 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.99 vs. limit=15.0
2023-11-19 08:25:53,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0
2023-11-19 08:26:02,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.22 vs. limit=12.0
2023-11-19 08:26:15,411 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=650173.3333333334, ans=0.125
2023-11-19 08:26:25,336 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1350, loss[loss=0.07708, simple_loss=0.0883, pruned_loss=0.02106, audio_tagging_loss=0.01187, over 15217.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.107, pruned_loss=0.02464, audio_tagging_loss=0.01053, over 3040491.88 frames. ], batch size: 60, lr: 7.89e-03, grad_scale: 16.0
2023-11-19 08:26:32,972 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=650240.0, ans=0.125
2023-11-19 08:26:47,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650373.3333333334, ans=0.125
2023-11-19 08:26:49,417 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650373.3333333334, ans=0.1
2023-11-19 08:26:55,223 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.77 vs. limit=22.5
2023-11-19 08:26:56,200 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.25 vs. limit=15.0
2023-11-19 08:26:56,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=650373.3333333334, ans=0.125
2023-11-19 08:27:00,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=650440.0, ans=0.125
2023-11-19 08:27:05,216 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:27:07,127 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.99 vs. limit=6.0
2023-11-19 08:27:11,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=650506.6666666666, ans=0.0
2023-11-19 08:27:15,646 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.34 vs. limit=12.0
2023-11-19 08:27:16,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=650506.6666666666, ans=0.0
2023-11-19 08:27:20,253 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1400, loss[loss=0.08331, simple_loss=0.09247, pruned_loss=0.01998, audio_tagging_loss=0.0171, over 14678.00 frames. ], tot_loss[loss=0.08899, simple_loss=0.1072, pruned_loss=0.02481, audio_tagging_loss=0.01059, over 3043519.77 frames. ], batch size: 57, lr: 7.89e-03, grad_scale: 16.0
2023-11-19 08:27:29,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=650573.3333333334, ans=0.125
2023-11-19 08:27:36,580 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.876e+01 8.323e+01 8.984e+01 9.924e+01 1.651e+02, threshold=1.797e+02, percent-clipped=0.0
2023-11-19 08:27:38,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=650640.0, ans=0.1
2023-11-19 08:27:46,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=13.07 vs. limit=15.0
2023-11-19 08:27:48,934 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.prob, batch_count=650706.6666666666, ans=0.125
2023-11-19 08:27:52,304 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.81 vs. limit=15.0
2023-11-19 08:28:17,050 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1450, loss[loss=0.08677, simple_loss=0.09925, pruned_loss=0.02556, audio_tagging_loss=0.01158, over 15348.00 frames. ], tot_loss[loss=0.08988, simple_loss=0.1084, pruned_loss=0.02515, audio_tagging_loss=0.01052, over 3045952.55 frames. ], batch size: 59, lr: 7.88e-03, grad_scale: 16.0
2023-11-19 08:28:23,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=650906.6666666666, ans=0.125
2023-11-19 08:28:33,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=650973.3333333334, ans=0.2
2023-11-19 08:29:12,417 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1500, loss[loss=0.09888, simple_loss=0.1261, pruned_loss=0.02806, audio_tagging_loss=0.00775, over 14748.00 frames. ], tot_loss[loss=0.0898, simple_loss=0.1084, pruned_loss=0.02498, audio_tagging_loss=0.01062, over 3047823.22 frames. ], batch size: 54, lr: 7.88e-03, grad_scale: 16.0
2023-11-19 08:29:12,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651240.0, ans=0.125
2023-11-19 08:29:14,317 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=651240.0, ans=0.125
2023-11-19 08:29:26,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=651306.6666666666, ans=0.95
2023-11-19 08:29:27,755 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.092e+01 8.592e+01 9.437e+01 1.054e+02 1.547e+02, threshold=1.887e+02, percent-clipped=0.0
2023-11-19 08:29:30,145 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=651306.6666666666, ans=0.0
2023-11-19 08:29:35,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=651373.3333333334, ans=0.125
2023-11-19 08:30:08,178 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1550, loss[loss=0.08959, simple_loss=0.1058, pruned_loss=0.02717, audio_tagging_loss=0.009515, over 15343.00 frames. ], tot_loss[loss=0.08935, simple_loss=0.1077, pruned_loss=0.02481, audio_tagging_loss=0.0107, over 3053035.11 frames. ], batch size: 57, lr: 7.88e-03, grad_scale: 16.0
2023-11-19 08:30:17,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=651573.3333333334, ans=0.125
2023-11-19 08:30:17,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=651573.3333333334, ans=0.0
2023-11-19 08:30:26,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=651640.0, ans=0.125
2023-11-19 08:30:52,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=651840.0, ans=0.125
2023-11-19 08:31:05,002 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1600, loss[loss=0.08419, simple_loss=0.0977, pruned_loss=0.02413, audio_tagging_loss=0.01121, over 13786.00 frames. ], tot_loss[loss=0.08886, simple_loss=0.1071, pruned_loss=0.02452, audio_tagging_loss=0.0108, over 3048600.66 frames. ], batch size: 55, lr: 7.88e-03, grad_scale: 32.0
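[Editor's note: the per-batch lines decompose the loss into the pruned-transducer terms and the audio-tagging term. The logged numbers are consistent with the total being 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss; these scales are inferred from the numbers themselves, not read from this excerpt. Taking the batch 1600 running averages just above: 0.5 x 0.1071 + 0.02452 + 0.0108 = 0.08887, matching the logged loss=0.08886 up to rounding.]

    # Hedged reconstruction of how the logged total appears to be formed
    # from its parts; the 0.5 and 1.0 scales are inferred from the data.
    def combined_loss(simple_loss, pruned_loss, audio_tagging_loss,
                      simple_loss_scale=0.5, audio_tagging_loss_scale=1.0):
        return (simple_loss_scale * simple_loss
                + pruned_loss
                + audio_tagging_loss_scale * audio_tagging_loss)

    # Batch 1600 running averages from the log:
    print(combined_loss(0.1071, 0.02452, 0.0108))  # -> 0.08887 ~ 0.08886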
2023-11-19 08:31:20,262 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.197e+01 8.883e+01 9.741e+01 1.380e+02, threshold=1.777e+02, percent-clipped=0.0
2023-11-19 08:31:24,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=651973.3333333334, ans=0.125
2023-11-19 08:31:42,716 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=652106.6666666666, ans=0.05
2023-11-19 08:31:44,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.19 vs. limit=15.0
2023-11-19 08:31:50,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=652173.3333333334, ans=0.0
2023-11-19 08:31:58,142 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.76 vs. limit=15.0
2023-11-19 08:32:00,804 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1650, loss[loss=0.07769, simple_loss=0.09714, pruned_loss=0.01863, audio_tagging_loss=0.01049, over 14696.00 frames. ], tot_loss[loss=0.08919, simple_loss=0.1075, pruned_loss=0.02459, audio_tagging_loss=0.01086, over 3049027.36 frames. ], batch size: 56, lr: 7.88e-03, grad_scale: 32.0
2023-11-19 08:32:37,396 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.97 vs. limit=15.0
2023-11-19 08:32:46,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.96 vs. limit=10.0
2023-11-19 08:32:50,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652506.6666666666, ans=0.0
2023-11-19 08:32:55,731 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=652573.3333333334, ans=0.0
2023-11-19 08:32:56,432 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1700, loss[loss=0.0797, simple_loss=0.1078, pruned_loss=0.01592, audio_tagging_loss=0.00987, over 16038.00 frames. ], tot_loss[loss=0.08879, simple_loss=0.1069, pruned_loss=0.02449, audio_tagging_loss=0.01087, over 3051635.19 frames. ], batch size: 57, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:33:06,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=652640.0, ans=0.125
2023-11-19 08:33:13,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.943e+01 8.601e+01 9.234e+01 1.036e+02 1.381e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 08:33:15,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.66 vs. limit=15.0
2023-11-19 08:33:21,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=652706.6666666666, ans=0.0
2023-11-19 08:33:27,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=652706.6666666666, ans=0.125
2023-11-19 08:33:31,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=652773.3333333334, ans=0.125
2023-11-19 08:33:33,705 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=652773.3333333334, ans=0.0
2023-11-19 08:33:36,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=652773.3333333334, ans=0.125
2023-11-19 08:33:44,517 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.09 vs. limit=22.5
2023-11-19 08:33:48,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=256, metric=21.06 vs. limit=22.5
2023-11-19 08:33:52,662 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1750, loss[loss=0.1097, simple_loss=0.1299, pruned_loss=0.03457, audio_tagging_loss=0.01017, over 14537.00 frames. ], tot_loss[loss=0.08921, simple_loss=0.1078, pruned_loss=0.02461, audio_tagging_loss=0.01068, over 3047822.42 frames. ], batch size: 54, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:34:09,312 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=652973.3333333334, ans=0.0
2023-11-19 08:34:16,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=653040.0, ans=0.2
2023-11-19 08:34:35,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=653106.6666666666, ans=0.0
2023-11-19 08:34:38,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=653173.3333333334, ans=0.05
2023-11-19 08:34:43,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=653173.3333333334, ans=0.0
2023-11-19 08:34:48,524 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1800, loss[loss=0.09356, simple_loss=0.1176, pruned_loss=0.02639, audio_tagging_loss=0.00838, over 14399.00 frames. ], tot_loss[loss=0.08939, simple_loss=0.108, pruned_loss=0.02474, audio_tagging_loss=0.01067, over 3047446.94 frames. ], batch size: 55, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:35:05,316 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.029e+01 8.385e+01 9.235e+01 9.785e+01 1.398e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 08:35:06,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=653306.6666666666, ans=0.125
2023-11-19 08:35:10,104 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.98 vs. limit=12.0
2023-11-19 08:35:21,437 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=653440.0, ans=10.0
2023-11-19 08:35:29,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=653440.0, ans=0.125
2023-11-19 08:35:36,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=653506.6666666666, ans=0.2
2023-11-19 08:35:41,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=653506.6666666666, ans=0.0
2023-11-19 08:35:44,912 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1850, loss[loss=0.09121, simple_loss=0.1, pruned_loss=0.02774, audio_tagging_loss=0.01346, over 15467.00 frames. ], tot_loss[loss=0.08902, simple_loss=0.1078, pruned_loss=0.02462, audio_tagging_loss=0.01051, over 3052694.23 frames. ], batch size: 58, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:35:56,545 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0
2023-11-19 08:36:21,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=653773.3333333334, ans=0.125
2023-11-19 08:36:26,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=653773.3333333334, ans=0.125
2023-11-19 08:36:29,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.63 vs. limit=10.0
2023-11-19 08:36:40,905 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1900, loss[loss=0.07187, simple_loss=0.08486, pruned_loss=0.01997, audio_tagging_loss=0.009474, over 15317.00 frames. ], tot_loss[loss=0.08883, simple_loss=0.1077, pruned_loss=0.02451, audio_tagging_loss=0.01048, over 3052874.80 frames. ], batch size: 59, lr: 7.87e-03, grad_scale: 16.0
2023-11-19 08:36:41,132 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=653906.6666666666, ans=0.125
2023-11-19 08:36:44,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=653906.6666666666, ans=0.0
2023-11-19 08:36:55,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer_ff2.min_abs, batch_count=653973.3333333334, ans=0.1
2023-11-19 08:36:57,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.243e+01 8.285e+01 9.091e+01 1.015e+02 1.507e+02, threshold=1.818e+02, percent-clipped=0.0
2023-11-19 08:37:05,574 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.91 vs. limit=15.0
2023-11-19 08:37:15,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=654106.6666666666, ans=0.125
2023-11-19 08:37:20,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=654106.6666666666, ans=0.0
2023-11-19 08:37:35,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=654240.0, ans=0.0
2023-11-19 08:37:36,693 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 1950, loss[loss=0.1042, simple_loss=0.129, pruned_loss=0.03019, audio_tagging_loss=0.009487, over 15473.00 frames. ], tot_loss[loss=0.08859, simple_loss=0.1073, pruned_loss=0.02436, audio_tagging_loss=0.01059, over 3055138.81 frames. ], batch size: 57, lr: 7.86e-03, grad_scale: 16.0
2023-11-19 08:37:36,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=654240.0, ans=0.1
2023-11-19 08:37:50,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=654306.6666666666, ans=0.125
2023-11-19 08:38:13,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=654440.0, ans=0.125
2023-11-19 08:38:26,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=654506.6666666666, ans=0.125
2023-11-19 08:38:32,854 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2000, loss[loss=0.09563, simple_loss=0.1202, pruned_loss=0.02636, audio_tagging_loss=0.009155, over 15684.00 frames. ], tot_loss[loss=0.08842, simple_loss=0.1069, pruned_loss=0.02435, audio_tagging_loss=0.01062, over 3055495.91 frames. ], batch size: 57, lr: 7.86e-03, grad_scale: 16.0
2023-11-19 08:38:34,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=654573.3333333334, ans=0.04949747468305833
2023-11-19 08:38:38,626 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.30 vs. limit=15.0
2023-11-19 08:38:48,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=654640.0, ans=0.0
2023-11-19 08:38:48,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=654640.0, ans=0.125
2023-11-19 08:38:50,128 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=7.83 vs. limit=15.0
2023-11-19 08:38:51,177 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.460e+01 8.516e+01 9.233e+01 1.009e+02 1.531e+02, threshold=1.847e+02, percent-clipped=0.0
2023-11-19 08:38:58,901 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=654706.6666666666, ans=0.125
2023-11-19 08:38:58,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=654706.6666666666, ans=0.2
2023-11-19 08:39:18,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=8.06 vs. limit=12.0
2023-11-19 08:39:21,259 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=654840.0, ans=0.05
2023-11-19 08:39:28,858 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2050, loss[loss=0.06008, simple_loss=0.07773, pruned_loss=0.01207, audio_tagging_loss=0.009144, over 14292.00 frames. ], tot_loss[loss=0.08804, simple_loss=0.1069, pruned_loss=0.0241, audio_tagging_loss=0.01051, over 3049443.28 frames. ], batch size: 55, lr: 7.86e-03, grad_scale: 16.0
2023-11-19 08:39:43,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=654973.3333333334, ans=0.125
2023-11-19 08:40:07,035 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=655106.6666666666, ans=0.07
2023-11-19 08:40:18,133 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.80 vs. limit=15.0
2023-11-19 08:40:25,127 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2100, loss[loss=0.08312, simple_loss=0.09714, pruned_loss=0.02171, audio_tagging_loss=0.01285, over 15060.00 frames. ], tot_loss[loss=0.08871, simple_loss=0.1076, pruned_loss=0.0244, audio_tagging_loss=0.01052, over 3053566.82 frames. ], batch size: 59, lr: 7.86e-03, grad_scale: 16.0
2023-11-19 08:40:42,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.190e+01 8.322e+01 9.112e+01 9.913e+01 1.952e+02, threshold=1.822e+02, percent-clipped=1.0
2023-11-19 08:40:52,046 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.10 vs. limit=15.0
2023-11-19 08:41:20,754 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2150, loss[loss=0.09935, simple_loss=0.1211, pruned_loss=0.02923, audio_tagging_loss=0.009572, over 14547.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.1068, pruned_loss=0.02417, audio_tagging_loss=0.01052, over 3044441.52 frames. ], batch size: 55, lr: 7.86e-03, grad_scale: 16.0
2023-11-19 08:41:20,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=655573.3333333334, ans=0.1
2023-11-19 08:41:26,355 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=655573.3333333334, ans=0.125
2023-11-19 08:41:32,029 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.06 vs. limit=6.0
2023-11-19 08:41:43,305 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=655706.6666666666, ans=0.2
2023-11-19 08:41:48,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=655706.6666666666, ans=0.125
2023-11-19 08:41:53,794 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
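[Editor's note: each WARNING above excludes an AudioSet cut whose transcript is a dummy placeholder: after roughly 4x subsampling, a 100-frame cut keeps only 23 frames, fewer than its 24 BPE tokens, so the transducer loss would be ill-defined. A sketch of such a filter follows; the subsampling formula is an assumption chosen to reproduce 100 -> 23 and may differ slightly from the real front-end arithmetic.]

    # Hedged sketch of the cut filter implied by the WARNING lines: drop a
    # cut when its frame count after subsampling cannot cover its token
    # count. The (t - 7) // 2 // 2 formula is an assumption that maps
    # 100 frames to 23, matching the logged numbers.
    def frames_after_subsampling(t: int) -> int:
        return ((t - 7) // 2) // 2  # conv front-end, overall factor ~4

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t = frames_after_subsampling(num_frames)
        return t >= num_tokens  # need at least one frame per token

    print(frames_after_subsampling(100))  # -> 23
    print(keep_cut(100, 24))              # -> False: the cut is excluded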
2023-11-19 08:41:56,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=655773.3333333334, ans=0.2
2023-11-19 08:41:58,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_na.min_abs, batch_count=655773.3333333334, ans=0.02
2023-11-19 08:42:06,846 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 08:42:16,757 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2200, loss[loss=0.09447, simple_loss=0.1131, pruned_loss=0.0294, audio_tagging_loss=0.008507, over 14401.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.1069, pruned_loss=0.02427, audio_tagging_loss=0.0106, over 3038061.25 frames. ], batch size: 55, lr: 7.85e-03, grad_scale: 16.0
2023-11-19 08:42:28,130 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=655973.3333333334, ans=0.0
2023-11-19 08:42:32,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=655973.3333333334, ans=0.0
2023-11-19 08:42:34,507 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.336e+01 8.931e+01 9.747e+01 1.930e+02, threshold=1.786e+02, percent-clipped=1.0
2023-11-19 08:42:35,742 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=655973.3333333334, ans=0.125
2023-11-19 08:42:36,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=655973.3333333334, ans=0.125
2023-11-19 08:42:42,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.84 vs. limit=12.0
2023-11-19 08:42:52,537 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=656106.6666666666, ans=0.2
2023-11-19 08:43:12,648 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2250, loss[loss=0.06768, simple_loss=0.07739, pruned_loss=0.01589, audio_tagging_loss=0.01309, over 15300.00 frames. ], tot_loss[loss=0.08874, simple_loss=0.1071, pruned_loss=0.02449, audio_tagging_loss=0.01071, over 3030983.35 frames. ], batch size: 57, lr: 7.85e-03, grad_scale: 16.0
2023-11-19 08:43:22,557 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=656306.6666666666, ans=0.125
2023-11-19 08:43:31,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=656306.6666666666, ans=0.0
2023-11-19 08:43:39,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=656373.3333333334, ans=0.1
2023-11-19 08:43:55,730 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=656440.0, ans=0.0
2023-11-19 08:44:08,703 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2300, loss[loss=0.0757, simple_loss=0.1042, pruned_loss=0.01469, audio_tagging_loss=0.008914, over 16509.00 frames. ], tot_loss[loss=0.08786, simple_loss=0.106, pruned_loss=0.02408, audio_tagging_loss=0.01077, over 3038642.40 frames. ], batch size: 60, lr: 7.85e-03, grad_scale: 16.0
2023-11-19 08:44:12,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=4.21 vs. limit=12.0
2023-11-19 08:44:20,275 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.55 vs. limit=10.0
2023-11-19 08:44:26,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.756e+01 8.387e+01 9.079e+01 9.954e+01 1.397e+02, threshold=1.816e+02, percent-clipped=0.0
2023-11-19 08:44:43,566 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=656773.3333333334, ans=0.125
2023-11-19 08:44:57,978 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:45:04,359 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2350, loss[loss=0.126, simple_loss=0.1477, pruned_loss=0.0409, audio_tagging_loss=0.01119, over 15843.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1065, pruned_loss=0.02427, audio_tagging_loss=0.01084, over 3041823.67 frames. ], batch size: 56, lr: 7.85e-03, grad_scale: 16.0
2023-11-19 08:45:27,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=657040.0, ans=0.125
2023-11-19 08:45:27,837 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=657040.0, ans=0.125
2023-11-19 08:45:33,594 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657040.0, ans=0.1
2023-11-19 08:45:43,172 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=9.47 vs. limit=15.0
2023-11-19 08:45:43,603 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=657106.6666666666, ans=0.1
2023-11-19 08:45:43,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=657106.6666666666, ans=0.125
2023-11-19 08:46:00,325 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2400, loss[loss=0.08754, simple_loss=0.09959, pruned_loss=0.02487, audio_tagging_loss=0.01287, over 15156.00 frames. ], tot_loss[loss=0.0892, simple_loss=0.1076, pruned_loss=0.02451, audio_tagging_loss=0.01086, over 3048291.53 frames. ], batch size: 59, lr: 7.85e-03, grad_scale: 32.0
2023-11-19 08:46:05,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=657240.0, ans=0.1
2023-11-19 08:46:17,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=657306.6666666666, ans=0.125
2023-11-19 08:46:17,889 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.290e+01 8.346e+01 9.299e+01 9.977e+01 1.391e+02, threshold=1.860e+02, percent-clipped=0.0
2023-11-19 08:46:30,804 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=12.07 vs. limit=15.0
2023-11-19 08:46:34,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=657440.0, ans=0.0
2023-11-19 08:46:36,493 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.36 vs. limit=10.0
2023-11-19 08:46:37,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=657440.0, ans=0.125
2023-11-19 08:46:38,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=657440.0, ans=0.125
2023-11-19 08:46:47,550 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=657506.6666666666, ans=0.125
2023-11-19 08:46:56,228 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2450, loss[loss=0.09405, simple_loss=0.1168, pruned_loss=0.02338, audio_tagging_loss=0.01229, over 15339.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1061, pruned_loss=0.02399, audio_tagging_loss=0.011, over 3036727.85 frames. ], batch size: 58, lr: 7.84e-03, grad_scale: 32.0
2023-11-19 08:47:29,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=657773.3333333334, ans=0.1
2023-11-19 08:47:51,716 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2500, loss[loss=0.08762, simple_loss=0.1024, pruned_loss=0.02358, audio_tagging_loss=0.01284, over 16217.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1062, pruned_loss=0.02392, audio_tagging_loss=0.01094, over 3046773.91 frames. ], batch size: 60, lr: 7.84e-03, grad_scale: 32.0
2023-11-19 08:48:03,457 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=657973.3333333334, ans=0.1
2023-11-19 08:48:09,791 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.559e+01 8.238e+01 9.130e+01 9.958e+01 1.355e+02, threshold=1.826e+02, percent-clipped=0.0
2023-11-19 08:48:22,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=658040.0, ans=0.125
2023-11-19 08:48:34,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.90 vs. limit=10.0
2023-11-19 08:48:48,248 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2550, loss[loss=0.08, simple_loss=0.09333, pruned_loss=0.02273, audio_tagging_loss=0.01061, over 15836.00 frames. ], tot_loss[loss=0.08804, simple_loss=0.1062, pruned_loss=0.02412, audio_tagging_loss=0.01082, over 3053901.35 frames. ], batch size: 59, lr: 7.84e-03, grad_scale: 32.0
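[Editor's note: the grad_scale value at the end of each tot_loss line (16.0 or 32.0 in this stretch) is the current loss scale of fp16 mixed-precision training; it is halved when inf/nan gradients appear and grows back as updates succeed, which is why it moves between 16 and 32 here. A minimal sketch using PyTorch's standard torch.cuda.amp.GradScaler follows; model, optimizer, and batch are placeholders, and the adaptive clipping step from optim.py is omitted.]

    import torch

    # Minimal fp16 training step with a dynamic loss scale, matching the
    # grad_scale values logged above.
    scaler = torch.cuda.amp.GradScaler(init_scale=32.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)   # skips the step on inf/nan gradients
        scaler.update()          # halves the scale after overflow, else grows it
        return loss.detach(), scaler.get_scale()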
], batch size: 59, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:48:53,927 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 08:49:02,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=658306.6666666666, ans=0.0 2023-11-19 08:49:04,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=658306.6666666666, ans=0.125 2023-11-19 08:49:19,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=658373.3333333334, ans=0.125 2023-11-19 08:49:28,322 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=658440.0, ans=0.1 2023-11-19 08:49:37,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=658506.6666666666, ans=0.125 2023-11-19 08:49:43,991 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2600, loss[loss=0.08867, simple_loss=0.1072, pruned_loss=0.02631, audio_tagging_loss=0.008785, over 14263.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1052, pruned_loss=0.02387, audio_tagging_loss=0.01077, over 3046928.80 frames. ], batch size: 53, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:49:46,876 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=658573.3333333334, ans=0.125 2023-11-19 08:50:01,805 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.363e+01 8.704e+01 9.558e+01 1.082e+02 2.151e+02, threshold=1.912e+02, percent-clipped=1.0 2023-11-19 08:50:23,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=658773.3333333334, ans=0.05 2023-11-19 08:50:28,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=658840.0, ans=0.1 2023-11-19 08:50:36,060 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=658840.0, ans=0.0 2023-11-19 08:50:40,081 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2650, loss[loss=0.07327, simple_loss=0.08785, pruned_loss=0.01881, audio_tagging_loss=0.01054, over 16889.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.106, pruned_loss=0.0239, audio_tagging_loss=0.0106, over 3048948.10 frames. ], batch size: 65, lr: 7.84e-03, grad_scale: 32.0 2023-11-19 08:50:40,570 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.90 vs. limit=22.5 2023-11-19 08:50:41,715 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.56 vs. limit=15.0 2023-11-19 08:50:42,814 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.99 vs. 
limit=12.0 2023-11-19 08:51:06,335 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=659040.0, ans=0.2 2023-11-19 08:51:07,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=659040.0, ans=0.2 2023-11-19 08:51:31,755 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.14 vs. limit=15.0 2023-11-19 08:51:36,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=659240.0, ans=0.0 2023-11-19 08:51:36,941 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2700, loss[loss=0.08446, simple_loss=0.1028, pruned_loss=0.02144, audio_tagging_loss=0.01165, over 15877.00 frames. ], tot_loss[loss=0.08796, simple_loss=0.1068, pruned_loss=0.024, audio_tagging_loss=0.01058, over 3051258.23 frames. ], batch size: 59, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:51:37,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=659240.0, ans=0.0 2023-11-19 08:51:46,591 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=659306.6666666666, ans=0.125 2023-11-19 08:51:50,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=659306.6666666666, ans=0.125 2023-11-19 08:51:54,344 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.151e+01 8.620e+01 9.365e+01 1.021e+02 1.535e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 08:52:00,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=659373.3333333334, ans=0.0 2023-11-19 08:52:04,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=659373.3333333334, ans=0.125 2023-11-19 08:52:10,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.52 vs. limit=15.0 2023-11-19 08:52:21,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=659506.6666666666, ans=0.0 2023-11-19 08:52:25,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=659506.6666666666, ans=0.0 2023-11-19 08:52:27,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=659506.6666666666, ans=0.125 2023-11-19 08:52:31,920 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2750, loss[loss=0.06707, simple_loss=0.07905, pruned_loss=0.01667, audio_tagging_loss=0.01087, over 15728.00 frames. ], tot_loss[loss=0.08752, simple_loss=0.1061, pruned_loss=0.02393, audio_tagging_loss=0.01053, over 3053165.21 frames. ], batch size: 59, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:52:42,172 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=659640.0, ans=0.0 2023-11-19 08:53:08,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.64 vs. 
limit=10.0 2023-11-19 08:53:18,440 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 08:53:24,867 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=659840.0, ans=0.1 2023-11-19 08:53:26,882 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2800, loss[loss=0.09877, simple_loss=0.1207, pruned_loss=0.02961, audio_tagging_loss=0.008805, over 14905.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.1068, pruned_loss=0.02422, audio_tagging_loss=0.01042, over 3055979.40 frames. ], batch size: 57, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:53:36,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=659906.6666666666, ans=0.1 2023-11-19 08:53:37,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.45 vs. limit=15.0 2023-11-19 08:53:45,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.607e+01 8.203e+01 8.899e+01 9.500e+01 1.297e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 08:54:00,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=660106.6666666666, ans=0.1 2023-11-19 08:54:06,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=660106.6666666666, ans=0.125 2023-11-19 08:54:22,320 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2850, loss[loss=0.08403, simple_loss=0.1046, pruned_loss=0.02038, audio_tagging_loss=0.01137, over 15068.00 frames. ], tot_loss[loss=0.08728, simple_loss=0.106, pruned_loss=0.02376, audio_tagging_loss=0.01051, over 3054185.11 frames. ], batch size: 55, lr: 7.83e-03, grad_scale: 32.0 2023-11-19 08:54:28,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=660240.0, ans=0.0 2023-11-19 08:54:43,462 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660306.6666666666, ans=0.1 2023-11-19 08:54:53,871 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=660373.3333333334, ans=0.125 2023-11-19 08:55:03,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=660440.0, ans=0.125 2023-11-19 08:55:18,145 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.22 vs. limit=15.0 2023-11-19 08:55:18,651 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2900, loss[loss=0.05225, simple_loss=0.06394, pruned_loss=0.01008, audio_tagging_loss=0.01019, over 15296.00 frames. ], tot_loss[loss=0.08793, simple_loss=0.1067, pruned_loss=0.02408, audio_tagging_loss=0.01048, over 3055643.82 frames. 
2023-11-19 08:55:28,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=660573.3333333334, ans=0.125
2023-11-19 08:55:32,207 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=660640.0, ans=0.0
2023-11-19 08:55:36,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.241e+01 8.534e+01 9.333e+01 1.020e+02 1.874e+02, threshold=1.867e+02, percent-clipped=1.0
2023-11-19 08:55:57,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=660773.3333333334, ans=0.125
2023-11-19 08:56:10,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=660840.0, ans=0.0
2023-11-19 08:56:14,633 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 2950, loss[loss=0.1214, simple_loss=0.1549, pruned_loss=0.03775, audio_tagging_loss=0.006177, over 15444.00 frames. ], tot_loss[loss=0.0884, simple_loss=0.1071, pruned_loss=0.02432, audio_tagging_loss=0.01052, over 3056723.45 frames. ], batch size: 57, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 08:56:15,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=660906.6666666666, ans=0.125
2023-11-19 08:56:20,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=660906.6666666666, ans=0.1
2023-11-19 08:56:30,710 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.09 vs. limit=10.0
2023-11-19 08:56:42,667 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.40 vs. limit=15.0
2023-11-19 08:56:46,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=661040.0, ans=0.125
2023-11-19 08:57:06,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=661173.3333333334, ans=0.125
2023-11-19 08:57:10,812 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3000, loss[loss=0.1072, simple_loss=0.128, pruned_loss=0.02793, audio_tagging_loss=0.0153, over 15680.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1072, pruned_loss=0.0244, audio_tagging_loss=0.01067, over 3055019.54 frames. ], batch size: 57, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 08:57:10,814 INFO [train_asr.py:1138] (0/4) Computing validation loss
2023-11-19 08:57:34,640 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.0.layers.1.self_attn_weights, attn_weights_entropy = tensor([5.7344, 5.7622, 5.8192, 5.8590], device='cuda:0')
2023-11-19 08:57:44,074 INFO [train_asr.py:1147] (0/4) Epoch 9, validation: loss=0.06604, simple_loss=0.05618, pruned_loss=0.006775, audio_tagging_loss=0.03117, over 4681554.00 frames.
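The validation record above decomposes consistently: taking simple_loss at the run's simple_loss_scale of 0.5, pruned_loss at full weight, and audio_tagging_loss at scale 1.0 reproduces the printed total. This decomposition is inferred from the numbers rather than quoted from train_asr.py:

```python
# Components from the validation record above.
simple_loss = 0.05618
pruned_loss = 0.006775
audio_tagging_loss = 0.03117

loss = 0.5 * simple_loss + pruned_loss + 1.0 * audio_tagging_loss
print(loss)  # ~0.066035, matching the printed validation loss of 0.06604
```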
2023-11-19 08:57:44,074 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-19 08:57:53,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=661306.6666666666, ans=0.1
2023-11-19 08:58:01,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.176e+01 8.862e+01 9.645e+01 1.062e+02 1.575e+02, threshold=1.929e+02, percent-clipped=0.0
2023-11-19 08:58:17,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=661440.0, ans=0.0
2023-11-19 08:58:21,592 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.63 vs. limit=15.0
2023-11-19 08:58:34,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=661506.6666666666, ans=0.5
2023-11-19 08:58:35,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=661506.6666666666, ans=0.1
2023-11-19 08:58:39,500 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3050, loss[loss=0.1039, simple_loss=0.1264, pruned_loss=0.03111, audio_tagging_loss=0.009531, over 15223.00 frames. ], tot_loss[loss=0.08885, simple_loss=0.1074, pruned_loss=0.02442, audio_tagging_loss=0.01071, over 3053827.63 frames. ], batch size: 58, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 08:59:02,531 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.32 vs. limit=22.5
2023-11-19 08:59:04,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=661706.6666666666, ans=0.1
2023-11-19 08:59:09,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=661706.6666666666, ans=0.2
2023-11-19 08:59:11,037 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=661706.6666666666, ans=0.0
2023-11-19 08:59:11,560 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.32 vs. limit=10.0
2023-11-19 08:59:12,952 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 08:59:25,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=661840.0, ans=0.125
2023-11-19 08:59:33,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=661840.0, ans=0.125
2023-11-19 08:59:36,840 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3100, loss[loss=0.1208, simple_loss=0.1475, pruned_loss=0.03633, audio_tagging_loss=0.01067, over 15724.00 frames. ], tot_loss[loss=0.08932, simple_loss=0.1081, pruned_loss=0.02459, audio_tagging_loss=0.01068, over 3052078.10 frames. ], batch size: 58, lr: 7.82e-03, grad_scale: 32.0
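In the Clipping_scale lines, the printed threshold consistently equals 2.0 times the median of the grad-norm quartiles (e.g. 2.0 * 9.645e+01 = 1.929e+02 just above), suggesting the clipping threshold tracks a scaled running median of recent gradient norms. A sketch of that logic, inferred from the numbers rather than taken from icefall's optim.py:

```python
import torch

def clip_to_scaled_median(params, recent_grad_norms, clipping_scale=2.0):
    """Clip the global grad norm to clipping_scale * median of recent norms."""
    threshold = clipping_scale * torch.tensor(recent_grad_norms).median().item()
    # Gradients are rescaled only when their global norm exceeds the
    # threshold, which is why percent-clipped can stay at 0.0 for a while.
    total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=threshold)
    return total_norm, threshold

# Toy usage with the quartiles printed just above (min, q25, median, q75, max):
p = torch.nn.Parameter(torch.randn(10))
p.grad = torch.randn(10)
print(clip_to_scaled_median([p], [71.76, 88.62, 96.45, 106.2, 157.5]))
# threshold = 2.0 * 96.45 = 192.9, i.e. the 1.929e+02 in the log
```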
2023-11-19 08:59:54,980 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.318e+01 8.514e+01 9.258e+01 1.059e+02 1.772e+02, threshold=1.852e+02, percent-clipped=0.0
2023-11-19 08:59:58,748 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.08 vs. limit=15.0
2023-11-19 09:00:01,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=662040.0, ans=0.2
2023-11-19 09:00:06,063 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.70 vs. limit=15.0
2023-11-19 09:00:11,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten.whitening_limit, batch_count=662106.6666666666, ans=15.0
2023-11-19 09:00:28,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=662173.3333333334, ans=0.0
2023-11-19 09:00:32,265 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3150, loss[loss=0.08243, simple_loss=0.1052, pruned_loss=0.02096, audio_tagging_loss=0.008897, over 15446.00 frames. ], tot_loss[loss=0.0894, simple_loss=0.1084, pruned_loss=0.02453, audio_tagging_loss=0.01067, over 3053745.02 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 09:00:35,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.56 vs. limit=15.0
2023-11-19 09:00:37,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.max_abs, batch_count=662240.0, ans=10.0
2023-11-19 09:00:58,458 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:01:06,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=662440.0, ans=0.125
2023-11-19 09:01:28,538 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3200, loss[loss=0.08918, simple_loss=0.1117, pruned_loss=0.02175, audio_tagging_loss=0.0116, over 14762.00 frames. ], tot_loss[loss=0.08857, simple_loss=0.1073, pruned_loss=0.02414, audio_tagging_loss=0.01078, over 3053114.00 frames. ], batch size: 56, lr: 7.82e-03, grad_scale: 32.0
2023-11-19 09:01:37,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=662573.3333333334, ans=0.95
2023-11-19 09:01:46,781 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.390e+01 8.354e+01 9.192e+01 1.025e+02 1.348e+02, threshold=1.838e+02, percent-clipped=0.0
2023-11-19 09:02:24,477 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3250, loss[loss=0.102, simple_loss=0.128, pruned_loss=0.02913, audio_tagging_loss=0.008861, over 15683.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.1067, pruned_loss=0.02397, audio_tagging_loss=0.01078, over 3047326.70 frames. ], batch size: 58, lr: 7.81e-03, grad_scale: 32.0
2023-11-19 09:02:39,331 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.15 vs. limit=6.0
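The Whitening lines report a per-module statistic compared against a limit (metric=2.15 vs. limit=6.0 above). A plausible reading, offered here as an assumption: the metric is 1.0 when the module's channel covariance is proportional to the identity (perfectly "white") and grows with the eigenvalue spread, and training pressure keeps it near the limit. A toy computation of such a metric:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """x: (num_frames, num_channels). Returns E[lambda^2] / E[lambda]^2 of
    the channel covariance; equals 1.0 for a perfectly white covariance."""
    x = x - x.mean(dim=0)
    cov = (x.t() @ x) / x.shape[0]
    n = cov.shape[0]
    return float((cov * cov).sum() * n / cov.trace() ** 2)

x = torch.randn(1000, 256) * torch.linspace(0.1, 2.0, 256)  # non-white channels
print(whitening_metric(x))  # noticeably > 1, would be pushed toward the limit
```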
2023-11-19 09:02:50,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.80 vs. limit=10.0
2023-11-19 09:03:20,476 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3300, loss[loss=0.0903, simple_loss=0.1083, pruned_loss=0.02351, audio_tagging_loss=0.01262, over 15221.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.106, pruned_loss=0.02364, audio_tagging_loss=0.0109, over 3058265.90 frames. ], batch size: 54, lr: 7.81e-03, grad_scale: 32.0
2023-11-19 09:03:38,458 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.177e+01 8.893e+01 9.711e+01 1.106e+02 1.510e+02, threshold=1.942e+02, percent-clipped=0.0
2023-11-19 09:03:41,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=663373.3333333334, ans=0.125
2023-11-19 09:03:43,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=663373.3333333334, ans=0.0
2023-11-19 09:03:54,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=663440.0, ans=0.1
2023-11-19 09:04:06,313 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=663506.6666666666, ans=0.0
2023-11-19 09:04:08,993 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.min_abs, batch_count=663506.6666666666, ans=0.5
2023-11-19 09:04:16,815 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3350, loss[loss=0.1049, simple_loss=0.1367, pruned_loss=0.02619, audio_tagging_loss=0.01037, over 14646.00 frames. ], tot_loss[loss=0.0876, simple_loss=0.1057, pruned_loss=0.02394, audio_tagging_loss=0.0108, over 3057964.65 frames. ], batch size: 54, lr: 7.81e-03, grad_scale: 32.0
2023-11-19 09:04:18,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=663573.3333333334, ans=0.125
2023-11-19 09:04:22,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=663573.3333333334, ans=0.125
2023-11-19 09:04:40,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=663706.6666666666, ans=0.0
2023-11-19 09:04:40,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=663706.6666666666, ans=0.125
2023-11-19 09:04:54,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=663773.3333333334, ans=0.1
2023-11-19 09:05:09,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=663840.0, ans=0.125
2023-11-19 09:05:12,840 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3400, loss[loss=0.09597, simple_loss=0.1224, pruned_loss=0.02578, audio_tagging_loss=0.008967, over 15467.00 frames. ], tot_loss[loss=0.08794, simple_loss=0.1064, pruned_loss=0.02406, audio_tagging_loss=0.01068, over 3057471.69 frames. ], batch size: 57, lr: 7.81e-03, grad_scale: 32.0
2023-11-19 09:05:15,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=663906.6666666666, ans=0.125
2023-11-19 09:05:26,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.29 vs. limit=6.0
2023-11-19 09:05:28,089 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.68 vs. limit=12.0
2023-11-19 09:05:29,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=663973.3333333334, ans=0.125
2023-11-19 09:05:29,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=663973.3333333334, ans=0.2
2023-11-19 09:05:30,639 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.572e+01 8.652e+01 9.509e+01 1.042e+02 1.757e+02, threshold=1.902e+02, percent-clipped=0.0
2023-11-19 09:05:41,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=664040.0, ans=0.125
2023-11-19 09:05:54,808 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.38 vs. limit=15.0
2023-11-19 09:05:59,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=664173.3333333334, ans=0.125
2023-11-19 09:06:01,659 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664173.3333333334, ans=0.125
2023-11-19 09:06:08,681 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3450, loss[loss=0.1015, simple_loss=0.1182, pruned_loss=0.03017, audio_tagging_loss=0.01223, over 17417.00 frames. ], tot_loss[loss=0.08877, simple_loss=0.1077, pruned_loss=0.02436, audio_tagging_loss=0.01055, over 3060978.58 frames. ], batch size: 65, lr: 7.81e-03, grad_scale: 32.0
2023-11-19 09:06:15,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=664240.0, ans=0.05
2023-11-19 09:06:25,986 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=664306.6666666666, ans=0.125
2023-11-19 09:06:38,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=6.82 vs. limit=10.0
2023-11-19 09:07:04,549 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3500, loss[loss=0.0859, simple_loss=0.1024, pruned_loss=0.02326, audio_tagging_loss=0.01146, over 14726.00 frames. ], tot_loss[loss=0.08816, simple_loss=0.1068, pruned_loss=0.02414, audio_tagging_loss=0.01062, over 3068047.18 frames. ], batch size: 56, lr: 7.80e-03, grad_scale: 32.0
2023-11-19 09:07:18,827 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.76 vs. limit=15.0
2023-11-19 09:07:21,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=664640.0, ans=0.125
2023-11-19 09:07:22,600 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.336e+01 8.969e+01 9.703e+01 1.247e+02, threshold=1.794e+02, percent-clipped=0.0
2023-11-19 09:07:32,802 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 09:07:33,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.99 vs. limit=15.0
2023-11-19 09:07:48,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=664840.0, ans=0.125
2023-11-19 09:07:49,506 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:07:55,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=664840.0, ans=0.0
2023-11-19 09:07:56,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=664840.0, ans=0.125
2023-11-19 09:07:59,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=664840.0, ans=0.125
2023-11-19 09:08:00,888 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3550, loss[loss=0.08827, simple_loss=0.1096, pruned_loss=0.02469, audio_tagging_loss=0.008781, over 14755.00 frames. ], tot_loss[loss=0.08808, simple_loss=0.1067, pruned_loss=0.0241, audio_tagging_loss=0.01062, over 3057372.92 frames. ], batch size: 55, lr: 7.80e-03, grad_scale: 32.0
2023-11-19 09:08:05,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=664906.6666666666, ans=0.125
2023-11-19 09:08:12,182 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=664973.3333333334, ans=0.2
2023-11-19 09:08:16,481 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=1.586e-01
2023-11-19 09:08:36,653 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=665106.6666666666, ans=0.2
2023-11-19 09:08:49,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=665173.3333333334, ans=0.2
2023-11-19 09:08:56,457 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3600, loss[loss=0.07747, simple_loss=0.09908, pruned_loss=0.02027, audio_tagging_loss=0.007659, over 14742.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.1057, pruned_loss=0.02384, audio_tagging_loss=0.01063, over 3050917.12 frames. ], batch size: 54, lr: 7.80e-03, grad_scale: 16.0
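The batch sizes in the records above hover between roughly 54 and 65 rather than staying fixed, which is what duration-based sampling produces: cuts are packed until their total duration reaches max_duration (1000 s in this run), so the cut count per batch varies with the average cut length. A simplified sketch of that packing; lhotse's SimpleCutSampler additionally handles shuffling, sharding across workers, and more:

```python
def pack_by_duration(cut_durations, max_duration=1000.0):
    """Greedy packing of cut durations (seconds) into duration-capped batches."""
    batches, cur, cur_dur = [], [], 0.0
    for d in cut_durations:
        if cur and cur_dur + d > max_duration:
            batches.append(cur)
            cur, cur_dur = [], 0.0
        cur.append(d)
        cur_dur += d
    if cur:
        batches.append(cur)
    return batches

# 200 cuts of ~16.6 s pack into batches of 60 cuts (~996 s each),
# the same order of magnitude as the "batch size: 54-65" lines above.
print([len(b) for b in pack_by_duration([16.6] * 200)])  # [60, 60, 60, 20]
```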
2023-11-19 09:09:05,923 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.13 vs. limit=15.0
2023-11-19 09:09:15,142 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.230e+01 8.845e+01 9.597e+01 1.384e+02, threshold=1.769e+02, percent-clipped=0.0
2023-11-19 09:09:33,257 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=665440.0, ans=0.2
2023-11-19 09:09:34,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=665440.0, ans=0.0
2023-11-19 09:09:37,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=665440.0, ans=0.0
2023-11-19 09:09:42,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=665506.6666666666, ans=0.2
2023-11-19 09:09:44,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=665506.6666666666, ans=0.2
2023-11-19 09:09:52,792 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3650, loss[loss=0.07894, simple_loss=0.09346, pruned_loss=0.02193, audio_tagging_loss=0.01028, over 14203.00 frames. ], tot_loss[loss=0.08715, simple_loss=0.1048, pruned_loss=0.02408, audio_tagging_loss=0.01067, over 3054375.55 frames. ], batch size: 54, lr: 7.80e-03, grad_scale: 16.0
2023-11-19 09:10:24,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=665706.6666666666, ans=0.125
2023-11-19 09:10:25,567 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.92 vs. limit=22.5
2023-11-19 09:10:27,327 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=665773.3333333334, ans=0.2
2023-11-19 09:10:37,803 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.84 vs. limit=12.0
2023-11-19 09:10:42,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=665840.0, ans=0.125
2023-11-19 09:10:42,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=665840.0, ans=0.125
2023-11-19 09:10:47,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=665906.6666666666, ans=0.0
2023-11-19 09:10:48,460 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3700, loss[loss=0.09322, simple_loss=0.1099, pruned_loss=0.02639, audio_tagging_loss=0.01187, over 16534.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1059, pruned_loss=0.02428, audio_tagging_loss=0.01059, over 3053081.32 frames. ], batch size: 62, lr: 7.80e-03, grad_scale: 16.0
2023-11-19 09:10:50,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=665906.6666666666, ans=0.2
2023-11-19 09:10:54,369 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.06 vs. limit=6.0
2023-11-19 09:11:01,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=665973.3333333334, ans=0.09899494936611666
2023-11-19 09:11:04,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.dropout.p, batch_count=665973.3333333334, ans=0.1
2023-11-19 09:11:06,729 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.089e+01 8.367e+01 9.183e+01 1.016e+02 1.508e+02, threshold=1.837e+02, percent-clipped=0.0
2023-11-19 09:11:12,819 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=1.092e-01
2023-11-19 09:11:20,261 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.2.self_attn_weights, loss-sum=5.110e-03
2023-11-19 09:11:43,399 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3750, loss[loss=0.07556, simple_loss=0.07526, pruned_loss=0.02043, audio_tagging_loss=0.0175, over 14383.00 frames. ], tot_loss[loss=0.08886, simple_loss=0.1072, pruned_loss=0.02465, audio_tagging_loss=0.01061, over 3060458.68 frames. ], batch size: 56, lr: 7.79e-03, grad_scale: 16.0
2023-11-19 09:11:59,885 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=666306.6666666666, ans=0.0
2023-11-19 09:12:13,793 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=666373.3333333334, ans=0.125
2023-11-19 09:12:20,552 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 09:12:22,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=666440.0, ans=0.2
2023-11-19 09:12:39,255 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3800, loss[loss=0.07584, simple_loss=0.09408, pruned_loss=0.01846, audio_tagging_loss=0.01033, over 14453.00 frames. ], tot_loss[loss=0.08857, simple_loss=0.1068, pruned_loss=0.0244, audio_tagging_loss=0.01076, over 3063135.11 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 16.0
2023-11-19 09:12:43,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.25 vs. limit=15.0
2023-11-19 09:12:53,991 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-100000.pt
2023-11-19 09:12:58,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=666640.0, ans=0.2
2023-11-19 09:12:59,140 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=666640.0, ans=0.1
2023-11-19 09:13:01,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.623e+01 8.849e+01 9.395e+01 1.021e+02 1.360e+02, threshold=1.879e+02, percent-clipped=0.0
2023-11-19 09:13:03,970 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=666706.6666666666, ans=0.2
2023-11-19 09:13:27,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=666840.0, ans=0.0
2023-11-19 09:13:30,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=666840.0, ans=0.0
2023-11-19 09:13:37,582 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3850, loss[loss=0.09456, simple_loss=0.1145, pruned_loss=0.02473, audio_tagging_loss=0.01258, over 14444.00 frames. ], tot_loss[loss=0.08896, simple_loss=0.1076, pruned_loss=0.0244, audio_tagging_loss=0.01074, over 3060851.06 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 16.0
2023-11-19 09:13:44,900 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=666906.6666666666, ans=0.0
2023-11-19 09:14:13,615 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=667106.6666666666, ans=0.0
2023-11-19 09:14:33,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=667240.0, ans=0.0
2023-11-19 09:14:34,208 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3900, loss[loss=0.0637, simple_loss=0.07637, pruned_loss=0.01542, audio_tagging_loss=0.0101, over 14938.00 frames. ], tot_loss[loss=0.0896, simple_loss=0.1083, pruned_loss=0.02471, audio_tagging_loss=0.01071, over 3050521.63 frames. ], batch size: 55, lr: 7.79e-03, grad_scale: 16.0
2023-11-19 09:14:34,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=667240.0, ans=0.125
2023-11-19 09:14:35,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=667240.0, ans=0.2
2023-11-19 09:14:52,706 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.594e+01 8.271e+01 8.958e+01 9.768e+01 1.292e+02, threshold=1.792e+02, percent-clipped=0.0
2023-11-19 09:15:10,432 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.02 vs. limit=12.0
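The checkpoint saved above as checkpoint-100000.pt is consistent with the run's save_every_n=4000 (100000 is a multiple of 4000) and keep_last_k=30 settings. A simplified sketch of such periodic checkpointing, not icefall's actual checkpoint.py:

```python
from pathlib import Path
import torch

def maybe_save(model, batch_idx_train, exp_dir,
               save_every_n=4000, keep_last_k=30):
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    exp_dir = Path(exp_dir)
    path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save({"model": model.state_dict(),
                "batch_idx_train": batch_idx_train}, path)
    # Prune older checkpoints, keeping only the newest keep_last_k.
    ckpts = sorted(exp_dir.glob("checkpoint-*.pt"),
                   key=lambda p: int(p.stem.split("-")[1]))
    for old in ckpts[:-keep_last_k]:
        old.unlink()
```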
2023-11-19 09:15:11,712 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=667440.0, ans=0.125
2023-11-19 09:15:15,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=667440.0, ans=0.1
2023-11-19 09:15:22,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=667506.6666666666, ans=0.125
2023-11-19 09:15:30,405 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 3950, loss[loss=0.1143, simple_loss=0.1297, pruned_loss=0.03746, audio_tagging_loss=0.01199, over 15781.00 frames. ], tot_loss[loss=0.08916, simple_loss=0.1077, pruned_loss=0.02457, audio_tagging_loss=0.01076, over 3046279.52 frames. ], batch size: 59, lr: 7.79e-03, grad_scale: 16.0
2023-11-19 09:15:44,338 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=667640.0, ans=0.0
2023-11-19 09:16:06,844 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=667773.3333333334, ans=0.0
2023-11-19 09:16:25,231 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4000, loss[loss=0.0706, simple_loss=0.07848, pruned_loss=0.01741, audio_tagging_loss=0.01395, over 15040.00 frames. ], tot_loss[loss=0.089, simple_loss=0.1071, pruned_loss=0.02456, audio_tagging_loss=0.01089, over 3043364.74 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 32.0
2023-11-19 09:16:28,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=667906.6666666666, ans=0.0
2023-11-19 09:16:33,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=667906.6666666666, ans=0.125
2023-11-19 09:16:45,244 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.201e+01 8.595e+01 9.425e+01 1.030e+02 1.465e+02, threshold=1.885e+02, percent-clipped=0.0
2023-11-19 09:17:02,845 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=668106.6666666666, ans=0.035
2023-11-19 09:17:16,184 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=668173.3333333334, ans=0.0
2023-11-19 09:17:19,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=668173.3333333334, ans=0.0
2023-11-19 09:17:21,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=668240.0, ans=0.0
2023-11-19 09:17:22,414 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4050, loss[loss=0.08208, simple_loss=0.09054, pruned_loss=0.02134, audio_tagging_loss=0.01547, over 15207.00 frames. ], tot_loss[loss=0.08971, simple_loss=0.1079, pruned_loss=0.02488, audio_tagging_loss=0.01087, over 3045643.11 frames. ], batch size: 58, lr: 7.78e-03, grad_scale: 32.0
2023-11-19 09:17:23,542 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
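grad_scale in the records above bounces between 16.0 and 32.0 (e.g. back to 32.0 at batch 4000). With use_fp16 enabled this is the dynamic loss scale of mixed-precision training: halved when an overflow is detected, doubled after a long run of clean steps. The following is generic AMP-style logic, offered as an assumption about the behavior rather than icefall's exact scaler:

```python
class DynamicLossScale:
    """Halve on overflow, double after a clean run of growth_interval steps."""

    def __init__(self, scale=2.0, growth_interval=2000):
        self.scale = scale
        self.growth_interval = growth_interval
        self.clean_steps = 0

    def update(self, found_inf: bool) -> None:
        if found_inf:
            self.scale = max(self.scale / 2.0, 2.0 ** -10)  # back off
            self.clean_steps = 0
        else:
            self.clean_steps += 1
            if self.clean_steps == self.growth_interval:
                self.scale *= 2.0  # grow again after a clean stretch
                self.clean_steps = 0

scaler = DynamicLossScale(scale=16.0)
for _ in range(2000):
    scaler.update(found_inf=False)
print(scaler.scale)  # 32.0 -- the kind of jump seen at batch 4000 above
```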
2023-11-19 09:17:41,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=668306.6666666666, ans=0.2
2023-11-19 09:17:48,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=668373.3333333334, ans=0.125
2023-11-19 09:17:59,111 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=8.39 vs. limit=15.0
2023-11-19 09:18:02,397 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=668440.0, ans=0.0
2023-11-19 09:18:12,522 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=668506.6666666666, ans=0.0
2023-11-19 09:18:17,769 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4100, loss[loss=0.09822, simple_loss=0.1227, pruned_loss=0.028, audio_tagging_loss=0.008851, over 15353.00 frames. ], tot_loss[loss=0.09058, simple_loss=0.1096, pruned_loss=0.02509, audio_tagging_loss=0.01067, over 3042348.88 frames. ], batch size: 55, lr: 7.78e-03, grad_scale: 16.0
2023-11-19 09:18:23,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=668573.3333333334, ans=0.125
2023-11-19 09:18:27,189 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0
2023-11-19 09:18:34,632 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=668640.0, ans=0.0
2023-11-19 09:18:37,942 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.064e+01 8.527e+01 9.124e+01 9.950e+01 1.525e+02, threshold=1.825e+02, percent-clipped=0.0
2023-11-19 09:18:49,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.56 vs. limit=15.0
2023-11-19 09:18:51,459 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=13.11 vs. limit=15.0
2023-11-19 09:19:03,079 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=668840.0, ans=0.0
2023-11-19 09:19:13,573 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4150, loss[loss=0.07203, simple_loss=0.0926, pruned_loss=0.01448, audio_tagging_loss=0.01125, over 14512.00 frames. ], tot_loss[loss=0.08953, simple_loss=0.1084, pruned_loss=0.02481, audio_tagging_loss=0.01052, over 3044915.63 frames. ], batch size: 56, lr: 7.78e-03, grad_scale: 16.0
2023-11-19 09:19:22,090 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.13 vs. limit=15.0
2023-11-19 09:19:34,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=668973.3333333334, ans=0.125
2023-11-19 09:19:34,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=668973.3333333334, ans=0.125
2023-11-19 09:19:35,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=669040.0, ans=0.125
2023-11-19 09:19:53,155 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 09:19:55,444 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=669106.6666666666, ans=0.1
2023-11-19 09:19:59,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=669173.3333333334, ans=0.2
2023-11-19 09:20:10,265 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4200, loss[loss=0.08564, simple_loss=0.1057, pruned_loss=0.02253, audio_tagging_loss=0.01026, over 15389.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1072, pruned_loss=0.02448, audio_tagging_loss=0.01039, over 3046424.51 frames. ], batch size: 57, lr: 7.78e-03, grad_scale: 16.0
2023-11-19 09:20:30,746 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.637e+01 8.694e+01 9.811e+01 1.116e+02 1.412e+02, threshold=1.962e+02, percent-clipped=0.0
2023-11-19 09:21:06,500 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4250, loss[loss=0.07049, simple_loss=0.08784, pruned_loss=0.0155, audio_tagging_loss=0.01106, over 16715.00 frames. ], tot_loss[loss=0.08839, simple_loss=0.1072, pruned_loss=0.02439, audio_tagging_loss=0.01042, over 3050518.12 frames. ], batch size: 63, lr: 7.77e-03, grad_scale: 16.0
2023-11-19 09:21:11,041 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.70 vs. limit=15.0
2023-11-19 09:21:15,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=669573.3333333334, ans=0.125
2023-11-19 09:21:17,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=669640.0, ans=0.0
2023-11-19 09:21:19,127 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=669640.0, ans=0.125
2023-11-19 09:21:33,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=669706.6666666666, ans=0.125
2023-11-19 09:21:41,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=669773.3333333334, ans=0.125
2023-11-19 09:21:43,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.95 vs. limit=15.0
2023-11-19 09:21:51,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=669840.0, ans=0.09899494936611666
2023-11-19 09:21:56,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=669840.0, ans=0.125
2023-11-19 09:21:58,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=669840.0, ans=0.07
2023-11-19 09:22:02,689 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4300, loss[loss=0.09428, simple_loss=0.1172, pruned_loss=0.02496, audio_tagging_loss=0.01071, over 14953.00 frames. ], tot_loss[loss=0.08917, simple_loss=0.1086, pruned_loss=0.02461, audio_tagging_loss=0.01028, over 3045185.65 frames. ], batch size: 54, lr: 7.77e-03, grad_scale: 16.0
2023-11-19 09:22:22,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.641e+01 9.552e+01 1.070e+02 1.517e+02, threshold=1.910e+02, percent-clipped=0.0
2023-11-19 09:22:25,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=670040.0, ans=0.125
2023-11-19 09:22:35,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=670106.6666666666, ans=0.2
2023-11-19 09:22:58,318 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4350, loss[loss=0.09137, simple_loss=0.1143, pruned_loss=0.02527, audio_tagging_loss=0.008962, over 15041.00 frames. ], tot_loss[loss=0.0889, simple_loss=0.1081, pruned_loss=0.02455, audio_tagging_loss=0.0103, over 3042488.84 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 16.0
2023-11-19 09:23:03,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=670240.0, ans=0.0
2023-11-19 09:23:11,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=670306.6666666666, ans=0.125
2023-11-19 09:23:48,576 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.74 vs. limit=15.0
2023-11-19 09:23:51,194 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:23:51,504 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=3.16 vs. limit=15.0
2023-11-19 09:23:54,176 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4400, loss[loss=0.09452, simple_loss=0.1079, pruned_loss=0.02747, audio_tagging_loss=0.0131, over 13872.00 frames. ], tot_loss[loss=0.08876, simple_loss=0.1079, pruned_loss=0.02451, audio_tagging_loss=0.01031, over 3041761.99 frames. ], batch size: 54, lr: 7.77e-03, grad_scale: 32.0
2023-11-19 09:23:56,849 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=8.38 vs. limit=15.0
2023-11-19 09:24:00,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=670573.3333333334, ans=0.07
2023-11-19 09:24:02,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=670573.3333333334, ans=0.0
2023-11-19 09:24:04,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.97 vs. limit=15.0
2023-11-19 09:24:08,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=670640.0, ans=0.125
2023-11-19 09:24:13,650 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=670640.0, ans=0.0
2023-11-19 09:24:14,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.327e+01 8.076e+01 8.724e+01 9.307e+01 1.083e+02, threshold=1.745e+02, percent-clipped=0.0
2023-11-19 09:24:27,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=670773.3333333334, ans=0.0
2023-11-19 09:24:27,633 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.86 vs. limit=6.0
2023-11-19 09:24:30,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0
2023-11-19 09:24:31,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass_mid.scale_min, batch_count=670773.3333333334, ans=0.2
2023-11-19 09:24:36,609 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=670773.3333333334, ans=0.2
2023-11-19 09:24:39,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=670840.0, ans=0.1
2023-11-19 09:24:49,636 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4450, loss[loss=0.08305, simple_loss=0.1046, pruned_loss=0.0238, audio_tagging_loss=0.006937, over 15247.00 frames. ], tot_loss[loss=0.08894, simple_loss=0.1082, pruned_loss=0.02457, audio_tagging_loss=0.01027, over 3048083.56 frames. ], batch size: 57, lr: 7.77e-03, grad_scale: 32.0
2023-11-19 09:25:16,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=671040.0, ans=0.125
2023-11-19 09:25:45,403 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4500, loss[loss=0.1135, simple_loss=0.1449, pruned_loss=0.03298, audio_tagging_loss=0.008116, over 16845.00 frames. ], tot_loss[loss=0.08942, simple_loss=0.1088, pruned_loss=0.02476, audio_tagging_loss=0.01026, over 3054101.99 frames. ], batch size: 63, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:25:56,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=671306.6666666666, ans=0.125
2023-11-19 09:26:06,066 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.847e+01 8.307e+01 9.155e+01 9.901e+01 1.565e+02, threshold=1.831e+02, percent-clipped=0.0
2023-11-19 09:26:13,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=671373.3333333334, ans=0.0
2023-11-19 09:26:16,049 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.45 vs. limit=15.0
2023-11-19 09:26:16,828 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671373.3333333334, ans=0.125
2023-11-19 09:26:27,426 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=10.60 vs. limit=15.0
2023-11-19 09:26:28,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671440.0, ans=0.1
2023-11-19 09:26:41,061 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4550, loss[loss=0.09691, simple_loss=0.1182, pruned_loss=0.02497, audio_tagging_loss=0.01282, over 15908.00 frames. ], tot_loss[loss=0.08937, simple_loss=0.1084, pruned_loss=0.02485, audio_tagging_loss=0.01034, over 3049225.37 frames. ], batch size: 61, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:26:44,908 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=13.84 vs. limit=15.0
2023-11-19 09:26:48,790 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=671573.3333333334, ans=0.125
2023-11-19 09:26:51,812 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.82 vs. limit=15.0
2023-11-19 09:26:58,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=671640.0, ans=0.0
2023-11-19 09:26:59,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=8.02 vs. limit=15.0
2023-11-19 09:27:02,381 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=671706.6666666666, ans=0.125
2023-11-19 09:27:08,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=10.98 vs. limit=15.0
2023-11-19 09:27:17,696 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=13.09 vs. limit=15.0
2023-11-19 09:27:22,354 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24
2023-11-19 09:27:36,509 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4600, loss[loss=0.09639, simple_loss=0.1227, pruned_loss=0.02616, audio_tagging_loss=0.008883, over 16011.00 frames. ], tot_loss[loss=0.08985, simple_loss=0.1088, pruned_loss=0.02506, audio_tagging_loss=0.01038, over 3056845.50 frames. ], batch size: 57, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:27:37,345 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=671906.6666666666, ans=0.1
2023-11-19 09:27:48,175 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:27:56,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=671973.3333333334, ans=0.125
2023-11-19 09:27:58,267 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.100e+01 8.562e+01 9.559e+01 1.086e+02 1.814e+02, threshold=1.912e+02, percent-clipped=0.0
2023-11-19 09:28:10,138 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=672106.6666666666, ans=0.05
2023-11-19 09:28:12,178 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672106.6666666666, ans=0.0
2023-11-19 09:28:16,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=672106.6666666666, ans=0.125
2023-11-19 09:28:17,477 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0
2023-11-19 09:28:32,586 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4650, loss[loss=0.08111, simple_loss=0.1041, pruned_loss=0.01859, audio_tagging_loss=0.01046, over 16398.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1069, pruned_loss=0.02442, audio_tagging_loss=0.01061, over 3063129.81 frames. ], batch size: 59, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:28:32,756 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=672240.0, ans=0.125
2023-11-19 09:28:37,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=672240.0, ans=0.0
2023-11-19 09:28:43,918 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=672306.6666666666, ans=0.1
2023-11-19 09:28:52,490 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:28:52,598 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=672306.6666666666, ans=0.125
2023-11-19 09:28:57,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=672373.3333333334, ans=0.2
2023-11-19 09:29:11,080 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=672440.0, ans=0.125
2023-11-19 09:29:25,409 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00
2023-11-19 09:29:28,494 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4700, loss[loss=0.0898, simple_loss=0.1072, pruned_loss=0.02727, audio_tagging_loss=0.008923, over 14408.00 frames. ], tot_loss[loss=0.0879, simple_loss=0.106, pruned_loss=0.02416, audio_tagging_loss=0.01074, over 3061405.86 frames. ], batch size: 54, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:29:44,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer1.prob, batch_count=672640.0, ans=0.125
2023-11-19 09:29:45,596 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.31 vs. limit=12.0
2023-11-19 09:29:47,988 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=672640.0, ans=0.125
2023-11-19 09:29:49,358 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.45 vs. limit=22.5
2023-11-19 09:29:49,806 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.341e+01 9.226e+01 1.015e+02 1.641e+02, threshold=1.845e+02, percent-clipped=0.0
2023-11-19 09:29:50,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=672706.6666666666, ans=0.125
2023-11-19 09:29:52,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=672706.6666666666, ans=0.125
2023-11-19 09:29:59,219 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=672706.6666666666, ans=0.2
2023-11-19 09:30:14,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.16 vs. limit=6.0
2023-11-19 09:30:24,350 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4750, loss[loss=0.08863, simple_loss=0.1091, pruned_loss=0.02475, audio_tagging_loss=0.009341, over 15078.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1062, pruned_loss=0.02411, audio_tagging_loss=0.01078, over 3056111.68 frames. ], batch size: 57, lr: 7.76e-03, grad_scale: 16.0
2023-11-19 09:30:24,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.75 vs. limit=12.0
2023-11-19 09:30:37,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=672973.3333333334, ans=0.0
2023-11-19 09:30:39,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=672973.3333333334, ans=0.0
2023-11-19 09:31:06,755 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=673106.6666666666, ans=0.125
2023-11-19 09:31:20,499 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4800, loss[loss=0.07412, simple_loss=0.08455, pruned_loss=0.02039, audio_tagging_loss=0.01146, over 15346.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.106, pruned_loss=0.02396, audio_tagging_loss=0.01096, over 3052146.15 frames. ], batch size: 59, lr: 7.75e-03, grad_scale: 32.0
2023-11-19 09:31:21,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=673240.0, ans=0.0
2023-11-19 09:31:31,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=5.28 vs. limit=12.0
2023-11-19 09:31:41,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.288e+01 8.950e+01 9.768e+01 1.286e+02, threshold=1.790e+02, percent-clipped=0.0
2023-11-19 09:31:50,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=673373.3333333334, ans=0.2
2023-11-19 09:32:09,030 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=673506.6666666666, ans=0.0
2023-11-19 09:32:10,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=673506.6666666666, ans=0.125
2023-11-19 09:32:11,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=673506.6666666666, ans=0.125
2023-11-19 09:32:16,666 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4850, loss[loss=0.09656, simple_loss=0.1229, pruned_loss=0.02205, audio_tagging_loss=0.01309, over 15650.00 frames. ], tot_loss[loss=0.08773, simple_loss=0.1054, pruned_loss=0.02392, audio_tagging_loss=0.01108, over 3044127.69 frames. ], batch size: 56, lr: 7.75e-03, grad_scale: 32.0
2023-11-19 09:32:22,155 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=673573.3333333334, ans=0.2
2023-11-19 09:32:56,185 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=673773.3333333334, ans=0.125
2023-11-19 09:33:11,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=673906.6666666666, ans=0.125
2023-11-19 09:33:12,505 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4900, loss[loss=0.08925, simple_loss=0.1129, pruned_loss=0.0229, audio_tagging_loss=0.009923, over 15282.00 frames. ], tot_loss[loss=0.08835, simple_loss=0.1066, pruned_loss=0.02407, audio_tagging_loss=0.01095, over 3047224.85 frames. ], batch size: 57, lr: 7.75e-03, grad_scale: 32.0
2023-11-19 09:33:23,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=673973.3333333334, ans=0.1
2023-11-19 09:33:33,592 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.273e+01 8.373e+01 9.037e+01 1.012e+02 1.386e+02, threshold=1.807e+02, percent-clipped=0.0
2023-11-19 09:33:41,018 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.87 vs. limit=15.0
2023-11-19 09:34:04,400 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=674173.3333333334, ans=0.2
2023-11-19 09:34:04,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=674173.3333333334, ans=0.125
2023-11-19 09:34:07,882 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 4950, loss[loss=0.06777, simple_loss=0.07575, pruned_loss=0.01917, audio_tagging_loss=0.01072, over 15205.00 frames. ], tot_loss[loss=0.08762, simple_loss=0.106, pruned_loss=0.02393, audio_tagging_loss=0.01071, over 3044912.46 frames. ], batch size: 60, lr: 7.75e-03, grad_scale: 32.0
2023-11-19 09:34:27,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674306.6666666666, ans=0.1
2023-11-19 09:34:28,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=674306.6666666666, ans=0.125
2023-11-19 09:34:38,325 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=674373.3333333334, ans=0.1
2023-11-19 09:34:51,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=674506.6666666666, ans=0.125
2023-11-19 09:34:54,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=674506.6666666666, ans=0.0
2023-11-19 09:35:02,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=674506.6666666666, ans=10.0
2023-11-19 09:35:04,110 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5000, loss[loss=0.1143, simple_loss=0.1491, pruned_loss=0.03222, audio_tagging_loss=0.007553, over 16272.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1064, pruned_loss=0.02404, audio_tagging_loss=0.01048, over 3042970.02 frames. ], batch size: 56, lr: 7.75e-03, grad_scale: 32.0
2023-11-19 09:35:20,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=9.69 vs. limit=15.0
2023-11-19 09:35:25,265 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.803e+01 8.355e+01 9.053e+01 1.007e+02 1.287e+02, threshold=1.811e+02, percent-clipped=0.0
2023-11-19 09:35:29,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.01 vs. limit=22.5
2023-11-19 09:35:40,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=674773.3333333334, ans=0.125
2023-11-19 09:35:59,643 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5050, loss[loss=0.08979, simple_loss=0.1027, pruned_loss=0.0274, audio_tagging_loss=0.01105, over 14184.00 frames. ], tot_loss[loss=0.08714, simple_loss=0.1057, pruned_loss=0.0239, audio_tagging_loss=0.0104, over 3051959.02 frames. ], batch size: 54, lr: 7.74e-03, grad_scale: 32.0
2023-11-19 09:36:17,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=674973.3333333334, ans=0.125
2023-11-19 09:36:25,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=675040.0, ans=0.125
2023-11-19 09:36:26,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=675040.0, ans=0.1
2023-11-19 09:36:27,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=675040.0, ans=0.035
2023-11-19 09:36:29,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=675040.0, ans=0.0
2023-11-19 09:36:29,837 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.92 vs. limit=10.0
2023-11-19 09:36:37,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.37 vs. limit=15.0
2023-11-19 09:36:38,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675106.6666666666, ans=0.1
2023-11-19 09:36:38,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=675106.6666666666, ans=0.2
2023-11-19 09:36:55,053 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5100, loss[loss=0.08594, simple_loss=0.1012, pruned_loss=0.02607, audio_tagging_loss=0.009283, over 14674.00 frames. ], tot_loss[loss=0.08698, simple_loss=0.1055, pruned_loss=0.02388, audio_tagging_loss=0.01036, over 3047365.19 frames. ], batch size: 55, lr: 7.74e-03, grad_scale: 32.0
2023-11-19 09:37:01,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=10.05 vs. limit=15.0
2023-11-19 09:37:04,370 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.67 vs.
limit=22.5 2023-11-19 09:37:05,962 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=675306.6666666666, ans=0.0 2023-11-19 09:37:10,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675306.6666666666, ans=0.1 2023-11-19 09:37:14,340 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=675306.6666666666, ans=0.2 2023-11-19 09:37:16,148 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.361e+01 9.263e+01 1.052e+02 1.984e+02, threshold=1.853e+02, percent-clipped=1.0 2023-11-19 09:37:22,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675373.3333333334, ans=0.1 2023-11-19 09:37:40,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675506.6666666666, ans=0.1 2023-11-19 09:37:41,727 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=675506.6666666666, ans=0.125 2023-11-19 09:37:43,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=675506.6666666666, ans=0.0 2023-11-19 09:37:50,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=675573.3333333334, ans=0.04949747468305833 2023-11-19 09:37:51,145 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5150, loss[loss=0.09289, simple_loss=0.1194, pruned_loss=0.02287, audio_tagging_loss=0.01033, over 14382.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1054, pruned_loss=0.02376, audio_tagging_loss=0.01045, over 3046385.26 frames. ], batch size: 53, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:38:10,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=675640.0, ans=0.04949747468305833 2023-11-19 09:38:13,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=675706.6666666666, ans=0.1 2023-11-19 09:38:28,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.38 vs. limit=12.0 2023-11-19 09:38:29,808 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=675773.3333333334, ans=0.0 2023-11-19 09:38:41,107 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.12 vs. limit=10.0 2023-11-19 09:38:43,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=675840.0, ans=0.1 2023-11-19 09:38:46,046 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5200, loss[loss=0.08768, simple_loss=0.1053, pruned_loss=0.02215, audio_tagging_loss=0.01287, over 14272.00 frames. ], tot_loss[loss=0.08738, simple_loss=0.1058, pruned_loss=0.02402, audio_tagging_loss=0.01046, over 3039310.70 frames. 
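The dense `ScheduledFloat` records (`name=..., batch_count=..., ans=...`) track module hyper-parameters that are scheduled on the global batch count rather than fixed; by this point most values (probs at 0.125, dropout_p at 0.1, skip rates at 0.0) appear to sit on their end-of-schedule plateaus. A rough sketch of a schedule of this kind, assuming piecewise-linear interpolation between breakpoints, which is not verifiable from the log alone:

```python
def scheduled_float(batch_count, points):
    """Piecewise-linear schedule over the global batch count (illustrative).

    `points` is a sorted list of (batch_count, value) breakpoints; outside
    that range the boundary value is held constant, which would explain the
    constant `ans` values logged at batch_count ~672k.
    """
    if batch_count <= points[0][0]:
        return points[0][1]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if batch_count <= x1:
            t = (batch_count - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    return points[-1][1]

# e.g. a dropout annealed from 0.3 to 0.1 over the first 20k batches:
print(scheduled_float(672240.0, [(0.0, 0.3), (20000.0, 0.1)]))  # 0.1
```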
], batch size: 55, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:07,237 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.810e+01 8.521e+01 9.161e+01 1.002e+02 1.521e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 09:39:13,566 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=5.68 vs. limit=10.0 2023-11-19 09:39:24,628 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.whiten, num_groups=1, num_channels=512, metric=9.65 vs. limit=12.0 2023-11-19 09:39:25,484 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.attention_skip_rate, batch_count=676106.6666666666, ans=0.0 2023-11-19 09:39:27,791 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.76 vs. limit=22.5 2023-11-19 09:39:39,627 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=676173.3333333334, ans=0.0 2023-11-19 09:39:41,618 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5250, loss[loss=0.09724, simple_loss=0.1254, pruned_loss=0.02506, audio_tagging_loss=0.009469, over 15394.00 frames. ], tot_loss[loss=0.08729, simple_loss=0.1059, pruned_loss=0.02388, audio_tagging_loss=0.01048, over 3036791.12 frames. ], batch size: 58, lr: 7.74e-03, grad_scale: 32.0 2023-11-19 09:39:45,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=676240.0, ans=0.0 2023-11-19 09:39:55,605 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=676306.6666666666, ans=0.125 2023-11-19 09:40:07,697 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=4.461e-02 2023-11-19 09:40:15,494 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=676440.0, ans=0.0 2023-11-19 09:40:25,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=676506.6666666666, ans=0.125 2023-11-19 09:40:37,336 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5300, loss[loss=0.0882, simple_loss=0.1132, pruned_loss=0.02138, audio_tagging_loss=0.01022, over 15545.00 frames. ], tot_loss[loss=0.08765, simple_loss=0.1066, pruned_loss=0.02392, audio_tagging_loss=0.01044, over 3040254.61 frames. ], batch size: 56, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:40:50,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=676640.0, ans=0.125 2023-11-19 09:40:51,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=676640.0, ans=0.125 2023-11-19 09:40:58,531 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.778e+01 8.435e+01 9.072e+01 1.015e+02 1.516e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 09:40:59,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=676706.6666666666, ans=0.125 2023-11-19 09:41:02,288 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.45 vs. 
limit=6.0 2023-11-19 09:41:06,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.10 vs. limit=22.5 2023-11-19 09:41:32,013 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=676906.6666666666, ans=0.07 2023-11-19 09:41:32,749 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5350, loss[loss=0.07809, simple_loss=0.0894, pruned_loss=0.02151, audio_tagging_loss=0.01188, over 14641.00 frames. ], tot_loss[loss=0.08743, simple_loss=0.1061, pruned_loss=0.02383, audio_tagging_loss=0.01053, over 3036370.87 frames. ], batch size: 57, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:41:39,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=676906.6666666666, ans=0.125 2023-11-19 09:41:40,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.88 vs. limit=22.5 2023-11-19 09:42:04,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=677040.0, ans=0.2 2023-11-19 09:42:18,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=677173.3333333334, ans=0.2 2023-11-19 09:42:19,972 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.33 vs. limit=22.5 2023-11-19 09:42:20,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=677173.3333333334, ans=0.1 2023-11-19 09:42:28,294 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5400, loss[loss=0.09134, simple_loss=0.1133, pruned_loss=0.02619, audio_tagging_loss=0.0085, over 16061.00 frames. ], tot_loss[loss=0.08732, simple_loss=0.1057, pruned_loss=0.02389, audio_tagging_loss=0.01056, over 3035865.48 frames. ], batch size: 58, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:42:29,531 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.max_positive, batch_count=677240.0, ans=0.95 2023-11-19 09:42:37,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=677240.0, ans=0.04949747468305833 2023-11-19 09:42:40,621 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=677306.6666666666, ans=0.125 2023-11-19 09:42:45,230 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=9.80 vs. limit=15.0 2023-11-19 09:42:46,665 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:42:49,671 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.399e+01 8.356e+01 9.040e+01 1.006e+02 1.272e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 09:43:05,704 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=677440.0, ans=0.05 2023-11-19 09:43:18,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=11.19 vs. 
limit=15.0 2023-11-19 09:43:19,301 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.95 vs. limit=6.0 2023-11-19 09:43:22,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=677506.6666666666, ans=0.125 2023-11-19 09:43:24,109 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5450, loss[loss=0.07413, simple_loss=0.09048, pruned_loss=0.01889, audio_tagging_loss=0.01, over 16436.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1059, pruned_loss=0.02375, audio_tagging_loss=0.01059, over 3038658.24 frames. ], batch size: 63, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:43:50,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=677706.6666666666, ans=0.2 2023-11-19 09:44:02,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=677773.3333333334, ans=0.125 2023-11-19 09:44:03,773 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=677773.3333333334, ans=0.0 2023-11-19 09:44:16,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=677840.0, ans=0.95 2023-11-19 09:44:20,041 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5500, loss[loss=0.1033, simple_loss=0.1233, pruned_loss=0.03254, audio_tagging_loss=0.009089, over 16296.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1074, pruned_loss=0.02416, audio_tagging_loss=0.01059, over 3045330.21 frames. ], batch size: 58, lr: 7.73e-03, grad_scale: 32.0 2023-11-19 09:44:41,364 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.921e+01 8.594e+01 9.259e+01 1.001e+02 1.326e+02, threshold=1.852e+02, percent-clipped=0.0 2023-11-19 09:45:14,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=678173.3333333334, ans=0.1 2023-11-19 09:45:16,531 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5550, loss[loss=0.09002, simple_loss=0.114, pruned_loss=0.02181, audio_tagging_loss=0.01118, over 16811.00 frames. ], tot_loss[loss=0.0886, simple_loss=0.1076, pruned_loss=0.02424, audio_tagging_loss=0.01058, over 3047589.27 frames. ], batch size: 62, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:45:33,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=678306.6666666666, ans=0.125 2023-11-19 09:45:47,647 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.52 vs. limit=15.0 2023-11-19 09:45:58,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.prob, batch_count=678440.0, ans=0.125 2023-11-19 09:46:00,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=678506.6666666666, ans=0.0 2023-11-19 09:46:04,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.11 vs. 
limit=12.0 2023-11-19 09:46:12,186 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5600, loss[loss=0.09722, simple_loss=0.1189, pruned_loss=0.02435, audio_tagging_loss=0.01342, over 15918.00 frames. ], tot_loss[loss=0.08878, simple_loss=0.1078, pruned_loss=0.02425, audio_tagging_loss=0.01063, over 3051229.02 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 32.0 2023-11-19 09:46:22,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.22 vs. limit=15.0 2023-11-19 09:46:34,640 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.090e+01 8.355e+01 9.090e+01 1.021e+02 1.619e+02, threshold=1.818e+02, percent-clipped=0.0 2023-11-19 09:46:52,573 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 09:47:03,006 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=678840.0, ans=0.1 2023-11-19 09:47:07,977 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5650, loss[loss=0.07808, simple_loss=0.08717, pruned_loss=0.02177, audio_tagging_loss=0.01272, over 14683.00 frames. ], tot_loss[loss=0.08887, simple_loss=0.1079, pruned_loss=0.02423, audio_tagging_loss=0.01069, over 3055737.87 frames. ], batch size: 56, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:47:35,828 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.10 vs. limit=6.0 2023-11-19 09:47:58,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=679173.3333333334, ans=0.125 2023-11-19 09:48:04,394 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5700, loss[loss=0.07613, simple_loss=0.1026, pruned_loss=0.01785, audio_tagging_loss=0.006955, over 15750.00 frames. ], tot_loss[loss=0.08926, simple_loss=0.1085, pruned_loss=0.02447, audio_tagging_loss=0.01052, over 3059197.02 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:48:26,548 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.364e+01 8.673e+01 9.391e+01 1.015e+02 1.366e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 09:48:51,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=679506.6666666666, ans=0.125 2023-11-19 09:48:51,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=679506.6666666666, ans=0.0 2023-11-19 09:48:53,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=679506.6666666666, ans=0.125 2023-11-19 09:48:59,869 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5750, loss[loss=0.08615, simple_loss=0.1067, pruned_loss=0.02335, audio_tagging_loss=0.009434, over 15424.00 frames. ], tot_loss[loss=0.08808, simple_loss=0.107, pruned_loss=0.02419, audio_tagging_loss=0.01041, over 3056628.87 frames. 
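The WARNING above excludes a dummy AudioSet cut because its 100 input frames shrink to 23 encoder frames after subsampling, fewer than its 24 BPE tokens; a transducer loss needs at least as many encoder frames as output tokens. A sketch of such a filter; the exact subsampling arithmetic is an assumption (a conv frontend of the `((T - 7) // 2 + 1) // 2` variety reproduces the logged 100 → 23):

```python
def keep_for_transducer(num_frames: int, num_tokens: int) -> bool:
    """Drop cuts whose subsampled length is shorter than the token sequence.

    Assumed subsampling formula; it matches the logged
    "before subsampling: 100 / after subsampling: 23" pair.
    """
    t = ((num_frames - 7) // 2 + 1) // 2
    return t >= num_tokens

print(keep_for_transducer(100, 24))  # False, matching the excluded cut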
], batch size: 56, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:11,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=679640.0, ans=0.125 2023-11-19 09:49:15,434 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=679640.0, ans=0.125 2023-11-19 09:49:46,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.62 vs. limit=10.0 2023-11-19 09:49:49,597 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.66 vs. limit=15.0 2023-11-19 09:49:55,318 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5800, loss[loss=0.1006, simple_loss=0.1208, pruned_loss=0.034, audio_tagging_loss=0.00617, over 15348.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.107, pruned_loss=0.02428, audio_tagging_loss=0.01031, over 3058643.73 frames. ], batch size: 58, lr: 7.72e-03, grad_scale: 16.0 2023-11-19 09:49:55,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.02 vs. limit=10.0 2023-11-19 09:49:58,757 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=679906.6666666666, ans=0.125 2023-11-19 09:50:17,689 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.685e+01 8.360e+01 9.012e+01 9.906e+01 1.267e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 09:50:19,028 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=680040.0, ans=0.125 2023-11-19 09:50:31,722 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=680106.6666666666, ans=0.2 2023-11-19 09:50:43,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=680173.3333333334, ans=0.125 2023-11-19 09:50:50,913 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5850, loss[loss=0.05953, simple_loss=0.07371, pruned_loss=0.01277, audio_tagging_loss=0.009905, over 16493.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1075, pruned_loss=0.02444, audio_tagging_loss=0.0103, over 3049626.11 frames. ], batch size: 62, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:11,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=680306.6666666666, ans=0.125 2023-11-19 09:51:11,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten.whitening_limit, batch_count=680306.6666666666, ans=15.0 2023-11-19 09:51:21,818 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.40 vs. limit=12.0 2023-11-19 09:51:42,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=680506.6666666666, ans=0.1 2023-11-19 09:51:47,067 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5900, loss[loss=0.1133, simple_loss=0.1452, pruned_loss=0.03144, audio_tagging_loss=0.009205, over 16050.00 frames. ], tot_loss[loss=0.08861, simple_loss=0.1078, pruned_loss=0.0244, audio_tagging_loss=0.01031, over 3049690.28 frames. 
], batch size: 59, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:51:56,350 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=680573.3333333334, ans=0.0 2023-11-19 09:52:03,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=680640.0, ans=0.0 2023-11-19 09:52:08,763 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.892e+01 8.198e+01 8.843e+01 9.810e+01 1.400e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 09:52:19,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=680773.3333333334, ans=0.1 2023-11-19 09:52:20,413 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.92 vs. limit=15.0 2023-11-19 09:52:30,848 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.87 vs. limit=12.0 2023-11-19 09:52:42,555 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 5950, loss[loss=0.09009, simple_loss=0.1073, pruned_loss=0.02491, audio_tagging_loss=0.01151, over 15377.00 frames. ], tot_loss[loss=0.08864, simple_loss=0.1079, pruned_loss=0.02442, audio_tagging_loss=0.01029, over 3051885.40 frames. ], batch size: 57, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:52:53,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.17 vs. limit=15.0 2023-11-19 09:53:23,523 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=12.50 vs. limit=22.5 2023-11-19 09:53:38,040 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6000, loss[loss=0.1142, simple_loss=0.1444, pruned_loss=0.03269, audio_tagging_loss=0.009322, over 15390.00 frames. ], tot_loss[loss=0.08855, simple_loss=0.1078, pruned_loss=0.02429, audio_tagging_loss=0.01034, over 3049076.46 frames. ], batch size: 53, lr: 7.71e-03, grad_scale: 32.0 2023-11-19 09:53:38,043 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 09:54:10,903 INFO [train_asr.py:1147] (0/4) Epoch 9, validation: loss=0.06636, simple_loss=0.05607, pruned_loss=0.006778, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 09:54:10,903 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 09:54:28,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=681306.6666666666, ans=0.125 2023-11-19 09:54:33,947 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.777e+01 8.283e+01 9.118e+01 1.003e+02 1.340e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 09:54:50,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=13.89 vs. limit=15.0 2023-11-19 09:54:51,007 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 09:55:06,813 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6050, loss[loss=0.07369, simple_loss=0.08451, pruned_loss=0.0188, audio_tagging_loss=0.01264, over 15487.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.1074, pruned_loss=0.02408, audio_tagging_loss=0.01033, over 3045038.12 frames. ], batch size: 62, lr: 7.71e-03, grad_scale: 16.0 2023-11-19 09:55:20,239 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=681640.0, ans=0.125 2023-11-19 09:55:24,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681640.0, ans=0.1 2023-11-19 09:55:30,984 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.88 vs. limit=15.0 2023-11-19 09:56:02,351 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6100, loss[loss=0.08164, simple_loss=0.1006, pruned_loss=0.02212, audio_tagging_loss=0.009234, over 15494.00 frames. ], tot_loss[loss=0.08759, simple_loss=0.1067, pruned_loss=0.02392, audio_tagging_loss=0.01031, over 3045651.28 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:56:06,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=681906.6666666666, ans=0.2 2023-11-19 09:56:14,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=681973.3333333334, ans=0.0 2023-11-19 09:56:19,377 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=681973.3333333334, ans=0.1 2023-11-19 09:56:22,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=681973.3333333334, ans=0.0 2023-11-19 09:56:22,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.98 vs. limit=15.0 2023-11-19 09:56:26,131 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.498e+01 9.519e+01 1.052e+02 1.737e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 09:56:26,916 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.64 vs. limit=22.5 2023-11-19 09:56:44,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=682106.6666666666, ans=0.125 2023-11-19 09:56:49,435 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.16 vs. limit=15.0 2023-11-19 09:56:57,824 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6150, loss[loss=0.08544, simple_loss=0.1077, pruned_loss=0.02353, audio_tagging_loss=0.008079, over 15354.00 frames. ], tot_loss[loss=0.08749, simple_loss=0.1062, pruned_loss=0.02397, audio_tagging_loss=0.01043, over 3042473.60 frames. 
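At batch 6000 above the loop pauses training to run a full validation pass (and the validation loss decomposes the same way as the training loss: 0.5 × 0.05607 + 0.006778 + 0.03155 ≈ 0.06636). A hypothetical sketch of that interleaving; `valid_interval` and both helper functions are illustrative names, not icefall's API:

```python
def train_one_batch(model, optimizer, batch): ...        # stand-in
def compute_validation_loss(model, valid_dl): return 0.0  # stand-in

def run_epoch(model, optimizer, train_dl, valid_dl, valid_interval=3000):
    for batch_idx, batch in enumerate(train_dl):
        train_one_batch(model, optimizer, batch)
        # Interleave a full validation pass at fixed batch intervals,
        # as in the "Computing validation loss" record at batch 6000.
        if batch_idx > 0 and batch_idx % valid_interval == 0:
            valid_loss = compute_validation_loss(model, valid_dl)
            print(f"batch {batch_idx}, validation: loss={valid_loss:.5f}")
```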
], batch size: 55, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:57:24,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=682373.3333333334, ans=0.07 2023-11-19 09:57:25,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=682373.3333333334, ans=0.1 2023-11-19 09:57:52,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=682573.3333333334, ans=0.2 2023-11-19 09:57:53,353 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6200, loss[loss=0.09523, simple_loss=0.1216, pruned_loss=0.02437, audio_tagging_loss=0.01005, over 16427.00 frames. ], tot_loss[loss=0.08693, simple_loss=0.1055, pruned_loss=0.02369, audio_tagging_loss=0.01051, over 3048390.11 frames. ], batch size: 62, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:58:01,008 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=682573.3333333334, ans=0.125 2023-11-19 09:58:03,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=682640.0, ans=0.125 2023-11-19 09:58:12,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=682640.0, ans=0.0 2023-11-19 09:58:16,315 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.684e+01 8.555e+01 9.157e+01 9.904e+01 1.201e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 09:58:37,379 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 09:58:39,584 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.22 vs. limit=15.0 2023-11-19 09:58:49,117 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6250, loss[loss=0.1126, simple_loss=0.1371, pruned_loss=0.0338, audio_tagging_loss=0.01026, over 16112.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1056, pruned_loss=0.02387, audio_tagging_loss=0.01057, over 3048913.28 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:58:52,904 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.28 vs. limit=6.0 2023-11-19 09:59:03,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=682973.3333333334, ans=0.2 2023-11-19 09:59:10,507 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 09:59:16,777 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.98 vs. 
limit=10.0 2023-11-19 09:59:32,843 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683173.3333333334, ans=0.125 2023-11-19 09:59:41,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=683173.3333333334, ans=0.2 2023-11-19 09:59:44,721 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6300, loss[loss=0.1137, simple_loss=0.1388, pruned_loss=0.03485, audio_tagging_loss=0.009483, over 15927.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.1067, pruned_loss=0.02421, audio_tagging_loss=0.01056, over 3052644.23 frames. ], batch size: 57, lr: 7.70e-03, grad_scale: 16.0 2023-11-19 09:59:45,421 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=1.97 vs. limit=6.0 2023-11-19 09:59:46,964 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=683240.0, ans=0.125 2023-11-19 09:59:49,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=683240.0, ans=0.0 2023-11-19 10:00:07,718 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.970e+01 8.511e+01 9.206e+01 1.011e+02 2.353e+02, threshold=1.841e+02, percent-clipped=1.0 2023-11-19 10:00:22,001 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=683440.0, ans=0.125 2023-11-19 10:00:25,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=683440.0, ans=0.2 2023-11-19 10:00:31,982 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=683506.6666666666, ans=0.125 2023-11-19 10:00:40,555 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6350, loss[loss=0.08576, simple_loss=0.0994, pruned_loss=0.02537, audio_tagging_loss=0.01069, over 15407.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1054, pruned_loss=0.02367, audio_tagging_loss=0.01072, over 3052421.15 frames. ], batch size: 59, lr: 7.69e-03, grad_scale: 16.0 2023-11-19 10:00:52,362 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=683640.0, ans=0.125 2023-11-19 10:00:59,139 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=683640.0, ans=0.0 2023-11-19 10:01:16,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=683773.3333333334, ans=0.125 2023-11-19 10:01:27,722 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=12.90 vs. limit=15.0 2023-11-19 10:01:34,634 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=683906.6666666666, ans=0.0 2023-11-19 10:01:35,459 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6400, loss[loss=0.1117, simple_loss=0.1317, pruned_loss=0.03439, audio_tagging_loss=0.01142, over 15648.00 frames. ], tot_loss[loss=0.08656, simple_loss=0.1042, pruned_loss=0.02342, audio_tagging_loss=0.01105, over 3051649.00 frames. 
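The `grad_scale` field in the loss records moves between 16.0 and 32.0 over this stretch, which is consistent with a dynamic mixed-precision loss scaler that doubles the scale after a run of overflow-free steps and halves it on overflow. A minimal step in that style, using `torch.cuda.amp.GradScaler` as a plausible stand-in rather than icefall's exact training step:

```python
import torch

scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

def fp16_step(model, optimizer, batch, loss_fn):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = loss_fn(model, batch)
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skips the update if gradients overflowed
    scaler.update()          # adjusts the scale (the logged grad_scale)
```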
], batch size: 58, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:01:35,710 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.min_positive, batch_count=683906.6666666666, ans=0.025 2023-11-19 10:01:45,131 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten1.whitening_limit, batch_count=683906.6666666666, ans=10.0 2023-11-19 10:01:59,226 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.664e+01 8.378e+01 8.903e+01 9.717e+01 1.251e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 10:02:14,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=684106.6666666666, ans=0.125 2023-11-19 10:02:30,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=684240.0, ans=0.2 2023-11-19 10:02:30,941 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6450, loss[loss=0.06598, simple_loss=0.07877, pruned_loss=0.01497, audio_tagging_loss=0.01162, over 15787.00 frames. ], tot_loss[loss=0.08666, simple_loss=0.1046, pruned_loss=0.02332, audio_tagging_loss=0.01106, over 3052908.71 frames. ], batch size: 59, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:03:09,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=684440.0, ans=0.0 2023-11-19 10:03:27,109 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6500, loss[loss=0.06988, simple_loss=0.07706, pruned_loss=0.01416, audio_tagging_loss=0.01719, over 15006.00 frames. ], tot_loss[loss=0.08663, simple_loss=0.1041, pruned_loss=0.02355, audio_tagging_loss=0.01101, over 3046364.28 frames. ], batch size: 58, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:03:29,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=684573.3333333334, ans=0.125 2023-11-19 10:03:50,208 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.647e+01 8.426e+01 9.031e+01 9.982e+01 1.610e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 10:03:50,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=684706.6666666666, ans=0.125 2023-11-19 10:04:14,608 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=684840.0, ans=0.125 2023-11-19 10:04:22,322 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6550, loss[loss=0.07624, simple_loss=0.09614, pruned_loss=0.01761, audio_tagging_loss=0.01056, over 15055.00 frames. ], tot_loss[loss=0.0867, simple_loss=0.1047, pruned_loss=0.02351, audio_tagging_loss=0.01085, over 3052601.90 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:04:26,205 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=684906.6666666666, ans=0.0 2023-11-19 10:04:29,918 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module2.whiten, num_groups=1, num_channels=192, metric=11.34 vs. limit=15.0 2023-11-19 10:04:47,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=685040.0, ans=0.0 2023-11-19 10:05:18,132 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6600, loss[loss=0.1131, simple_loss=0.143, pruned_loss=0.03134, audio_tagging_loss=0.01021, over 15407.00 frames. 
], tot_loss[loss=0.08645, simple_loss=0.1046, pruned_loss=0.02343, audio_tagging_loss=0.0107, over 3047623.16 frames. ], batch size: 57, lr: 7.69e-03, grad_scale: 32.0 2023-11-19 10:05:28,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=685306.6666666666, ans=0.2 2023-11-19 10:05:32,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=685306.6666666666, ans=0.0 2023-11-19 10:05:32,451 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.51 vs. limit=6.0 2023-11-19 10:05:33,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=685306.6666666666, ans=0.0 2023-11-19 10:05:41,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.978e+01 8.207e+01 8.810e+01 9.589e+01 1.176e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 10:06:14,467 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6650, loss[loss=0.07761, simple_loss=0.0959, pruned_loss=0.01797, audio_tagging_loss=0.01169, over 15310.00 frames. ], tot_loss[loss=0.08682, simple_loss=0.1053, pruned_loss=0.02357, audio_tagging_loss=0.01059, over 3046589.99 frames. ], batch size: 59, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:06:14,863 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=4.04 vs. limit=15.0 2023-11-19 10:06:28,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=685640.0, ans=0.125 2023-11-19 10:06:38,872 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer_ff2.min_abs, batch_count=685706.6666666666, ans=0.1 2023-11-19 10:07:08,716 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.48 vs. limit=15.0 2023-11-19 10:07:09,323 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6700, loss[loss=0.07424, simple_loss=0.08824, pruned_loss=0.02107, audio_tagging_loss=0.009056, over 15140.00 frames. ], tot_loss[loss=0.08669, simple_loss=0.1052, pruned_loss=0.02351, audio_tagging_loss=0.01057, over 3047622.57 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:07:18,069 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=685906.6666666666, ans=0.125 2023-11-19 10:07:31,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=686040.0, ans=0.07 2023-11-19 10:07:32,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=686040.0, ans=0.0 2023-11-19 10:07:32,753 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=6.78 vs. 
limit=15.0 2023-11-19 10:07:33,114 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.849e+01 8.482e+01 9.410e+01 1.023e+02 1.409e+02, threshold=1.882e+02, percent-clipped=0.0 2023-11-19 10:07:50,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=686106.6666666666, ans=0.0 2023-11-19 10:07:54,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=686173.3333333334, ans=0.125 2023-11-19 10:07:58,366 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=686173.3333333334, ans=0.0 2023-11-19 10:08:01,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=686173.3333333334, ans=0.125 2023-11-19 10:08:05,669 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6750, loss[loss=0.08742, simple_loss=0.1013, pruned_loss=0.0219, audio_tagging_loss=0.01485, over 15394.00 frames. ], tot_loss[loss=0.08733, simple_loss=0.1058, pruned_loss=0.02386, audio_tagging_loss=0.0106, over 3040542.12 frames. ], batch size: 56, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:08:08,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=686240.0, ans=0.125 2023-11-19 10:08:12,757 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.14 vs. limit=22.5 2023-11-19 10:08:17,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=686306.6666666666, ans=0.125 2023-11-19 10:08:18,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=686306.6666666666, ans=0.125 2023-11-19 10:08:18,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=686306.6666666666, ans=0.0 2023-11-19 10:08:25,929 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=686306.6666666666, ans=0.0 2023-11-19 10:08:32,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686373.3333333334, ans=0.1 2023-11-19 10:08:35,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=686373.3333333334, ans=0.125 2023-11-19 10:08:40,855 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=686440.0, ans=0.0 2023-11-19 10:09:01,566 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6800, loss[loss=0.1383, simple_loss=0.1629, pruned_loss=0.04749, audio_tagging_loss=0.009315, over 16793.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1064, pruned_loss=0.02419, audio_tagging_loss=0.01058, over 3033859.13 frames. 
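The `Whitening` records compare a per-module `metric` against a `limit`, flagging activations whose channel covariance is far from white. One plausible reading of such a metric, offered as an illustrative diagnostic rather than icefall's exact formula, is the eigenvalue-dispersion ratio below, which equals 1.0 for a perfectly white covariance and grows as the spectrum becomes lopsided:

```python
import torch

def whitening_metric(x: torch.Tensor) -> float:
    """Illustrative whiteness diagnostic for (num_frames, num_channels)
    activations: n * sum(eig^2) / sum(eig)^2 of the channel covariance,
    computed without an eigendecomposition via traces."""
    x = x - x.mean(dim=0)
    cov = (x.T @ x) / x.shape[0]             # (C, C) channel covariance
    n = cov.shape[0]
    mean_eig = torch.diagonal(cov).mean()     # trace(C) / n
    mean_eig_sq = (cov * cov).sum() / n       # trace(C @ C) / n
    return (mean_eig_sq / mean_eig**2).item()
```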
], batch size: 62, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:09:01,863 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=686573.3333333334, ans=0.0 2023-11-19 10:09:06,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=686573.3333333334, ans=0.2 2023-11-19 10:09:09,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=686573.3333333334, ans=0.0 2023-11-19 10:09:24,302 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.110e+01 8.200e+01 8.866e+01 9.843e+01 1.346e+02, threshold=1.773e+02, percent-clipped=0.0 2023-11-19 10:09:51,776 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-11-19 10:09:56,530 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6850, loss[loss=0.07234, simple_loss=0.08351, pruned_loss=0.01677, audio_tagging_loss=0.01382, over 15715.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.1049, pruned_loss=0.0238, audio_tagging_loss=0.01059, over 3033221.88 frames. ], batch size: 59, lr: 7.68e-03, grad_scale: 32.0 2023-11-19 10:10:01,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=686906.6666666666, ans=0.0 2023-11-19 10:10:15,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=686973.3333333334, ans=0.1 2023-11-19 10:10:30,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:10:36,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=687106.6666666666, ans=0.125 2023-11-19 10:10:39,100 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=687106.6666666666, ans=0.125 2023-11-19 10:10:44,769 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.82 vs. limit=15.0 2023-11-19 10:10:47,040 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=687173.3333333334, ans=0.1 2023-11-19 10:10:52,110 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6900, loss[loss=0.08649, simple_loss=0.1008, pruned_loss=0.02597, audio_tagging_loss=0.01014, over 16864.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1058, pruned_loss=0.02392, audio_tagging_loss=0.01052, over 3034132.01 frames. ], batch size: 65, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:10:58,290 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.89 vs. 
limit=15.0 2023-11-19 10:10:59,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=687240.0, ans=0.95 2023-11-19 10:11:15,064 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=687373.3333333334, ans=0.0 2023-11-19 10:11:15,750 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.720e+01 9.438e+01 1.043e+02 1.545e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 10:11:19,212 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.453e-01 2023-11-19 10:11:21,293 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=687373.3333333334, ans=0.2 2023-11-19 10:11:34,262 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:11:47,997 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 6950, loss[loss=0.09273, simple_loss=0.1118, pruned_loss=0.02518, audio_tagging_loss=0.01162, over 16617.00 frames. ], tot_loss[loss=0.08766, simple_loss=0.1065, pruned_loss=0.02392, audio_tagging_loss=0.01049, over 3048519.35 frames. ], batch size: 64, lr: 7.67e-03, grad_scale: 32.0 2023-11-19 10:11:52,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=687573.3333333334, ans=0.125 2023-11-19 10:11:54,345 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.36 vs. limit=15.0 2023-11-19 10:12:00,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=687640.0, ans=0.125 2023-11-19 10:12:17,745 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=687706.6666666666, ans=0.0 2023-11-19 10:12:18,904 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=687706.6666666666, ans=0.07 2023-11-19 10:12:25,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.prob, batch_count=687773.3333333334, ans=0.125 2023-11-19 10:12:43,302 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7000, loss[loss=0.08185, simple_loss=0.1019, pruned_loss=0.0215, audio_tagging_loss=0.009416, over 15823.00 frames. ], tot_loss[loss=0.08759, simple_loss=0.1062, pruned_loss=0.02396, audio_tagging_loss=0.01054, over 3043252.23 frames. 
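The learning rate drifts slowly from 7.76e-03 down to 7.67e-03 across this stretch, consistent with a schedule that decays smoothly in both the global batch index and the epoch. A sketch in the style of icefall's Eden scheduler; the parameter values here are assumptions, not read from this log:

```python
def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
    """Eden-style learning-rate schedule: smooth power-law decay in both
    the global batch count and the (possibly fractional) epoch, which
    would produce the slow lr drift logged above. Illustrative parameters."""
    batch_factor = ((batch**2 + lr_batches**2) / lr_batches**2) ** -0.25
    epoch_factor = ((epoch**2 + lr_epochs**2) / lr_epochs**2) ** -0.25
    return base_lr * batch_factor * epoch_factor
```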
], batch size: 58, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:12:55,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=687973.3333333334, ans=0.125 2023-11-19 10:13:08,454 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.313e+01 8.438e+01 9.310e+01 1.017e+02 1.231e+02, threshold=1.862e+02, percent-clipped=0.0 2023-11-19 10:13:09,857 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=688040.0, ans=0.2 2023-11-19 10:13:23,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=688106.6666666666, ans=0.0 2023-11-19 10:13:25,334 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.15 vs. limit=15.0 2023-11-19 10:13:33,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.61 vs. limit=15.0 2023-11-19 10:13:39,127 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7050, loss[loss=0.09233, simple_loss=0.1106, pruned_loss=0.02593, audio_tagging_loss=0.01108, over 15789.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1053, pruned_loss=0.02358, audio_tagging_loss=0.0107, over 3043469.20 frames. ], batch size: 59, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:13:45,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=688240.0, ans=0.1 2023-11-19 10:13:49,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=688306.6666666666, ans=0.2 2023-11-19 10:14:03,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=688373.3333333334, ans=0.125 2023-11-19 10:14:18,587 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=688440.0, ans=0.1 2023-11-19 10:14:32,811 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.68 vs. limit=15.0 2023-11-19 10:14:33,336 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=688506.6666666666, ans=0.0 2023-11-19 10:14:34,401 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=688573.3333333334, ans=0.04949747468305833 2023-11-19 10:14:34,507 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=688573.3333333334, ans=0.125 2023-11-19 10:14:35,309 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7100, loss[loss=0.118, simple_loss=0.1507, pruned_loss=0.03522, audio_tagging_loss=0.007495, over 15113.00 frames. ], tot_loss[loss=0.08695, simple_loss=0.1051, pruned_loss=0.02371, audio_tagging_loss=0.01071, over 3044107.78 frames. 
], batch size: 55, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:14:54,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten.whitening_limit, batch_count=688640.0, ans=15.0 2023-11-19 10:14:59,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.651e+01 8.454e+01 9.144e+01 1.007e+02 1.381e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 10:15:30,855 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7150, loss[loss=0.0739, simple_loss=0.09263, pruned_loss=0.01739, audio_tagging_loss=0.01019, over 14308.00 frames. ], tot_loss[loss=0.08644, simple_loss=0.1044, pruned_loss=0.02342, audio_tagging_loss=0.01084, over 3045134.51 frames. ], batch size: 54, lr: 7.67e-03, grad_scale: 16.0 2023-11-19 10:15:52,612 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=689040.0, ans=0.1 2023-11-19 10:15:55,168 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:16:26,366 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7200, loss[loss=0.09931, simple_loss=0.1159, pruned_loss=0.02823, audio_tagging_loss=0.01315, over 14188.00 frames. ], tot_loss[loss=0.08602, simple_loss=0.1038, pruned_loss=0.02326, audio_tagging_loss=0.01086, over 3040250.03 frames. ], batch size: 52, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:16:27,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=689240.0, ans=0.125 2023-11-19 10:16:40,402 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=12.49 vs. limit=15.0 2023-11-19 10:16:50,998 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.339e+01 8.352e+01 9.080e+01 1.000e+02 1.342e+02, threshold=1.816e+02, percent-clipped=0.0 2023-11-19 10:16:55,469 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=689373.3333333334, ans=0.125 2023-11-19 10:17:14,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=689506.6666666666, ans=0.125 2023-11-19 10:17:14,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=689506.6666666666, ans=0.0 2023-11-19 10:17:21,596 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7250, loss[loss=0.06136, simple_loss=0.0739, pruned_loss=0.01326, audio_tagging_loss=0.01115, over 13887.00 frames. ], tot_loss[loss=0.08669, simple_loss=0.1047, pruned_loss=0.02355, audio_tagging_loss=0.01081, over 3046162.84 frames. 
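Each ScheduledFloat line reports a hyperparameter whose current value (ans=...) is looked up from an adjusted batch count. The counts advance in fractional steps of 66.67 and sit near 6.67 x the global batch index (batch_count ~= 693,373 around checkpoint-104000 below), which suggests a duration-normalized count rather than the raw batch index. A minimal piecewise-linear sketch; the breakpoints in the example are assumptions, not values read from this run:

def scheduled_float(batch_count: float,
                    schedule: list[tuple[float, float]]) -> float:
    """Piecewise-linear interpolation of a scalar hyperparameter.

    `schedule` is a sorted list of (batch_count, value) breakpoints; outside
    that range the endpoint values are held constant.
    """
    if batch_count <= schedule[0][0]:
        return schedule[0][1]
    for (x0, y0), (x1, y1) in zip(schedule, schedule[1:]):
        if batch_count <= x1:
            return y0 + (batch_count - x0) / (x1 - x0) * (y1 - y0)
    return schedule[-1][1]

# e.g. a skip rate that decays early in training and then flattens out:
rate = scheduled_float(688373.33, [(0.0, 0.5), (4000.0, 0.05), (16000.0, 0.0)])
assert rate == 0.0  # far past the last breakpoint, the end value holds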
], batch size: 54, lr: 7.66e-03, grad_scale: 32.0 2023-11-19 10:17:21,824 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=689573.3333333334, ans=0.125 2023-11-19 10:17:21,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=689573.3333333334, ans=0.0 2023-11-19 10:17:23,984 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=689573.3333333334, ans=0.0 2023-11-19 10:17:38,864 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=689640.0, ans=0.2 2023-11-19 10:17:39,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=689640.0, ans=0.125 2023-11-19 10:17:48,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=689706.6666666666, ans=0.2 2023-11-19 10:17:56,110 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=689773.3333333334, ans=0.1 2023-11-19 10:18:07,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=689840.0, ans=0.0 2023-11-19 10:18:17,660 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7300, loss[loss=0.09194, simple_loss=0.1203, pruned_loss=0.02229, audio_tagging_loss=0.009476, over 15706.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1054, pruned_loss=0.02357, audio_tagging_loss=0.01062, over 3050174.99 frames. ], batch size: 59, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:18:39,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=690040.0, ans=0.125 2023-11-19 10:18:42,971 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.905e+01 8.177e+01 8.479e+01 9.252e+01 1.452e+02, threshold=1.696e+02, percent-clipped=0.0 2023-11-19 10:19:12,533 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7350, loss[loss=0.08318, simple_loss=0.1008, pruned_loss=0.02302, audio_tagging_loss=0.009777, over 14624.00 frames. ], tot_loss[loss=0.08721, simple_loss=0.1059, pruned_loss=0.02382, audio_tagging_loss=0.01046, over 3049062.89 frames. ], batch size: 56, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:19:19,700 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=690240.0, ans=0.07 2023-11-19 10:19:21,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=690240.0, ans=0.2 2023-11-19 10:19:27,824 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.45 vs. limit=22.5 2023-11-19 10:19:57,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=690506.6666666666, ans=0.125 2023-11-19 10:20:08,407 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7400, loss[loss=0.06258, simple_loss=0.06592, pruned_loss=0.0147, audio_tagging_loss=0.01492, over 16157.00 frames. ], tot_loss[loss=0.08688, simple_loss=0.1055, pruned_loss=0.02371, audio_tagging_loss=0.0104, over 3043956.40 frames. 
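The per-batch loss lines decompose consistently as tot_loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss; e.g. for batch 7400 above, 0.5 * 0.1055 + 0.02371 + 0.0104 ~= 0.08688. A sketch of that combination, with the 0.5 and 1.0 weights read off this identity (they match the run's simple_loss_scale and audio_tagging_loss_scale settings); tensor names are illustrative:

import torch

SIMPLE_LOSS_SCALE = 0.5         # implied by the logged loss decomposition
AUDIO_TAGGING_LOSS_SCALE = 1.0  # likewise

def combine_losses(simple_loss: torch.Tensor,
                   pruned_loss: torch.Tensor,
                   audio_tagging_loss: torch.Tensor) -> torch.Tensor:
    # The simple (linear-combiner) transducer loss is down-weighted; the
    # pruned RNN-T loss and the audio-tagging KD loss enter at full scale.
    return (SIMPLE_LOSS_SCALE * simple_loss
            + pruned_loss
            + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

# Reproduces the batch 7400 entry: 0.5*0.1055 + 0.02371 + 0.0104 ~= 0.0869
loss = combine_losses(torch.tensor(0.1055),
                      torch.tensor(0.02371),
                      torch.tensor(0.0104))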
], batch size: 63, lr: 7.66e-03, grad_scale: 16.0 2023-11-19 10:20:34,035 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.592e+01 8.284e+01 9.235e+01 1.033e+02 1.364e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 10:20:35,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=690706.6666666666, ans=0.2 2023-11-19 10:20:40,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=690706.6666666666, ans=0.2 2023-11-19 10:20:48,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=690773.3333333334, ans=0.125 2023-11-19 10:21:04,097 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7450, loss[loss=0.07595, simple_loss=0.08569, pruned_loss=0.02206, audio_tagging_loss=0.01105, over 14342.00 frames. ], tot_loss[loss=0.08741, simple_loss=0.1061, pruned_loss=0.02401, audio_tagging_loss=0.01035, over 3041035.76 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:21:06,097 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-11-19 10:21:11,232 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=690906.6666666666, ans=0.09899494936611666 2023-11-19 10:21:15,360 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=690973.3333333334, ans=0.0 2023-11-19 10:21:30,016 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.whiten, num_groups=1, num_channels=192, metric=3.94 vs. limit=12.0 2023-11-19 10:21:32,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:33,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.attention_skip_rate, batch_count=691040.0, ans=0.0 2023-11-19 10:21:35,297 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=691040.0, ans=0.2 2023-11-19 10:21:59,387 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7500, loss[loss=0.06942, simple_loss=0.08669, pruned_loss=0.01444, audio_tagging_loss=0.01164, over 14713.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.1072, pruned_loss=0.02428, audio_tagging_loss=0.0103, over 3039975.37 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:22:00,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=691240.0, ans=0.1 2023-11-19 10:22:05,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=691240.0, ans=0.1 2023-11-19 10:22:17,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=691306.6666666666, ans=0.125 2023-11-19 10:22:25,171 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.501e+01 8.537e+01 9.196e+01 9.974e+01 1.563e+02, threshold=1.839e+02, percent-clipped=0.0 2023-11-19 10:22:36,795 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.69 vs. 
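The "Whitening: ... metric=X vs. limit=Y" lines are printed when a module's whitening metric exceeds its (scheduled) limit; the metric measures how far the feature covariance is from a multiple of the identity, i.e. how "un-white" the activations are (1.0 would be perfectly white). A rough sketch of one way to compute such a metric from a batch of activations; this is an assumed definition for illustration, not the exact scaling.py formula:

import torch

def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> float:
    """Covariance-anisotropy proxy, averaged over channel groups.

    x: (num_frames, num_channels), with num_channels divisible by num_groups.
    Returns 1.0 for perfectly white features; larger means less white.
    """
    num_frames, num_channels = x.shape
    x = x.reshape(num_frames, num_groups, num_channels // num_groups)
    metrics = []
    for g in range(num_groups):
        xg = x[:, g, :]
        xg = xg - xg.mean(dim=0, keepdim=True)
        cov = (xg.t() @ xg) / num_frames
        eigs = torch.linalg.eigvalsh(cov)
        # Mean squared eigenvalue over squared mean eigenvalue: equals 1.0
        # iff all eigenvalues are equal (white), and grows otherwise.
        metrics.append((eigs.pow(2).mean() / eigs.mean().pow(2)).item())
    return sum(metrics) / num_groups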
limit=10.0 2023-11-19 10:22:37,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=691440.0, ans=0.125 2023-11-19 10:22:39,934 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.11 vs. limit=22.5 2023-11-19 10:22:54,769 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7550, loss[loss=0.08967, simple_loss=0.1048, pruned_loss=0.02568, audio_tagging_loss=0.01159, over 15252.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1067, pruned_loss=0.02423, audio_tagging_loss=0.01028, over 3042837.03 frames. ], batch size: 56, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:22:57,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=691573.3333333334, ans=0.1 2023-11-19 10:23:28,136 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.92 vs. limit=10.0 2023-11-19 10:23:50,736 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7600, loss[loss=0.09419, simple_loss=0.1095, pruned_loss=0.02791, audio_tagging_loss=0.01154, over 13889.00 frames. ], tot_loss[loss=0.08725, simple_loss=0.1059, pruned_loss=0.02391, audio_tagging_loss=0.01037, over 3040256.42 frames. ], batch size: 53, lr: 7.65e-03, grad_scale: 32.0 2023-11-19 10:23:58,348 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=691906.6666666666, ans=0.125 2023-11-19 10:24:04,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=691973.3333333334, ans=0.125 2023-11-19 10:24:11,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.97 vs. limit=22.5 2023-11-19 10:24:16,210 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.911e+01 8.367e+01 9.110e+01 1.007e+02 1.295e+02, threshold=1.822e+02, percent-clipped=0.0 2023-11-19 10:24:16,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.min_abs, batch_count=692040.0, ans=0.5 2023-11-19 10:24:26,958 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.04 vs. limit=15.0 2023-11-19 10:24:40,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=692173.3333333334, ans=0.2 2023-11-19 10:24:44,533 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:24:44,567 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=692173.3333333334, ans=0.125 2023-11-19 10:24:46,402 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7650, loss[loss=0.07561, simple_loss=0.08943, pruned_loss=0.02019, audio_tagging_loss=0.0107, over 15470.00 frames. ], tot_loss[loss=0.08772, simple_loss=0.1063, pruned_loss=0.02418, audio_tagging_loss=0.01039, over 3042235.31 frames. 
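grad_scale in the batch lines flips between 16.0 and 32.0 through this stretch: it is the fp16 loss-scaling factor of mixed-precision training, doubled after a run of successful steps and halved when inf/nan gradients appear, which is why it only moves by factors of 2. A minimal sketch of the standard PyTorch pattern (torch.cuda.amp, per the torch 2.0 API); the init_scale shown is illustrative:

import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=2.0,      # illustrative starting scale
    growth_factor=2.0,   # the logged grad_scale moves by this factor
    backoff_factor=0.5,  # halved on inf/nan gradients
)

def fp16_step(model, optimizer, batch, compute_loss):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = compute_loss(model, batch)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales grads, skips the step on overflow
    scaler.update()                # grows or backs off the scale
    return loss.detach(), scaler.get_scale()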
], batch size: 58, lr: 7.65e-03, grad_scale: 16.0 2023-11-19 10:24:57,826 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=692306.6666666666, ans=0.125 2023-11-19 10:25:42,058 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7700, loss[loss=0.08338, simple_loss=0.09616, pruned_loss=0.02262, audio_tagging_loss=0.01268, over 15539.00 frames. ], tot_loss[loss=0.0873, simple_loss=0.1059, pruned_loss=0.02393, audio_tagging_loss=0.01043, over 3048309.05 frames. ], batch size: 58, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:26:08,253 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.058e+01 8.466e+01 9.076e+01 9.722e+01 1.155e+02, threshold=1.815e+02, percent-clipped=0.0 2023-11-19 10:26:22,367 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=692773.3333333334, ans=0.2 2023-11-19 10:26:36,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=692840.0, ans=0.0 2023-11-19 10:26:37,698 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.69 vs. limit=15.0 2023-11-19 10:26:38,235 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7750, loss[loss=0.09988, simple_loss=0.1261, pruned_loss=0.02818, audio_tagging_loss=0.008673, over 15284.00 frames. ], tot_loss[loss=0.08783, simple_loss=0.1066, pruned_loss=0.02407, audio_tagging_loss=0.01045, over 3043149.44 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:26:38,499 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=692906.6666666666, ans=0.125 2023-11-19 10:26:38,877 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.14 vs. limit=15.0 2023-11-19 10:26:39,534 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=692906.6666666666, ans=0.125 2023-11-19 10:27:03,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.attention_skip_rate, batch_count=693040.0, ans=0.0 2023-11-19 10:27:17,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=693106.6666666666, ans=0.0 2023-11-19 10:27:19,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.max_abs, batch_count=693106.6666666666, ans=10.0 2023-11-19 10:27:24,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=693173.3333333334, ans=0.0 2023-11-19 10:27:31,428 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=693173.3333333334, ans=0.125 2023-11-19 10:27:33,224 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7800, loss[loss=0.07638, simple_loss=0.087, pruned_loss=0.02168, audio_tagging_loss=0.01121, over 14588.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1066, pruned_loss=0.02409, audio_tagging_loss=0.01047, over 3046319.46 frames. 
], batch size: 55, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:27:44,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.scale_min, batch_count=693306.6666666666, ans=0.2 2023-11-19 10:27:48,245 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-104000.pt 2023-11-19 10:28:02,856 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.342e+01 8.622e+01 9.449e+01 1.060e+02 1.939e+02, threshold=1.890e+02, percent-clipped=1.0 2023-11-19 10:28:10,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=693440.0, ans=0.125 2023-11-19 10:28:11,574 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=693440.0, ans=0.0 2023-11-19 10:28:17,850 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=3.437e-02 2023-11-19 10:28:27,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=693506.6666666666, ans=0.2 2023-11-19 10:28:30,892 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.10 vs. limit=15.0 2023-11-19 10:28:31,432 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7850, loss[loss=0.09961, simple_loss=0.1329, pruned_loss=0.02229, audio_tagging_loss=0.01089, over 15411.00 frames. ], tot_loss[loss=0.08812, simple_loss=0.107, pruned_loss=0.02409, audio_tagging_loss=0.01053, over 3042683.56 frames. ], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:28:32,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=693573.3333333334, ans=0.125 2023-11-19 10:28:44,517 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=693640.0, ans=0.2 2023-11-19 10:29:06,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=693773.3333333334, ans=0.1 2023-11-19 10:29:27,570 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7900, loss[loss=0.0726, simple_loss=0.08628, pruned_loss=0.01699, audio_tagging_loss=0.01246, over 14877.00 frames. ], tot_loss[loss=0.08846, simple_loss=0.1075, pruned_loss=0.02412, audio_tagging_loss=0.01058, over 3039312.69 frames. ], batch size: 56, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:29:33,406 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.51 vs. limit=12.0 2023-11-19 10:29:41,712 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.35 vs. limit=12.0 2023-11-19 10:29:41,903 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.54 vs. 
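checkpoint-104000.pt above is keyed by the global batch index, and 104000 is a multiple of 4000, consistent with a fixed save interval. A hedged sketch of periodic mid-epoch saving; the function name, interval, and saved fields are illustrative, not the exact checkpoint.py contents:

from pathlib import Path
import torch

def maybe_save_checkpoint(model, optimizer, batch_idx_train: int,
                          exp_dir: Path, save_every_n: int = 4000) -> None:
    # Save on multiples of save_every_n; 104000 % 4000 == 0 matches the log.
    if batch_idx_train == 0 or batch_idx_train % save_every_n != 0:
        return
    ckpt = exp_dir / f"checkpoint-{batch_idx_train}.pt"
    torch.save(
        {
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "batch_idx_train": batch_idx_train,
        },
        ckpt,
    )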
limit=15.0 2023-11-19 10:29:53,843 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.703e+01 8.460e+01 9.050e+01 1.000e+02 1.219e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 10:29:58,302 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=694040.0, ans=0.125 2023-11-19 10:29:58,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=694040.0, ans=0.05 2023-11-19 10:30:09,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=694106.6666666666, ans=0.0 2023-11-19 10:30:22,405 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 7950, loss[loss=0.1218, simple_loss=0.1448, pruned_loss=0.03761, audio_tagging_loss=0.01182, over 15733.00 frames. ], tot_loss[loss=0.0883, simple_loss=0.1073, pruned_loss=0.024, audio_tagging_loss=0.01064, over 3046085.67 frames. ], batch size: 57, lr: 7.64e-03, grad_scale: 16.0 2023-11-19 10:30:26,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=694240.0, ans=0.125 2023-11-19 10:30:36,304 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:31:11,467 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=694506.6666666666, ans=0.0 2023-11-19 10:31:11,773 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.71 vs. limit=15.0 2023-11-19 10:31:18,657 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8000, loss[loss=0.0893, simple_loss=0.1143, pruned_loss=0.0212, audio_tagging_loss=0.01093, over 16524.00 frames. ], tot_loss[loss=0.08795, simple_loss=0.1066, pruned_loss=0.02391, audio_tagging_loss=0.01074, over 3045188.93 frames. ], batch size: 62, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:31:24,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=694573.3333333334, ans=0.125 2023-11-19 10:31:37,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=694640.0, ans=0.1 2023-11-19 10:31:39,641 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=9.96 vs. limit=15.0 2023-11-19 10:31:45,476 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.200e+01 8.170e+01 9.028e+01 9.822e+01 2.160e+02, threshold=1.806e+02, percent-clipped=1.0 2023-11-19 10:32:03,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=694840.0, ans=0.2 2023-11-19 10:32:14,650 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8050, loss[loss=0.08079, simple_loss=0.09199, pruned_loss=0.02156, audio_tagging_loss=0.01323, over 14652.00 frames. 
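The learning rate creeps down from 7.67e-03 to 7.58e-03 across this section, consistent with an Eden-style schedule that decays in both batch and epoch. A sketch assuming the common form lr = base_lr * ((batch/lr_batches)^2 + 1)^(-1/4) * ((epoch/lr_epochs)^2 + 1)^(-1/4); the constants below are assumptions chosen to reproduce the logged magnitude, not values read from this section:

def eden_lr(base_lr: float, batch: int, epoch: float,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    """Eden-style decay in both batch and epoch (a sketch of the formula)."""
    batch_factor = ((batch / lr_batches) ** 2 + 1) ** -0.25
    epoch_factor = ((epoch / lr_epochs) ** 2 + 1) ** -0.25
    return base_lr * batch_factor * epoch_factor

# Around checkpoint-104000, partway through epoch 9, this lands near the
# logged values:
print(eden_lr(0.045, batch=104000, epoch=8.4))  # ~7.5e-03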
], tot_loss[loss=0.0883, simple_loss=0.1071, pruned_loss=0.024, audio_tagging_loss=0.01077, over 3045860.50 frames. ], batch size: 56, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:32:36,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.33 vs. limit=15.0 2023-11-19 10:32:37,483 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=695040.0, ans=0.0 2023-11-19 10:32:37,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten.whitening_limit, batch_count=695040.0, ans=15.0 2023-11-19 10:32:46,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=14.38 vs. limit=15.0 2023-11-19 10:32:49,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695106.6666666666, ans=0.1 2023-11-19 10:32:50,111 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=695106.6666666666, ans=0.125 2023-11-19 10:33:03,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=695173.3333333334, ans=0.0 2023-11-19 10:33:10,031 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8100, loss[loss=0.08071, simple_loss=0.09794, pruned_loss=0.02296, audio_tagging_loss=0.008787, over 15397.00 frames. ], tot_loss[loss=0.08754, simple_loss=0.1062, pruned_loss=0.0238, audio_tagging_loss=0.01063, over 3043571.67 frames. ], batch size: 57, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:33:24,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=695306.6666666666, ans=0.125 2023-11-19 10:33:37,036 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.784e+01 8.319e+01 9.007e+01 9.983e+01 1.168e+02, threshold=1.801e+02, percent-clipped=0.0 2023-11-19 10:34:00,889 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=695506.6666666666, ans=0.1 2023-11-19 10:34:05,410 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8150, loss[loss=0.0764, simple_loss=0.09268, pruned_loss=0.01609, audio_tagging_loss=0.01397, over 14741.00 frames. ], tot_loss[loss=0.08739, simple_loss=0.106, pruned_loss=0.0239, audio_tagging_loss=0.01049, over 3038743.61 frames. ], batch size: 54, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:34:23,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=695640.0, ans=0.1 2023-11-19 10:34:27,741 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.32 vs. limit=12.0 2023-11-19 10:34:30,768 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.13 vs. 
limit=22.5 2023-11-19 10:34:34,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=695706.6666666666, ans=0.125 2023-11-19 10:34:35,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=695706.6666666666, ans=0.125 2023-11-19 10:34:40,425 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.50 vs. limit=6.0 2023-11-19 10:34:42,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=695773.3333333334, ans=0.1 2023-11-19 10:34:45,066 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=695773.3333333334, ans=0.125 2023-11-19 10:34:53,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=695840.0, ans=0.1 2023-11-19 10:34:54,256 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.46 vs. limit=15.0 2023-11-19 10:35:01,181 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8200, loss[loss=0.07268, simple_loss=0.08942, pruned_loss=0.01889, audio_tagging_loss=0.009081, over 14722.00 frames. ], tot_loss[loss=0.08763, simple_loss=0.1065, pruned_loss=0.02397, audio_tagging_loss=0.01042, over 3038260.58 frames. ], batch size: 55, lr: 7.63e-03, grad_scale: 32.0 2023-11-19 10:35:02,261 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:35:27,078 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.672e+01 8.400e+01 8.844e+01 9.876e+01 1.152e+02, threshold=1.769e+02, percent-clipped=0.0 2023-11-19 10:35:36,291 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=696106.6666666666, ans=0.0 2023-11-19 10:35:43,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=696106.6666666666, ans=0.0 2023-11-19 10:35:51,771 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=6.23 vs. limit=10.0 2023-11-19 10:35:56,650 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8250, loss[loss=0.08816, simple_loss=0.1128, pruned_loss=0.02318, audio_tagging_loss=0.008604, over 14822.00 frames. ], tot_loss[loss=0.08814, simple_loss=0.1074, pruned_loss=0.02425, audio_tagging_loss=0.01019, over 3037562.87 frames. ], batch size: 53, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:11,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=696306.6666666666, ans=0.0 2023-11-19 10:36:13,760 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.06 vs. 
limit=15.0 2023-11-19 10:36:20,564 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=696373.3333333334, ans=0.1 2023-11-19 10:36:36,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=696440.0, ans=0.0 2023-11-19 10:36:52,156 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8300, loss[loss=0.09362, simple_loss=0.1102, pruned_loss=0.02912, audio_tagging_loss=0.009385, over 15399.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1072, pruned_loss=0.02421, audio_tagging_loss=0.01032, over 3039692.08 frames. ], batch size: 60, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:36:55,618 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=696573.3333333334, ans=0.125 2023-11-19 10:36:58,780 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.min_positive, batch_count=696573.3333333334, ans=0.025 2023-11-19 10:37:01,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=696640.0, ans=0.2 2023-11-19 10:37:04,220 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 10:37:14,646 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=696706.6666666666, ans=0.125 2023-11-19 10:37:18,742 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.294e+01 8.396e+01 9.218e+01 1.018e+02 1.275e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:37:20,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=696706.6666666666, ans=0.125 2023-11-19 10:37:34,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=696773.3333333334, ans=0.2 2023-11-19 10:37:44,221 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=9.67 vs. limit=22.5 2023-11-19 10:37:44,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=696840.0, ans=0.125 2023-11-19 10:37:47,186 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8350, loss[loss=0.06913, simple_loss=0.07681, pruned_loss=0.01832, audio_tagging_loss=0.01241, over 15169.00 frames. ], tot_loss[loss=0.08717, simple_loss=0.1063, pruned_loss=0.0237, audio_tagging_loss=0.01031, over 3045502.36 frames. ], batch size: 59, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:38:07,865 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.63 vs. limit=15.0 2023-11-19 10:38:18,761 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.96 vs. 
limit=6.0 2023-11-19 10:38:28,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=697106.6666666666, ans=0.1 2023-11-19 10:38:37,587 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.47 vs. limit=10.0 2023-11-19 10:38:43,138 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8400, loss[loss=0.07411, simple_loss=0.09624, pruned_loss=0.01748, audio_tagging_loss=0.008514, over 15191.00 frames. ], tot_loss[loss=0.08628, simple_loss=0.105, pruned_loss=0.02346, audio_tagging_loss=0.01032, over 3041550.25 frames. ], batch size: 56, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:38:45,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=697240.0, ans=0.0 2023-11-19 10:39:09,205 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.657e+01 8.184e+01 9.115e+01 9.863e+01 1.459e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 10:39:20,362 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.46 vs. limit=15.0 2023-11-19 10:39:24,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=697440.0, ans=0.0 2023-11-19 10:39:34,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=697506.6666666666, ans=10.0 2023-11-19 10:39:37,755 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8450, loss[loss=0.06106, simple_loss=0.07059, pruned_loss=0.01466, audio_tagging_loss=0.0111, over 17505.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1057, pruned_loss=0.02353, audio_tagging_loss=0.01031, over 3040599.57 frames. ], batch size: 67, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:39:38,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=697573.3333333334, ans=0.125 2023-11-19 10:39:38,282 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.32 vs. limit=10.0 2023-11-19 10:39:44,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=697573.3333333334, ans=0.125 2023-11-19 10:39:57,975 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=697640.0, ans=0.125 2023-11-19 10:40:14,890 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:40:24,125 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.87 vs. limit=10.0 2023-11-19 10:40:26,953 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:40:32,209 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=697906.6666666666, ans=0.0 2023-11-19 10:40:33,450 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8500, loss[loss=0.06499, simple_loss=0.08129, pruned_loss=0.0141, audio_tagging_loss=0.01025, over 14177.00 frames. 
], tot_loss[loss=0.08808, simple_loss=0.1074, pruned_loss=0.02405, audio_tagging_loss=0.01032, over 3051424.84 frames. ], batch size: 54, lr: 7.62e-03, grad_scale: 32.0 2023-11-19 10:40:59,992 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.997e+01 8.736e+01 1.015e+02 1.178e+02 2.396e+02, threshold=2.030e+02, percent-clipped=2.0 2023-11-19 10:41:07,103 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.min_positive, batch_count=698106.6666666666, ans=0.025 2023-11-19 10:41:13,640 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.79 vs. limit=15.0 2023-11-19 10:41:21,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=698173.3333333334, ans=0.125 2023-11-19 10:41:29,465 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8550, loss[loss=0.0876, simple_loss=0.1127, pruned_loss=0.02331, audio_tagging_loss=0.007932, over 16291.00 frames. ], tot_loss[loss=0.08757, simple_loss=0.1071, pruned_loss=0.02367, audio_tagging_loss=0.01037, over 3058538.38 frames. ], batch size: 61, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:41:29,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.90 vs. limit=15.0 2023-11-19 10:41:31,784 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=698240.0, ans=0.125 2023-11-19 10:41:38,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=698240.0, ans=0.125 2023-11-19 10:42:23,875 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8600, loss[loss=0.09178, simple_loss=0.113, pruned_loss=0.02473, audio_tagging_loss=0.01055, over 15247.00 frames. ], tot_loss[loss=0.08722, simple_loss=0.1063, pruned_loss=0.02353, audio_tagging_loss=0.01054, over 3056729.49 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:42:24,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=698573.3333333334, ans=0.125 2023-11-19 10:42:42,090 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=698640.0, ans=0.1 2023-11-19 10:42:46,226 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=698706.6666666666, ans=0.2 2023-11-19 10:42:46,283 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.980e-01 2023-11-19 10:42:50,808 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.955e+01 8.414e+01 9.085e+01 1.004e+02 1.428e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 10:42:56,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=698773.3333333334, ans=0.0 2023-11-19 10:42:58,527 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=7.29 vs. 
limit=10.0 2023-11-19 10:43:01,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=698773.3333333334, ans=0.125 2023-11-19 10:43:03,382 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=5.052e-01 2023-11-19 10:43:19,390 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8650, loss[loss=0.08308, simple_loss=0.1057, pruned_loss=0.02055, audio_tagging_loss=0.00966, over 14880.00 frames. ], tot_loss[loss=0.08738, simple_loss=0.1067, pruned_loss=0.02359, audio_tagging_loss=0.01044, over 3050838.20 frames. ], batch size: 56, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:43:25,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=698906.6666666666, ans=0.0 2023-11-19 10:43:25,740 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.94 vs. limit=15.0 2023-11-19 10:44:15,097 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8700, loss[loss=0.08447, simple_loss=0.1045, pruned_loss=0.02065, audio_tagging_loss=0.01159, over 15676.00 frames. ], tot_loss[loss=0.08807, simple_loss=0.1072, pruned_loss=0.02399, audio_tagging_loss=0.01049, over 3053311.38 frames. ], batch size: 59, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:44:23,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=699240.0, ans=0.125 2023-11-19 10:44:24,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=699240.0, ans=0.2 2023-11-19 10:44:37,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=699373.3333333334, ans=0.0 2023-11-19 10:44:40,714 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=699373.3333333334, ans=0.0 2023-11-19 10:44:41,443 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.962e+01 8.409e+01 9.264e+01 1.013e+02 1.808e+02, threshold=1.853e+02, percent-clipped=0.0 2023-11-19 10:45:07,393 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=699506.6666666666, ans=0.125 2023-11-19 10:45:10,503 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8750, loss[loss=0.103, simple_loss=0.132, pruned_loss=0.02744, audio_tagging_loss=0.009547, over 15591.00 frames. ], tot_loss[loss=0.08868, simple_loss=0.1079, pruned_loss=0.02419, audio_tagging_loss=0.01056, over 3048079.19 frames. ], batch size: 59, lr: 7.61e-03, grad_scale: 32.0 2023-11-19 10:45:18,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=699573.3333333334, ans=0.0 2023-11-19 10:45:50,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=699773.3333333334, ans=0.1 2023-11-19 10:46:05,647 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8800, loss[loss=0.09662, simple_loss=0.1146, pruned_loss=0.02301, audio_tagging_loss=0.01633, over 14900.00 frames. ], tot_loss[loss=0.0888, simple_loss=0.1082, pruned_loss=0.02406, audio_tagging_loss=0.01065, over 3047104.75 frames. 
], batch size: 55, lr: 7.60e-03, grad_scale: 32.0 2023-11-19 10:46:08,100 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.14 vs. limit=15.0 2023-11-19 10:46:11,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=699906.6666666666, ans=0.0 2023-11-19 10:46:33,878 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.401e+01 8.476e+01 9.103e+01 9.978e+01 1.212e+02, threshold=1.821e+02, percent-clipped=0.0 2023-11-19 10:46:48,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=700106.6666666666, ans=0.0 2023-11-19 10:46:54,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=700173.3333333334, ans=0.125 2023-11-19 10:46:56,133 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=700173.3333333334, ans=0.0 2023-11-19 10:46:57,155 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:47:01,761 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8850, loss[loss=0.11, simple_loss=0.1379, pruned_loss=0.03149, audio_tagging_loss=0.009529, over 15738.00 frames. ], tot_loss[loss=0.0891, simple_loss=0.1082, pruned_loss=0.02431, audio_tagging_loss=0.01066, over 3044943.80 frames. ], batch size: 57, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:47:04,109 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=700240.0, ans=0.0 2023-11-19 10:47:12,254 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:47:36,021 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700440.0, ans=0.1 2023-11-19 10:47:56,186 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8900, loss[loss=0.08246, simple_loss=0.09372, pruned_loss=0.02381, audio_tagging_loss=0.01179, over 15103.00 frames. ], tot_loss[loss=0.0891, simple_loss=0.1086, pruned_loss=0.02428, audio_tagging_loss=0.01053, over 3053309.79 frames. 
], batch size: 58, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:48:02,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=700573.3333333334, ans=0.2 2023-11-19 10:48:14,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=700640.0, ans=0.1 2023-11-19 10:48:20,195 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=700706.6666666666, ans=0.0 2023-11-19 10:48:25,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.967e+01 8.384e+01 9.220e+01 1.025e+02 1.340e+02, threshold=1.844e+02, percent-clipped=0.0 2023-11-19 10:48:44,675 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=12.62 vs. limit=22.5 2023-11-19 10:48:49,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=700840.0, ans=0.05 2023-11-19 10:48:52,101 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 8950, loss[loss=0.09904, simple_loss=0.1251, pruned_loss=0.02769, audio_tagging_loss=0.008788, over 14578.00 frames. ], tot_loss[loss=0.08867, simple_loss=0.1083, pruned_loss=0.02414, audio_tagging_loss=0.01035, over 3052197.89 frames. ], batch size: 54, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:46,547 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=701240.0, ans=0.0 2023-11-19 10:49:47,836 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9000, loss[loss=0.05426, simple_loss=0.05897, pruned_loss=0.01508, audio_tagging_loss=0.009688, over 15541.00 frames. ], tot_loss[loss=0.08777, simple_loss=0.107, pruned_loss=0.02387, audio_tagging_loss=0.0104, over 3048706.35 frames. ], batch size: 62, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:49:47,838 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 10:50:13,514 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8745, 4.9409, 4.9883, 4.9776], device='cuda:0') 2023-11-19 10:50:20,517 INFO [train_asr.py:1147] (0/4) Epoch 9, validation: loss=0.06655, simple_loss=0.05588, pruned_loss=0.006694, audio_tagging_loss=0.03192, over 4681554.00 frames. 2023-11-19 10:50:20,517 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 10:50:26,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-19 10:50:50,123 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.156e+01 8.217e+01 9.187e+01 1.011e+02 1.342e+02, threshold=1.837e+02, percent-clipped=0.0 2023-11-19 10:51:13,415 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=701506.6666666666, ans=0.0 2023-11-19 10:51:16,390 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9050, loss[loss=0.09714, simple_loss=0.1273, pruned_loss=0.02628, audio_tagging_loss=0.007206, over 15381.00 frames. ], tot_loss[loss=0.08802, simple_loss=0.1073, pruned_loss=0.024, audio_tagging_loss=0.01034, over 3055098.87 frames. 
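The attn_weights_entropy diagnostics printed during validation summarize how peaked each attention head is: a head spread uniformly over T source frames approaches entropy log(T), while a head locked onto a single frame approaches 0. A sketch of per-head entropy, assuming attention weights of shape (num_heads, tgt_len, src_len); the shape convention is an assumption:

import torch

def attn_weights_entropy(attn: torch.Tensor) -> torch.Tensor:
    """Mean entropy per head of an attention distribution.

    attn: (num_heads, tgt_len, src_len), each row summing to 1 over src_len.
    Returns one value per head, as in the zipformer diagnostics above.
    """
    eps = 1.0e-20
    ent = -(attn * (attn + eps).log()).sum(dim=-1)  # (num_heads, tgt_len)
    return ent.mean(dim=-1)                         # (num_heads,)

# A uniform head over 100 source frames has entropy log(100) ~= 4.61, the
# same ballpark as the ~4.9 values logged for layer 1 above.
uniform = torch.full((1, 10, 100), 0.01)
print(attn_weights_entropy(uniform))  # tensor([4.6052])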
], batch size: 55, lr: 7.60e-03, grad_scale: 16.0 2023-11-19 10:51:20,607 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.55 vs. limit=15.0 2023-11-19 10:51:35,657 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=701640.0, ans=0.125 2023-11-19 10:51:47,703 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=701706.6666666666, ans=0.0 2023-11-19 10:51:51,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=701773.3333333334, ans=0.2 2023-11-19 10:52:07,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=701840.0, ans=0.125 2023-11-19 10:52:10,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=701840.0, ans=0.125 2023-11-19 10:52:12,367 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9100, loss[loss=0.1013, simple_loss=0.1211, pruned_loss=0.03019, audio_tagging_loss=0.01051, over 13926.00 frames. ], tot_loss[loss=0.08672, simple_loss=0.1058, pruned_loss=0.02355, audio_tagging_loss=0.01029, over 3052562.82 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:52:35,261 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=702040.0, ans=0.125 2023-11-19 10:52:38,129 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.41 vs. limit=15.0 2023-11-19 10:52:41,734 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.195e+01 8.498e+01 9.044e+01 9.975e+01 1.337e+02, threshold=1.809e+02, percent-clipped=0.0 2023-11-19 10:52:56,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=702173.3333333334, ans=0.0 2023-11-19 10:53:01,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=702173.3333333334, ans=0.125 2023-11-19 10:53:07,277 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9150, loss[loss=0.09754, simple_loss=0.1107, pruned_loss=0.03136, audio_tagging_loss=0.01084, over 15448.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1057, pruned_loss=0.02347, audio_tagging_loss=0.01036, over 3050038.76 frames. ], batch size: 58, lr: 7.59e-03, grad_scale: 8.0 2023-11-19 10:53:44,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=702440.0, ans=0.125 2023-11-19 10:53:54,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=702506.6666666666, ans=0.125 2023-11-19 10:54:02,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=702573.3333333334, ans=0.125 2023-11-19 10:54:02,853 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9200, loss[loss=0.1151, simple_loss=0.1383, pruned_loss=0.03668, audio_tagging_loss=0.009261, over 15409.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1062, pruned_loss=0.02369, audio_tagging_loss=0.01039, over 3050090.45 frames. 
], batch size: 56, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:54:19,238 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=6.29 vs. limit=12.0 2023-11-19 10:54:20,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer1.prob, batch_count=702640.0, ans=0.125 2023-11-19 10:54:33,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.088e+01 8.309e+01 9.173e+01 1.001e+02 1.258e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 10:54:36,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=702773.3333333334, ans=0.0 2023-11-19 10:54:59,955 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9250, loss[loss=0.08931, simple_loss=0.09534, pruned_loss=0.02741, audio_tagging_loss=0.01423, over 15362.00 frames. ], tot_loss[loss=0.08734, simple_loss=0.1063, pruned_loss=0.0238, audio_tagging_loss=0.01037, over 3051070.03 frames. ], batch size: 57, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:55:31,777 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 10:55:31,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=703106.6666666666, ans=0.125 2023-11-19 10:55:42,807 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=703173.3333333334, ans=0.125 2023-11-19 10:55:43,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=703173.3333333334, ans=0.2 2023-11-19 10:55:45,963 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=703173.3333333334, ans=0.025 2023-11-19 10:55:54,235 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9300, loss[loss=0.0933, simple_loss=0.1147, pruned_loss=0.02893, audio_tagging_loss=0.007023, over 14353.00 frames. ], tot_loss[loss=0.08647, simple_loss=0.1054, pruned_loss=0.02336, audio_tagging_loss=0.0104, over 3046405.95 frames. ], batch size: 53, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:56:03,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=703240.0, ans=0.125 2023-11-19 10:56:20,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=703373.3333333334, ans=0.125 2023-11-19 10:56:25,077 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.507e+01 9.203e+01 9.999e+01 1.405e+02, threshold=1.841e+02, percent-clipped=0.0 2023-11-19 10:56:32,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=703440.0, ans=0.125 2023-11-19 10:56:38,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=703506.6666666666, ans=0.0 2023-11-19 10:56:41,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=703506.6666666666, ans=0.125 2023-11-19 10:56:50,081 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9350, loss[loss=0.07611, simple_loss=0.08642, pruned_loss=0.01973, audio_tagging_loss=0.01317, over 15433.00 frames. 
], tot_loss[loss=0.08649, simple_loss=0.1053, pruned_loss=0.0233, audio_tagging_loss=0.01055, over 3045571.96 frames. ], batch size: 56, lr: 7.59e-03, grad_scale: 16.0 2023-11-19 10:57:00,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=703640.0, ans=0.0 2023-11-19 10:57:09,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=14.68 vs. limit=15.0 2023-11-19 10:57:19,303 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=703706.6666666666, ans=0.2 2023-11-19 10:57:27,260 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703773.3333333334, ans=0.1 2023-11-19 10:57:45,622 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn1.whiten, num_groups=1, num_channels=192, metric=12.88 vs. limit=22.5 2023-11-19 10:57:46,475 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9400, loss[loss=0.07768, simple_loss=0.09204, pruned_loss=0.01572, audio_tagging_loss=0.01595, over 15360.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.1056, pruned_loss=0.02341, audio_tagging_loss=0.01053, over 3045199.96 frames. ], batch size: 56, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:57:59,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=703973.3333333334, ans=0.1 2023-11-19 10:58:02,051 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.01 vs. limit=22.5 2023-11-19 10:58:06,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=703973.3333333334, ans=0.125 2023-11-19 10:58:06,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys.whitening_limit, batch_count=703973.3333333334, ans=6.0 2023-11-19 10:58:10,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=704040.0, ans=0.125 2023-11-19 10:58:13,847 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=704040.0, ans=0.125 2023-11-19 10:58:15,217 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.48 vs. limit=15.0 2023-11-19 10:58:15,737 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.161e+01 8.579e+01 9.388e+01 1.029e+02 1.331e+02, threshold=1.878e+02, percent-clipped=0.0 2023-11-19 10:58:17,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=704106.6666666666, ans=0.015 2023-11-19 10:58:20,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=704106.6666666666, ans=0.125 2023-11-19 10:58:39,702 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/jmSuJWEIizA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. 
Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 10:58:41,782 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9450, loss[loss=0.07732, simple_loss=0.0971, pruned_loss=0.01846, audio_tagging_loss=0.01031, over 14082.00 frames. ], tot_loss[loss=0.08673, simple_loss=0.1055, pruned_loss=0.02342, audio_tagging_loss=0.01057, over 3044151.21 frames. ], batch size: 53, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:58:56,762 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.10 vs. limit=15.0 2023-11-19 10:59:16,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=704440.0, ans=0.125 2023-11-19 10:59:19,500 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=704440.0, ans=0.0 2023-11-19 10:59:23,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=704440.0, ans=0.1 2023-11-19 10:59:36,678 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9500, loss[loss=0.1101, simple_loss=0.1376, pruned_loss=0.03205, audio_tagging_loss=0.009313, over 15457.00 frames. ], tot_loss[loss=0.0872, simple_loss=0.1062, pruned_loss=0.02351, audio_tagging_loss=0.01056, over 3046647.26 frames. ], batch size: 55, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 10:59:37,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=704573.3333333334, ans=0.125 2023-11-19 11:00:05,352 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=17.67 vs. limit=15.0 2023-11-19 11:00:06,713 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.471e+01 8.371e+01 9.001e+01 9.881e+01 1.664e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 11:00:09,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=704773.3333333334, ans=0.0 2023-11-19 11:00:11,668 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=704773.3333333334, ans=0.125 2023-11-19 11:00:22,213 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=704840.0, ans=0.05 2023-11-19 11:00:26,898 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=704840.0, ans=0.0 2023-11-19 11:00:32,116 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9550, loss[loss=0.0837, simple_loss=0.09494, pruned_loss=0.02374, audio_tagging_loss=0.0125, over 15686.00 frames. ], tot_loss[loss=0.0881, simple_loss=0.1073, pruned_loss=0.02377, audio_tagging_loss=0.01069, over 3044351.63 frames. 
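The WARNING records also show why these AudioSet placeholder cuts are dropped: with the encoder's roughly 4x subsampling, a 100-frame input shrinks to 23 frames, fewer than the 24 BPE tokens of the dummy transcript, and a transducer cannot emit more non-blank tokens than it has encoder frames. A sketch of such a filter; the exact subsampling expression is the usual icefall convention and is an assumption here, though it does reproduce 100 -> 23:

```python
def frames_after_subsampling(num_frames: int) -> int:
    # Assumed ~4x subsampling arithmetic; reproduces the logged 100 -> 23.
    return ((num_frames - 7) // 2 + 1) // 2

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Need at least one encoder frame per output token for the transducer loss.
    return frames_after_subsampling(num_frames) >= num_tokens

assert frames_after_subsampling(100) == 23
assert not keep_cut(100, 24)  # the excluded placeholder cuts above
```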
], batch size: 57, lr: 7.58e-03, grad_scale: 16.0 2023-11-19 11:00:40,821 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer2.prob, batch_count=704906.6666666666, ans=0.125 2023-11-19 11:00:56,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705040.0, ans=0.1 2023-11-19 11:01:15,678 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=705173.3333333334, ans=0.1 2023-11-19 11:01:21,853 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=705173.3333333334, ans=0.125 2023-11-19 11:01:28,004 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9600, loss[loss=0.09433, simple_loss=0.1164, pruned_loss=0.02461, audio_tagging_loss=0.01153, over 15662.00 frames. ], tot_loss[loss=0.08933, simple_loss=0.1089, pruned_loss=0.02424, audio_tagging_loss=0.01064, over 3047189.63 frames. ], batch size: 56, lr: 7.58e-03, grad_scale: 32.0 2023-11-19 11:01:58,228 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.539e+01 8.373e+01 9.061e+01 9.893e+01 1.304e+02, threshold=1.812e+02, percent-clipped=0.0 2023-11-19 11:02:23,459 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9650, loss[loss=0.1018, simple_loss=0.1315, pruned_loss=0.0276, audio_tagging_loss=0.008453, over 15261.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1077, pruned_loss=0.0239, audio_tagging_loss=0.01065, over 3040983.77 frames. ], batch size: 54, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:02:41,474 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=705640.0, ans=0.1 2023-11-19 11:02:52,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.min_positive, batch_count=705706.6666666666, ans=0.05 2023-11-19 11:02:59,877 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=705773.3333333334, ans=0.125 2023-11-19 11:03:13,277 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=21.63 vs. limit=22.5 2023-11-19 11:03:18,585 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9700, loss[loss=0.0951, simple_loss=0.1181, pruned_loss=0.02812, audio_tagging_loss=0.007919, over 14677.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.1074, pruned_loss=0.02395, audio_tagging_loss=0.01047, over 3039899.57 frames. ], batch size: 54, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:03:24,671 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:03:26,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=705906.6666666666, ans=0.0 2023-11-19 11:03:39,377 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.46 vs. 
limit=12.0 2023-11-19 11:03:39,930 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=706040.0, ans=0.0 2023-11-19 11:03:44,747 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=706040.0, ans=0.125 2023-11-19 11:03:48,688 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.927e+01 8.495e+01 9.250e+01 1.013e+02 1.601e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 11:04:01,046 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=706106.6666666666, ans=0.0 2023-11-19 11:04:04,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=706173.3333333334, ans=0.0 2023-11-19 11:04:13,980 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9750, loss[loss=0.08491, simple_loss=0.1112, pruned_loss=0.02037, audio_tagging_loss=0.008954, over 15674.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1079, pruned_loss=0.02405, audio_tagging_loss=0.0103, over 3047903.60 frames. ], batch size: 59, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:04:15,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=706240.0, ans=0.125 2023-11-19 11:04:18,188 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.62 vs. limit=15.0 2023-11-19 11:04:23,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706240.0, ans=0.125 2023-11-19 11:04:55,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=706440.0, ans=0.125 2023-11-19 11:05:03,665 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=12.78 vs. limit=15.0 2023-11-19 11:05:04,553 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.25 vs. limit=10.0 2023-11-19 11:05:09,870 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9800, loss[loss=0.08584, simple_loss=0.1068, pruned_loss=0.01982, audio_tagging_loss=0.0126, over 15952.00 frames. ], tot_loss[loss=0.08876, simple_loss=0.1083, pruned_loss=0.02428, audio_tagging_loss=0.01034, over 3047936.09 frames. ], batch size: 56, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:05:12,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.73 vs. limit=10.0 2023-11-19 11:05:22,323 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:05:23,625 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=12.68 vs. 
limit=15.0 2023-11-19 11:05:37,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer1.prob, batch_count=706706.6666666666, ans=0.125 2023-11-19 11:05:39,680 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.021e+01 8.318e+01 9.223e+01 1.040e+02 1.482e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:05:41,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=706706.6666666666, ans=0.04949747468305833 2023-11-19 11:05:51,529 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer1.prob, batch_count=706773.3333333334, ans=0.125 2023-11-19 11:05:54,143 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.30 vs. limit=6.0 2023-11-19 11:05:58,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.93 vs. limit=15.0 2023-11-19 11:05:58,940 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:05:59,842 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Bo4LcZjitzU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:06:05,687 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9850, loss[loss=0.09292, simple_loss=0.11, pruned_loss=0.02705, audio_tagging_loss=0.01086, over 15340.00 frames. ], tot_loss[loss=0.08823, simple_loss=0.1076, pruned_loss=0.02411, audio_tagging_loss=0.0103, over 3046841.47 frames. ], batch size: 57, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:06:05,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=706906.6666666666, ans=0.015 2023-11-19 11:06:16,856 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=706973.3333333334, ans=0.0 2023-11-19 11:06:18,355 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.03 vs. limit=22.5 2023-11-19 11:06:35,899 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=707040.0, ans=0.1 2023-11-19 11:06:56,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=707173.3333333334, ans=0.125 2023-11-19 11:07:01,179 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9900, loss[loss=0.07573, simple_loss=0.1014, pruned_loss=0.01534, audio_tagging_loss=0.009675, over 14485.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1072, pruned_loss=0.02399, audio_tagging_loss=0.01024, over 3047931.37 frames. 
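The lr field decays very slowly across these records (7.59e-03 down to 7.56e-03 over a few hundred batches) because, this late in training, an Eden-style schedule is dominated by its asymptotic power-law decay in both the global step and the epoch. A sketch under the assumption that this run uses the standard icefall Eden form; the parameter values are left symbolic:

```python
def eden_lr(base_lr: float, step: int, epoch: float,
            lr_batches: float, lr_epochs: float) -> float:
    """Eden-style LR: smooth power-law decay in step and epoch (assumed form)."""
    step_factor = ((step ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * step_factor * epoch_factor
```

For step >> lr_batches and epoch >> lr_epochs this behaves like base_lr * (lr_batches / step) ** 0.5 * (lr_epochs / epoch) ** 0.5, which is why successive records differ only in the third significant digit.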
], batch size: 54, lr: 7.57e-03, grad_scale: 32.0 2023-11-19 11:07:18,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=707306.6666666666, ans=0.0 2023-11-19 11:07:26,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=707373.3333333334, ans=0.125 2023-11-19 11:07:31,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.468e+01 8.507e+01 9.296e+01 1.006e+02 1.418e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 11:07:32,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=707373.3333333334, ans=0.5 2023-11-19 11:07:45,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=707506.6666666666, ans=0.125 2023-11-19 11:07:52,222 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=707506.6666666666, ans=0.125 2023-11-19 11:07:57,335 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 9950, loss[loss=0.08274, simple_loss=0.1053, pruned_loss=0.02049, audio_tagging_loss=0.00961, over 14682.00 frames. ], tot_loss[loss=0.08724, simple_loss=0.1062, pruned_loss=0.02379, audio_tagging_loss=0.01034, over 3050856.07 frames. ], batch size: 54, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:08:00,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.14 vs. limit=15.0 2023-11-19 11:08:05,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=707573.3333333334, ans=0.0 2023-11-19 11:08:14,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=707640.0, ans=0.1 2023-11-19 11:08:17,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-19 11:08:21,764 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:08:26,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=707706.6666666666, ans=0.0 2023-11-19 11:08:36,835 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=707773.3333333334, ans=0.125 2023-11-19 11:08:52,384 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10000, loss[loss=0.1003, simple_loss=0.1178, pruned_loss=0.02891, audio_tagging_loss=0.0125, over 15139.00 frames. ], tot_loss[loss=0.08771, simple_loss=0.1068, pruned_loss=0.02398, audio_tagging_loss=0.01035, over 3044656.17 frames. 
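In every optim.py record in this section, the clipping threshold equals Clipping_scale times the median of the reported grad-norm quartiles (for the record above, 2.0 * 9.296e+01 = 1.859e+02), so the threshold tracks a running median of recent gradient norms rather than a fixed constant. A sketch of that rule; the window size and bookkeeping details are assumptions:

```python
import statistics
from collections import deque

class MedianGradClipper:
    """Clip at clipping_scale * median of recent grad norms (sketch)."""
    def __init__(self, clipping_scale: float = 2.0, window: int = 1024):
        self.clipping_scale = clipping_scale
        self.norms = deque(maxlen=window)  # assumed history length

    def threshold(self, grad_norm: float) -> float:
        self.norms.append(grad_norm)
        return self.clipping_scale * statistics.median(self.norms)
```

percent-clipped=0.0 throughout means no batch in the logging interval had a gradient norm above twice that running median.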
], batch size: 57, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:23,372 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.309e+01 8.896e+01 1.009e+02 1.315e+02, threshold=1.779e+02, percent-clipped=0.0 2023-11-19 11:09:35,470 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=708106.6666666666, ans=0.125 2023-11-19 11:09:45,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=708173.3333333334, ans=0.035 2023-11-19 11:09:49,101 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10050, loss[loss=0.08008, simple_loss=0.08894, pruned_loss=0.02402, audio_tagging_loss=0.01159, over 15809.00 frames. ], tot_loss[loss=0.08801, simple_loss=0.1069, pruned_loss=0.02413, audio_tagging_loss=0.01045, over 3043296.98 frames. ], batch size: 59, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:09:49,253 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=708240.0, ans=0.0 2023-11-19 11:09:52,575 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=708240.0, ans=0.0 2023-11-19 11:09:57,642 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=708240.0, ans=0.125 2023-11-19 11:10:13,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.59 vs. limit=15.0 2023-11-19 11:10:19,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=708373.3333333334, ans=0.1 2023-11-19 11:10:44,154 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10100, loss[loss=0.0834, simple_loss=0.09668, pruned_loss=0.02196, audio_tagging_loss=0.0131, over 16065.00 frames. ], tot_loss[loss=0.08813, simple_loss=0.1073, pruned_loss=0.02399, audio_tagging_loss=0.01052, over 3043713.90 frames. ], batch size: 63, lr: 7.56e-03, grad_scale: 32.0 2023-11-19 11:10:59,088 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=708640.0, ans=0.0 2023-11-19 11:11:15,749 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.056e+01 8.397e+01 8.991e+01 1.000e+02 1.217e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 11:11:19,097 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=708773.3333333334, ans=0.0 2023-11-19 11:11:28,401 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.08 vs. limit=12.0 2023-11-19 11:11:29,052 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_eq1Ry0UZGU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 11:11:29,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=708840.0, ans=0.2 2023-11-19 11:11:32,556 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=15.38 vs. limit=22.5 2023-11-19 11:11:40,080 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10150, loss[loss=0.09962, simple_loss=0.1225, pruned_loss=0.02783, audio_tagging_loss=0.01054, over 15671.00 frames. ], tot_loss[loss=0.08838, simple_loss=0.1074, pruned_loss=0.02408, audio_tagging_loss=0.01059, over 3040358.83 frames. ], batch size: 57, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:11:44,135 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=708906.6666666666, ans=0.125 2023-11-19 11:12:01,822 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.94 vs. limit=6.0 2023-11-19 11:12:05,498 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/cw-21cbk02A_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:12:05,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=709040.0, ans=0.2 2023-11-19 11:12:12,405 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=709106.6666666666, ans=0.0 2023-11-19 11:12:14,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=709106.6666666666, ans=0.125 2023-11-19 11:12:24,677 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=709173.3333333334, ans=0.125 2023-11-19 11:12:26,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=709173.3333333334, ans=0.0 2023-11-19 11:12:34,137 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=709173.3333333334, ans=0.1 2023-11-19 11:12:36,051 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10200, loss[loss=0.08906, simple_loss=0.1113, pruned_loss=0.02258, audio_tagging_loss=0.01084, over 14857.00 frames. ], tot_loss[loss=0.08858, simple_loss=0.1078, pruned_loss=0.02411, audio_tagging_loss=0.01055, over 3041215.44 frames. ], batch size: 55, lr: 7.56e-03, grad_scale: 16.0 2023-11-19 11:12:38,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer2.prob, batch_count=709240.0, ans=0.125 2023-11-19 11:12:48,946 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer1.prob, batch_count=709306.6666666666, ans=0.125 2023-11-19 11:12:56,600 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/hOT6Yokob90_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. 
Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:12:58,802 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=709373.3333333334, ans=0.125 2023-11-19 11:13:06,566 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.459e+01 8.611e+01 9.302e+01 1.032e+02 1.464e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:13:11,525 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=709440.0, ans=0.0 2023-11-19 11:13:11,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.scale_min, batch_count=709440.0, ans=0.2 2023-11-19 11:13:18,565 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=709440.0, ans=0.125 2023-11-19 11:13:21,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=709506.6666666666, ans=0.125 2023-11-19 11:13:30,928 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10250, loss[loss=0.08434, simple_loss=0.1022, pruned_loss=0.02424, audio_tagging_loss=0.008997, over 14280.00 frames. ], tot_loss[loss=0.08799, simple_loss=0.1067, pruned_loss=0.02396, audio_tagging_loss=0.01068, over 3047403.51 frames. ], batch size: 55, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:13:38,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=709573.3333333334, ans=0.125 2023-11-19 11:13:42,332 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=709640.0, ans=0.0 2023-11-19 11:14:26,447 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10300, loss[loss=0.08634, simple_loss=0.1093, pruned_loss=0.02098, audio_tagging_loss=0.01071, over 16262.00 frames. ], tot_loss[loss=0.08806, simple_loss=0.107, pruned_loss=0.02391, audio_tagging_loss=0.01062, over 3048340.82 frames. 
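The scaling.py:213 records dump ScheduledFloat hyperparameters (dropout probabilities, skip rates, balancer probs) whose values are functions of batch_count rather than fixed constants; by batch_count ~7e5 they have long since settled at their final endpoint values, which is why the same ans keeps repeating. A minimal piecewise-linear schedule in that spirit; this class and the example breakpoints are illustrative, not the implementation in scaling.py:

```python
import bisect

class ScheduledValue:
    """Piecewise-linear function of batch_count, given as (x, y) breakpoints."""
    def __init__(self, *points: tuple[float, float]):
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def __call__(self, batch_count: float) -> float:
        if batch_count <= self.xs[0]:
            return self.ys[0]
        if batch_count >= self.xs[-1]:
            return self.ys[-1]
        i = bisect.bisect_right(self.xs, batch_count)
        x0, x1, y0, y1 = self.xs[i - 1], self.xs[i], self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

# Hypothetical dropout schedule: 0.3 at the start, annealed to 0.1 by 20k batches.
dropout_p = ScheduledValue((0.0, 0.3), (20_000.0, 0.1))
assert dropout_p(705_000.0) == 0.1  # flat at the endpoint, as in the records
```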
], batch size: 59, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:14:41,033 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=709973.3333333334, ans=0.125 2023-11-19 11:14:45,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=709973.3333333334, ans=0.125 2023-11-19 11:14:50,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=710040.0, ans=0.125 2023-11-19 11:14:50,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=710040.0, ans=0.125 2023-11-19 11:14:58,178 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.799e+01 8.163e+01 8.880e+01 9.962e+01 1.363e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 11:15:15,385 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=710173.3333333334, ans=0.125 2023-11-19 11:15:17,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=710173.3333333334, ans=0.125 2023-11-19 11:15:23,132 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10350, loss[loss=0.09061, simple_loss=0.1143, pruned_loss=0.02201, audio_tagging_loss=0.01144, over 16096.00 frames. ], tot_loss[loss=0.08831, simple_loss=0.1073, pruned_loss=0.02394, audio_tagging_loss=0.01075, over 3040838.05 frames. ], batch size: 59, lr: 7.55e-03, grad_scale: 16.0 2023-11-19 11:15:39,636 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.22 vs. limit=12.0 2023-11-19 11:15:48,315 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=710373.3333333334, ans=0.125 2023-11-19 11:15:56,549 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.67 vs. limit=10.0 2023-11-19 11:16:08,996 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=710506.6666666666, ans=0.2 2023-11-19 11:16:09,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=710506.6666666666, ans=0.0 2023-11-19 11:16:12,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=710506.6666666666, ans=0.09899494936611666 2023-11-19 11:16:18,293 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10400, loss[loss=0.08958, simple_loss=0.1091, pruned_loss=0.0263, audio_tagging_loss=0.008739, over 14612.00 frames. ], tot_loss[loss=0.08845, simple_loss=0.1073, pruned_loss=0.02399, audio_tagging_loss=0.0108, over 3034106.46 frames. 
], batch size: 55, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:16:19,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=710573.3333333334, ans=0.125 2023-11-19 11:16:23,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.whiten.whitening_limit, batch_count=710573.3333333334, ans=12.0 2023-11-19 11:16:25,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=710573.3333333334, ans=0.125 2023-11-19 11:16:26,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=710573.3333333334, ans=0.0 2023-11-19 11:16:28,520 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=710640.0, ans=0.0 2023-11-19 11:16:34,361 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=4.13 vs. limit=10.0 2023-11-19 11:16:38,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=710640.0, ans=0.0 2023-11-19 11:16:50,351 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.272e+01 8.278e+01 9.028e+01 1.000e+02 1.286e+02, threshold=1.806e+02, percent-clipped=0.0 2023-11-19 11:16:50,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=710706.6666666666, ans=0.0 2023-11-19 11:16:51,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=710773.3333333334, ans=0.1 2023-11-19 11:16:55,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=710773.3333333334, ans=0.05 2023-11-19 11:17:01,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=710773.3333333334, ans=0.0 2023-11-19 11:17:02,630 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.70 vs. limit=22.5 2023-11-19 11:17:14,234 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10450, loss[loss=0.08333, simple_loss=0.104, pruned_loss=0.02117, audio_tagging_loss=0.01015, over 14634.00 frames. ], tot_loss[loss=0.08785, simple_loss=0.1068, pruned_loss=0.02362, audio_tagging_loss=0.01082, over 3042686.15 frames. ], batch size: 53, lr: 7.55e-03, grad_scale: 32.0 2023-11-19 11:17:27,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=710973.3333333334, ans=0.125 2023-11-19 11:18:10,453 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10500, loss[loss=0.1105, simple_loss=0.1283, pruned_loss=0.03743, audio_tagging_loss=0.008942, over 15915.00 frames. ], tot_loss[loss=0.08701, simple_loss=0.1058, pruned_loss=0.02343, audio_tagging_loss=0.01067, over 3042747.27 frames. 
], batch size: 60, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:18:13,288 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=711240.0, ans=0.125 2023-11-19 11:18:29,068 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=711306.6666666666, ans=0.0 2023-11-19 11:18:36,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=711373.3333333334, ans=0.125 2023-11-19 11:18:37,050 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=711373.3333333334, ans=0.125 2023-11-19 11:18:39,370 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=711373.3333333334, ans=0.125 2023-11-19 11:18:41,179 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.988e+01 8.288e+01 9.011e+01 9.876e+01 1.227e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:18:46,235 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=711440.0, ans=0.125 2023-11-19 11:19:02,054 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.skip_rate, batch_count=711506.6666666666, ans=0.035 2023-11-19 11:19:03,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten.whitening_limit, batch_count=711506.6666666666, ans=22.5 2023-11-19 11:19:06,043 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10550, loss[loss=0.07072, simple_loss=0.07935, pruned_loss=0.02133, audio_tagging_loss=0.009715, over 14219.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1065, pruned_loss=0.02365, audio_tagging_loss=0.01047, over 3040878.61 frames. ], batch size: 54, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:19:37,168 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=711706.6666666666, ans=0.125 2023-11-19 11:19:46,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711773.3333333334, ans=0.1 2023-11-19 11:20:01,628 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10600, loss[loss=0.06434, simple_loss=0.06719, pruned_loss=0.021, audio_tagging_loss=0.009739, over 14966.00 frames. ], tot_loss[loss=0.08723, simple_loss=0.1067, pruned_loss=0.02363, audio_tagging_loss=0.01027, over 3049669.10 frames. 
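The Whitening records compare a per-module statistic against a limit (metric=X vs. limit=Y); these appear to be emitted by Whiten modules that penalize activations in backprop when the metric drifts too far above the limit, nudging each group's covariance back toward a scaled identity. A rough stand-in for such a statistic, assuming an eigenvalue-dispersion definition; the actual scaling.py metric differs in its grouping and normalization details:

```python
import numpy as np

def whiteness_metric(features: np.ndarray) -> float:
    """~1.0 for whitened features; grows as the covariance spectrum spreads.

    Illustrative stand-in for the logged metric, not the scaling.py formula.
    """
    x = features - features.mean(axis=0)
    cov = (x.T @ x) / len(x)
    eig = np.linalg.eigvalsh(cov)
    return float((eig ** 2).mean() / (eig.mean() ** 2 + 1e-20))

rng = np.random.default_rng(0)
assert whiteness_metric(rng.standard_normal((1000, 64))) < 1.5  # near 1 when white
```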
], batch size: 56, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:20:13,952 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=711973.3333333334, ans=0.1 2023-11-19 11:20:27,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=712040.0, ans=0.0 2023-11-19 11:20:32,533 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.928e+01 8.185e+01 9.113e+01 1.014e+02 1.245e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 11:20:51,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=712173.3333333334, ans=0.1 2023-11-19 11:20:56,906 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10650, loss[loss=0.1227, simple_loss=0.1495, pruned_loss=0.04065, audio_tagging_loss=0.007281, over 14558.00 frames. ], tot_loss[loss=0.08767, simple_loss=0.1072, pruned_loss=0.02384, audio_tagging_loss=0.01025, over 3050408.44 frames. ], batch size: 54, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:20:59,334 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=712240.0, ans=0.0 2023-11-19 11:21:10,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=712306.6666666666, ans=0.125 2023-11-19 11:21:16,559 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=712306.6666666666, ans=0.0 2023-11-19 11:21:25,552 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:21:39,774 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=712440.0, ans=0.125 2023-11-19 11:21:45,651 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=712506.6666666666, ans=0.125 2023-11-19 11:21:46,497 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.16 vs. limit=15.0 2023-11-19 11:21:53,347 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10700, loss[loss=0.08128, simple_loss=0.09937, pruned_loss=0.01991, audio_tagging_loss=0.01168, over 16427.00 frames. ], tot_loss[loss=0.08843, simple_loss=0.1081, pruned_loss=0.0242, audio_tagging_loss=0.01021, over 3047351.84 frames. ], batch size: 61, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:21:58,287 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.59 vs. 
limit=15.0 2023-11-19 11:21:58,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=712573.3333333334, ans=0.0 2023-11-19 11:22:11,012 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=712640.0, ans=0.125 2023-11-19 11:22:24,050 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.779e+01 8.116e+01 8.823e+01 9.508e+01 1.570e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 11:22:24,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten.whitening_limit, batch_count=712706.6666666666, ans=15.0 2023-11-19 11:22:28,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.45 vs. limit=22.5 2023-11-19 11:22:49,116 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10750, loss[loss=0.05855, simple_loss=0.06643, pruned_loss=0.01452, audio_tagging_loss=0.01082, over 14560.00 frames. ], tot_loss[loss=0.08779, simple_loss=0.1074, pruned_loss=0.02392, audio_tagging_loss=0.01019, over 3047197.16 frames. ], batch size: 57, lr: 7.54e-03, grad_scale: 32.0 2023-11-19 11:22:54,582 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=712906.6666666666, ans=0.0 2023-11-19 11:22:54,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=712906.6666666666, ans=0.0 2023-11-19 11:23:05,654 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=712973.3333333334, ans=0.1 2023-11-19 11:23:07,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=712973.3333333334, ans=0.125 2023-11-19 11:23:15,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=713040.0, ans=0.0 2023-11-19 11:23:19,647 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=713040.0, ans=0.0 2023-11-19 11:23:42,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=713173.3333333334, ans=0.0 2023-11-19 11:23:44,955 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10800, loss[loss=0.1027, simple_loss=0.1146, pruned_loss=0.0324, audio_tagging_loss=0.013, over 14854.00 frames. ], tot_loss[loss=0.08829, simple_loss=0.1079, pruned_loss=0.02407, audio_tagging_loss=0.01024, over 3050686.69 frames. ], batch size: 56, lr: 7.53e-03, grad_scale: 32.0 2023-11-19 11:24:14,674 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=14.69 vs. limit=15.0 2023-11-19 11:24:17,312 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.693e+01 8.285e+01 8.937e+01 9.621e+01 1.564e+02, threshold=1.787e+02, percent-clipped=0.0 2023-11-19 11:24:40,667 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10850, loss[loss=0.08519, simple_loss=0.1016, pruned_loss=0.02149, audio_tagging_loss=0.0129, over 16114.00 frames. ], tot_loss[loss=0.0881, simple_loss=0.1076, pruned_loss=0.02403, audio_tagging_loss=0.01027, over 3055161.82 frames. 
], batch size: 60, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:24:41,894 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.86 vs. limit=15.0 2023-11-19 11:24:46,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=713573.3333333334, ans=0.1 2023-11-19 11:25:04,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward3.hidden_balancer.prob, batch_count=713706.6666666666, ans=0.125 2023-11-19 11:25:15,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=713773.3333333334, ans=0.0 2023-11-19 11:25:28,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys.whitening_limit, batch_count=713840.0, ans=6.0 2023-11-19 11:25:32,926 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XMxq2pgttuY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:25:36,648 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10900, loss[loss=0.07261, simple_loss=0.09082, pruned_loss=0.01763, audio_tagging_loss=0.009569, over 15376.00 frames. ], tot_loss[loss=0.08768, simple_loss=0.1069, pruned_loss=0.02388, audio_tagging_loss=0.01033, over 3058087.42 frames. ], batch size: 58, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:25:38,222 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=18.91 vs. limit=22.5 2023-11-19 11:25:46,826 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.37 vs. limit=15.0 2023-11-19 11:25:58,938 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=714040.0, ans=0.0 2023-11-19 11:25:58,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=714040.0, ans=0.0 2023-11-19 11:25:59,227 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=11.79 vs. limit=15.0 2023-11-19 11:26:08,744 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.026e+01 8.550e+01 9.422e+01 1.018e+02 1.383e+02, threshold=1.884e+02, percent-clipped=0.0 2023-11-19 11:26:11,071 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=714106.6666666666, ans=0.2 2023-11-19 11:26:21,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714173.3333333334, ans=0.1 2023-11-19 11:26:31,830 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 10950, loss[loss=0.0849, simple_loss=0.1122, pruned_loss=0.01937, audio_tagging_loss=0.009442, over 15756.00 frames. ], tot_loss[loss=0.08745, simple_loss=0.1066, pruned_loss=0.02378, audio_tagging_loss=0.01037, over 3050599.95 frames. 
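grad_scale in these records is the dynamic loss scale from fp16 training: it is halved whenever a step produces inf/nan gradients and multiplied back up after a run of overflow-free steps, which is why it oscillates between 16.0 and 32.0 through this section. A sketch with the stock PyTorch scaler; the growth/backoff constants shown are torch.cuda.amp defaults, not values confirmed by this log:

```python
import torch

scaler = torch.cuda.amp.GradScaler(
    init_scale=16.0,     # matches the scale seen in these records
    growth_factor=2.0,   # doubles after `growth_interval` clean steps
    backoff_factor=0.5,  # halves on an inf/nan gradient
    growth_interval=2000,
)

# Typical step under autocast (model, optimizer, compute_loss assumed):
#   with torch.cuda.amp.autocast():
#       loss = compute_loss(batch)
#   scaler.scale(loss).backward()
#   scaler.step(optimizer)
#   scaler.update()            # the logged grad_scale is scaler.get_scale()
```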
], batch size: 57, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:26:34,250 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=714240.0, ans=0.125 2023-11-19 11:27:27,011 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11000, loss[loss=0.08319, simple_loss=0.1004, pruned_loss=0.02096, audio_tagging_loss=0.01203, over 17197.00 frames. ], tot_loss[loss=0.08827, simple_loss=0.1077, pruned_loss=0.02402, audio_tagging_loss=0.01039, over 3052088.40 frames. ], batch size: 67, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:27:28,452 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.59 vs. limit=15.0 2023-11-19 11:27:36,054 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h6R5rMXN6pY_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:27:37,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=714640.0, ans=0.125 2023-11-19 11:27:59,343 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.973e+01 8.544e+01 9.124e+01 1.001e+02 1.240e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:27:59,684 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=714773.3333333334, ans=0.1 2023-11-19 11:28:03,788 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=714773.3333333334, ans=0.125 2023-11-19 11:28:03,953 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=12.02 vs. limit=15.0 2023-11-19 11:28:14,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=714840.0, ans=0.1 2023-11-19 11:28:22,460 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11050, loss[loss=0.07681, simple_loss=0.09608, pruned_loss=0.0204, audio_tagging_loss=0.008369, over 15345.00 frames. ], tot_loss[loss=0.08884, simple_loss=0.1083, pruned_loss=0.02418, audio_tagging_loss=0.01054, over 3051552.15 frames. ], batch size: 58, lr: 7.53e-03, grad_scale: 16.0 2023-11-19 11:28:44,279 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten.whitening_limit, batch_count=715040.0, ans=22.5 2023-11-19 11:28:49,601 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.prob, batch_count=715040.0, ans=0.125 2023-11-19 11:28:54,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.83 vs. 
limit=15.0 2023-11-19 11:28:58,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=715106.6666666666, ans=0.0 2023-11-19 11:29:17,832 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11100, loss[loss=0.06653, simple_loss=0.08109, pruned_loss=0.01553, audio_tagging_loss=0.01046, over 15853.00 frames. ], tot_loss[loss=0.08844, simple_loss=0.1073, pruned_loss=0.02412, audio_tagging_loss=0.01067, over 3048273.87 frames. ], batch size: 61, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:29:22,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=715240.0, ans=0.125 2023-11-19 11:29:30,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=715306.6666666666, ans=0.125 2023-11-19 11:29:30,579 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=715306.6666666666, ans=0.2 2023-11-19 11:29:47,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=715373.3333333334, ans=0.125 2023-11-19 11:29:47,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff2_skip_rate, batch_count=715373.3333333334, ans=0.0 2023-11-19 11:29:50,619 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.040e+01 8.681e+01 9.190e+01 1.002e+02 1.339e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 11:30:04,041 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=715506.6666666666, ans=0.125 2023-11-19 11:30:08,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=715506.6666666666, ans=0.125 2023-11-19 11:30:13,758 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11150, loss[loss=0.09095, simple_loss=0.1209, pruned_loss=0.02399, audio_tagging_loss=0.006499, over 15422.00 frames. ], tot_loss[loss=0.0885, simple_loss=0.1071, pruned_loss=0.02414, audio_tagging_loss=0.01082, over 3052571.45 frames. ], batch size: 55, lr: 7.52e-03, grad_scale: 16.0 2023-11-19 11:30:30,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward2.hidden_balancer.prob, batch_count=715640.0, ans=0.125 2023-11-19 11:30:32,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=715640.0, ans=0.125 2023-11-19 11:30:38,387 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=715706.6666666666, ans=0.0 2023-11-19 11:30:39,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=715706.6666666666, ans=0.125 2023-11-19 11:30:49,443 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=715773.3333333334, ans=0.1 2023-11-19 11:30:50,779 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.90 vs. 
limit=6.0 2023-11-19 11:30:51,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=715773.3333333334, ans=0.1 2023-11-19 11:31:08,657 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11200, loss[loss=0.111, simple_loss=0.1328, pruned_loss=0.03358, audio_tagging_loss=0.011, over 15180.00 frames. ], tot_loss[loss=0.08686, simple_loss=0.105, pruned_loss=0.02341, audio_tagging_loss=0.01098, over 3050803.28 frames. ], batch size: 55, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:31:19,487 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=715973.3333333334, ans=0.5 2023-11-19 11:31:24,449 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.78 vs. limit=22.5 2023-11-19 11:31:40,816 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=716040.0, ans=0.125 2023-11-19 11:31:41,668 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.895e+01 8.422e+01 9.292e+01 1.018e+02 1.279e+02, threshold=1.858e+02, percent-clipped=0.0 2023-11-19 11:31:43,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.scale_min, batch_count=716106.6666666666, ans=0.2 2023-11-19 11:31:48,521 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.18 vs. limit=15.0 2023-11-19 11:31:56,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_skip_rate, batch_count=716173.3333333334, ans=0.0 2023-11-19 11:32:05,052 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11250, loss[loss=0.08644, simple_loss=0.09961, pruned_loss=0.02409, audio_tagging_loss=0.01255, over 15622.00 frames. ], tot_loss[loss=0.08674, simple_loss=0.1049, pruned_loss=0.02332, audio_tagging_loss=0.01097, over 3050862.36 frames. ], batch size: 58, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:32:07,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=716240.0, ans=0.0 2023-11-19 11:32:15,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=716306.6666666666, ans=0.2 2023-11-19 11:32:44,306 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:33:00,950 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11300, loss[loss=0.08724, simple_loss=0.1091, pruned_loss=0.02411, audio_tagging_loss=0.008585, over 15348.00 frames. ], tot_loss[loss=0.08756, simple_loss=0.1063, pruned_loss=0.02363, audio_tagging_loss=0.01076, over 3044710.67 frames. ], batch size: 56, lr: 7.52e-03, grad_scale: 32.0 2023-11-19 11:33:07,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=716573.3333333334, ans=0.125 2023-11-19 11:33:33,222 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.199e+01 8.698e+01 9.365e+01 1.016e+02 1.421e+02, threshold=1.873e+02, percent-clipped=0.0 2023-11-19 11:33:55,809 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11350, loss[loss=0.1137, simple_loss=0.151, pruned_loss=0.03076, audio_tagging_loss=0.007404, over 14918.00 frames. 
], tot_loss[loss=0.08766, simple_loss=0.107, pruned_loss=0.02368, audio_tagging_loss=0.01048, over 3042740.47 frames. ], batch size: 55, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:34:01,881 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=716906.6666666666, ans=0.07 2023-11-19 11:34:03,336 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.48 vs. limit=6.0 2023-11-19 11:34:16,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=716973.3333333334, ans=0.125 2023-11-19 11:34:19,291 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=10.60 vs. limit=15.0 2023-11-19 11:34:32,026 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=717106.6666666666, ans=0.125 2023-11-19 11:34:40,750 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=23.37 vs. limit=22.5 2023-11-19 11:34:47,390 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=717173.3333333334, ans=0.125 2023-11-19 11:34:51,259 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11400, loss[loss=0.1024, simple_loss=0.1185, pruned_loss=0.02991, audio_tagging_loss=0.01325, over 14729.00 frames. ], tot_loss[loss=0.08736, simple_loss=0.1071, pruned_loss=0.02354, audio_tagging_loss=0.01027, over 3047208.99 frames. ], batch size: 54, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:35:06,638 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=717306.6666666666, ans=0.0 2023-11-19 11:35:19,982 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn1.whiten, num_groups=1, num_channels=192, metric=13.47 vs. limit=22.5 2023-11-19 11:35:24,346 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.948e+01 8.257e+01 8.952e+01 9.927e+01 1.340e+02, threshold=1.790e+02, percent-clipped=0.0 2023-11-19 11:35:30,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.55 vs. limit=22.5 2023-11-19 11:35:46,994 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11450, loss[loss=0.07141, simple_loss=0.09239, pruned_loss=0.01581, audio_tagging_loss=0.009408, over 15663.00 frames. ], tot_loss[loss=0.087, simple_loss=0.1064, pruned_loss=0.0236, audio_tagging_loss=0.01019, over 3050703.64 frames. ], batch size: 59, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:36:15,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=717706.6666666666, ans=0.0 2023-11-19 11:36:41,945 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11500, loss[loss=0.06382, simple_loss=0.07335, pruned_loss=0.01603, audio_tagging_loss=0.01111, over 14855.00 frames. ], tot_loss[loss=0.08692, simple_loss=0.1063, pruned_loss=0.02356, audio_tagging_loss=0.01021, over 3050700.07 frames. 
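tot_loss here is not a whole-epoch average: every record reports it over a near-constant ~3.04-3.05 million frames while single batches carry ~15k frames, which is what a leaky accumulator with an effective window of roughly 200 batches would produce. A sketch of that bookkeeping; the decay constant is inferred from the frame counts, not read from the code:

```python
class LeakyLossTracker:
    """Running loss whose old batches decay away (window ~= reset_interval)."""
    def __init__(self, reset_interval: int = 200):  # inferred: ~3.05e6 / ~15e3
        self.decay = 1.0 - 1.0 / reset_interval
        self.loss_sum = 0.0
        self.frames = 0.0

    def update(self, batch_loss_sum: float, batch_frames: int) -> float:
        self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
        self.frames = self.frames * self.decay + batch_frames
        return self.loss_sum / self.frames  # the tot_loss value that is logged

tracker = LeakyLossTracker()
for _ in range(2000):
    tracker.update(0.087 * 15_250, 15_250)
assert abs(tracker.frames - 3.05e6) / 3.05e6 < 0.001  # ~3.05M frames, as logged
```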
], batch size: 57, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:36:43,363 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=717906.6666666666, ans=0.2 2023-11-19 11:36:58,602 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=717973.3333333334, ans=0.0 2023-11-19 11:37:15,739 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.085e+01 8.484e+01 9.226e+01 9.872e+01 1.474e+02, threshold=1.845e+02, percent-clipped=0.0 2023-11-19 11:37:17,084 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=718106.6666666666, ans=0.2 2023-11-19 11:37:22,267 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=718106.6666666666, ans=0.2 2023-11-19 11:37:27,656 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:37:29,768 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=718173.3333333334, ans=0.2 2023-11-19 11:37:37,507 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11550, loss[loss=0.07805, simple_loss=0.103, pruned_loss=0.01779, audio_tagging_loss=0.008787, over 15098.00 frames. ], tot_loss[loss=0.08735, simple_loss=0.107, pruned_loss=0.02362, audio_tagging_loss=0.01023, over 3051250.42 frames. ], batch size: 57, lr: 7.51e-03, grad_scale: 16.0 2023-11-19 11:37:45,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=718240.0, ans=0.0 2023-11-19 11:37:45,510 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=718240.0, ans=0.1 2023-11-19 11:37:52,311 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=718306.6666666666, ans=0.1 2023-11-19 11:37:54,461 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=718306.6666666666, ans=0.125 2023-11-19 11:37:59,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=718373.3333333334, ans=0.2 2023-11-19 11:38:00,225 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=718373.3333333334, ans=0.125 2023-11-19 11:38:11,670 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NeYOsnhOi4k_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 11:38:16,811 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=718440.0, ans=0.04949747468305833 2023-11-19 11:38:27,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=13.03 vs. 
limit=15.0 2023-11-19 11:38:32,543 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=12.51 vs. limit=15.0 2023-11-19 11:38:32,677 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.54 vs. limit=22.5 2023-11-19 11:38:33,214 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11600, loss[loss=0.08093, simple_loss=0.1054, pruned_loss=0.0157, audio_tagging_loss=0.01252, over 14642.00 frames. ], tot_loss[loss=0.08791, simple_loss=0.1079, pruned_loss=0.02374, audio_tagging_loss=0.01021, over 3045934.38 frames. ], batch size: 55, lr: 7.51e-03, grad_scale: 32.0 2023-11-19 11:38:51,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=718640.0, ans=0.125 2023-11-19 11:38:55,676 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=718706.6666666666, ans=0.125 2023-11-19 11:39:06,024 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.247e+01 8.169e+01 9.328e+01 1.011e+02 1.273e+02, threshold=1.866e+02, percent-clipped=0.0 2023-11-19 11:39:20,515 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer1.prob, batch_count=718840.0, ans=0.125 2023-11-19 11:39:20,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_positive, batch_count=718840.0, ans=0.05 2023-11-19 11:39:21,649 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=718840.0, ans=0.0 2023-11-19 11:39:28,770 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11650, loss[loss=0.08551, simple_loss=0.1094, pruned_loss=0.02105, audio_tagging_loss=0.009772, over 15328.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1071, pruned_loss=0.02359, audio_tagging_loss=0.01031, over 3045614.36 frames. ], batch size: 61, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:39:31,731 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=7.97 vs. limit=15.0 2023-11-19 11:39:33,283 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff3_skip_rate, batch_count=718906.6666666666, ans=0.0 2023-11-19 11:39:39,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=718973.3333333334, ans=0.125 2023-11-19 11:39:39,231 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.out_combiner.scale_min, batch_count=718973.3333333334, ans=0.2 2023-11-19 11:39:40,578 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.56 vs. limit=15.0 2023-11-19 11:39:42,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.96 vs. 
limit=15.0 2023-11-19 11:39:46,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=718973.3333333334, ans=0.125 2023-11-19 11:39:52,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=719040.0, ans=0.125 2023-11-19 11:40:21,744 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.13 vs. limit=12.0 2023-11-19 11:40:24,342 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11700, loss[loss=0.06594, simple_loss=0.06896, pruned_loss=0.01928, audio_tagging_loss=0.01218, over 15621.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1056, pruned_loss=0.02334, audio_tagging_loss=0.0104, over 3051893.05 frames. ], batch size: 62, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:40:25,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=719240.0, ans=0.1 2023-11-19 11:40:57,451 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.764e+01 8.259e+01 8.836e+01 9.705e+01 1.355e+02, threshold=1.767e+02, percent-clipped=0.0 2023-11-19 11:41:13,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.82 vs. limit=15.0 2023-11-19 11:41:19,882 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11750, loss[loss=0.1207, simple_loss=0.1442, pruned_loss=0.04111, audio_tagging_loss=0.007455, over 15247.00 frames. ], tot_loss[loss=0.08679, simple_loss=0.1055, pruned_loss=0.02354, audio_tagging_loss=0.01052, over 3052224.49 frames. ], batch size: 56, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:41:39,745 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 11:41:42,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=719706.6666666666, ans=0.0 2023-11-19 11:41:48,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=719706.6666666666, ans=0.125 2023-11-19 11:41:55,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=719773.3333333334, ans=0.0 2023-11-19 11:42:02,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=719840.0, ans=0.1 2023-11-19 11:42:08,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=719840.0, ans=0.07 2023-11-19 11:42:11,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=719840.0, ans=0.125 2023-11-19 11:42:14,820 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11800, loss[loss=0.07794, simple_loss=0.08604, pruned_loss=0.02211, audio_tagging_loss=0.01282, over 14603.00 frames. ], tot_loss[loss=0.08737, simple_loss=0.1059, pruned_loss=0.02384, audio_tagging_loss=0.01055, over 3051175.75 frames. 
], batch size: 56, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:42:29,391 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-108000.pt 2023-11-19 11:42:49,860 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.108e+01 8.669e+01 9.441e+01 1.062e+02 1.406e+02, threshold=1.888e+02, percent-clipped=0.0 2023-11-19 11:42:53,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=720106.6666666666, ans=0.125 2023-11-19 11:43:11,845 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11850, loss[loss=0.08207, simple_loss=0.1055, pruned_loss=0.01957, audio_tagging_loss=0.00974, over 14537.00 frames. ], tot_loss[loss=0.08818, simple_loss=0.1066, pruned_loss=0.02418, audio_tagging_loss=0.01071, over 3046399.75 frames. ], batch size: 53, lr: 7.50e-03, grad_scale: 32.0 2023-11-19 11:43:16,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=720240.0, ans=0.1 2023-11-19 11:43:16,606 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=19.86 vs. limit=22.5 2023-11-19 11:43:25,072 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=720306.6666666666, ans=0.125 2023-11-19 11:43:35,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=720373.3333333334, ans=0.125 2023-11-19 11:43:45,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=720440.0, ans=0.2 2023-11-19 11:43:48,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.prob, batch_count=720440.0, ans=0.125 2023-11-19 11:44:07,205 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11900, loss[loss=0.07567, simple_loss=0.084, pruned_loss=0.01975, audio_tagging_loss=0.01391, over 15931.00 frames. ], tot_loss[loss=0.08787, simple_loss=0.1063, pruned_loss=0.02393, audio_tagging_loss=0.01077, over 3048033.82 frames. ], batch size: 60, lr: 7.50e-03, grad_scale: 16.0 2023-11-19 11:44:13,660 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=720573.3333333334, ans=0.125 2023-11-19 11:44:41,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.859e+01 8.394e+01 8.886e+01 9.944e+01 1.266e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 11:44:57,210 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=720840.0, ans=0.0 2023-11-19 11:45:03,436 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 11950, loss[loss=0.07937, simple_loss=0.09265, pruned_loss=0.01981, audio_tagging_loss=0.01323, over 14928.00 frames. ], tot_loss[loss=0.08849, simple_loss=0.1071, pruned_loss=0.02414, audio_tagging_loss=0.01078, over 3050474.55 frames. 
], batch size: 57, lr: 7.49e-03, grad_scale: 16.0 2023-11-19 11:45:12,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=720906.6666666666, ans=0.0 2023-11-19 11:45:24,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.01 vs. limit=6.0 2023-11-19 11:45:33,000 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=721040.0, ans=0.125 2023-11-19 11:45:37,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=721106.6666666666, ans=0.2 2023-11-19 11:45:41,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-11-19 11:45:56,787 INFO [train_asr.py:1115] (0/4) Epoch 9, batch 12000, loss[loss=0.07069, simple_loss=0.08806, pruned_loss=0.01813, audio_tagging_loss=0.008536, over 15776.00 frames. ], tot_loss[loss=0.08697, simple_loss=0.105, pruned_loss=0.02356, audio_tagging_loss=0.01089, over 3047318.05 frames. ], batch size: 59, lr: 7.49e-03, grad_scale: 32.0 2023-11-19 11:45:56,789 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 11:46:29,177 INFO [train_asr.py:1147] (0/4) Epoch 9, validation: loss=0.06606, simple_loss=0.05578, pruned_loss=0.006612, audio_tagging_loss=0.03155, over 4681554.00 frames. 2023-11-19 11:46:29,178 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 11:46:36,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=721240.0, ans=0.0 2023-11-19 11:46:37,829 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=12.66 vs. limit=15.0 2023-11-19 11:46:51,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=721373.3333333334, ans=0.125 2023-11-19 11:46:55,438 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/epoch-9.pt 2023-11-19 11:47:32,062 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 0, loss[loss=0.09472, simple_loss=0.08519, pruned_loss=0.02066, audio_tagging_loss=0.03146, over 16387.00 frames. ], tot_loss[loss=0.09472, simple_loss=0.08519, pruned_loss=0.02066, audio_tagging_loss=0.03146, over 16387.00 frames. ], batch size: 64, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:47:32,065 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 11:47:52,982 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.0.self_attn_weights, attn_weights_entropy = tensor([5.3134, 4.9628, 4.7824, 5.1725], device='cuda:0') 2023-11-19 11:48:03,907 INFO [train_asr.py:1147] (0/4) Epoch 10, validation: loss=0.06458, simple_loss=0.05578, pruned_loss=0.006606, audio_tagging_loss=0.03009, over 4681554.00 frames. 
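[Editor's note on the learning-rate values in this log: lr holds at 7.51e-03 to 7.49e-03 through the end of epoch 9 and drops to 7.12e-03 at "Epoch 10, batch 0" above. This is consistent with icefall's Eden schedule, which decays the rate in both batch count and epoch count. A minimal sketch follows; the hyperparameters (base_lr=0.045, lr_batches=7500, lr_epochs=3.5) and the convention of stepping the epoch counter with epoch-1 are assumptions inferred from the logged lr values, not read from this excerpt.]

def eden_lr(base_lr: float, batch: int, epoch: int,
            lr_batches: float = 7500.0, lr_epochs: float = 3.5) -> float:
    # Decay factor driven by how many batches have been seen so far.
    batch_factor = ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
    # Independent decay factor driven by how many epochs have finished.
    epoch_factor = ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25
    return base_lr * batch_factor * epoch_factor

# checkpoint-108000.pt was saved just before the epoch boundary above, so the
# global batch index there is roughly 108000.
print(f"{eden_lr(0.045, batch=108000, epoch=8):.3e}")  # ~7.50e-03, end of epoch 9
print(f"{eden_lr(0.045, batch=108250, epoch=9):.3e}")  # ~7.12e-03, epoch 10, batch 0

[Under these assumptions, the slow within-epoch drift (7.51e-03 down to 7.49e-03 over a few thousand batches) comes from the batch factor alone, while the visible step at the epoch boundary comes from the epoch factor.]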
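[The WARNING lines in this log ("Exclude cut with ID unbalanced/... from training") record a standard transducer-training guard: these 1-second AudioSet clips have 100 feature frames, only 23 frames survive the ~4x subsampling front-end, and the 24-token dummy transcript is longer than that, which the pruned transducer loss cannot handle (it needs T >= U). A minimal sketch of such a filter, assuming the usual icefall frame-count formula; the formula reproduces the logged 100 -> 23 mapping but is an assumption, not read from this excerpt.]

def keep_cut(num_frames: int, num_tokens: int) -> bool:
    # Frame count after the convolutional front-end's ~4x subsampling;
    # for num_frames=100 this yields 23, matching the WARNING lines.
    t = ((num_frames - 7) // 2 + 1) // 2
    # The transducer needs at least one output frame per token.
    return t >= num_tokens

print(keep_cut(100, 24))   # False: the cut is excluded, as in the WARNINGs
print(keep_cut(1500, 24))  # True: a normal ~15 s utterance passes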
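[Most of the remaining INFO lines report ScheduledFloat values: regularization constants (dropout probabilities, skip rates, balancer probabilities) that are functions of the global batch count rather than fixed numbers, which is why each report carries a batch_count and an "ans" value. A minimal sketch of a piecewise-linear schedule of this kind; the class name matches the log, but the constructor, breakpoints, and example ramp are illustrative assumptions.]

import bisect

class ScheduledFloat:
    """A float that depends on the global batch count, e.g. a skip rate
    that ramps from 0.5 at batch 0 down to 0.05 by batch 4000."""

    def __init__(self, *points):
        # points: (batch_count, value) pairs in increasing batch_count order.
        self.xs = [x for x, _ in points]
        self.ys = [y for _, y in points]

    def value(self, batch_count: float) -> float:
        i = bisect.bisect_right(self.xs, batch_count)
        if i == 0:
            return self.ys[0]   # before the first breakpoint
        if i == len(self.xs):
            return self.ys[-1]  # past the last breakpoint
        x0, x1 = self.xs[i - 1], self.xs[i]
        y0, y1 = self.ys[i - 1], self.ys[i]
        return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

skip_rate = ScheduledFloat((0.0, 0.5), (4000.0, 0.05))
print(skip_rate.value(2000.0))     # 0.275: mid-ramp
print(skip_rate.value(716906.67))  # 0.05: long past the ramp

[At batch_count ~717000 and beyond, every such schedule in this excerpt has long since reached its final value, which is why the same "ans" values repeat for each parameter.]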
2023-11-19 11:48:03,907 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 11:48:08,321 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=721400.0, ans=0.125 2023-11-19 11:48:11,157 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.461e+01 9.125e+01 9.697e+01 1.516e+02, threshold=1.825e+02, percent-clipped=0.0 2023-11-19 11:48:59,745 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 50, loss[loss=0.1166, simple_loss=0.1439, pruned_loss=0.02836, audio_tagging_loss=0.01631, over 15596.00 frames. ], tot_loss[loss=0.09728, simple_loss=0.1062, pruned_loss=0.0239, audio_tagging_loss=0.02029, over 696823.56 frames. ], batch size: 55, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:49:04,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=721733.3333333334, ans=0.1 2023-11-19 11:49:07,748 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=721733.3333333334, ans=0.125 2023-11-19 11:49:07,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=721733.3333333334, ans=0.125 2023-11-19 11:49:23,204 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=721866.6666666666, ans=0.0 2023-11-19 11:49:25,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=721866.6666666666, ans=0.125 2023-11-19 11:49:36,581 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=721933.3333333334, ans=0.2 2023-11-19 11:49:47,151 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=722000.0, ans=0.0 2023-11-19 11:49:52,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=722000.0, ans=0.125 2023-11-19 11:49:55,443 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 100, loss[loss=0.09084, simple_loss=0.101, pruned_loss=0.02227, audio_tagging_loss=0.01806, over 14563.00 frames. ], tot_loss[loss=0.09486, simple_loss=0.1042, pruned_loss=0.02304, audio_tagging_loss=0.01972, over 1212268.57 frames. ], batch size: 54, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:50:03,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.378e+01 8.830e+01 9.521e+01 1.052e+02 1.360e+02, threshold=1.904e+02, percent-clipped=0.0 2023-11-19 11:50:04,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=722066.6666666666, ans=0.125 2023-11-19 11:50:22,482 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=722200.0, ans=0.125 2023-11-19 11:50:38,514 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=722266.6666666666, ans=0.0 2023-11-19 11:50:40,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=13.75 vs. 
limit=22.5 2023-11-19 11:50:47,067 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.96 vs. limit=15.0 2023-11-19 11:50:51,920 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 150, loss[loss=0.07996, simple_loss=0.09088, pruned_loss=0.01981, audio_tagging_loss=0.0147, over 15690.00 frames. ], tot_loss[loss=0.09157, simple_loss=0.1023, pruned_loss=0.02267, audio_tagging_loss=0.01775, over 1620229.62 frames. ], batch size: 59, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:36,595 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:45,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=722666.6666666666, ans=0.0 2023-11-19 11:51:46,123 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=722666.6666666666, ans=0.125 2023-11-19 11:51:47,935 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 200, loss[loss=0.08525, simple_loss=0.1061, pruned_loss=0.0223, audio_tagging_loss=0.009919, over 16233.00 frames. ], tot_loss[loss=0.0905, simple_loss=0.1039, pruned_loss=0.02302, audio_tagging_loss=0.01552, over 1936905.68 frames. ], batch size: 61, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:51:50,382 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=722733.3333333334, ans=0.2 2023-11-19 11:51:56,596 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.422e+01 8.351e+01 9.298e+01 1.028e+02 1.327e+02, threshold=1.860e+02, percent-clipped=0.0 2023-11-19 11:52:01,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=722800.0, ans=0.125 2023-11-19 11:52:04,331 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=722800.0, ans=0.125 2023-11-19 11:52:13,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=722866.6666666666, ans=0.1 2023-11-19 11:52:27,229 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=6.05 vs. limit=15.0 2023-11-19 11:52:32,150 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723000.0, ans=0.1 2023-11-19 11:52:34,196 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=723000.0, ans=0.2 2023-11-19 11:52:44,664 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 250, loss[loss=0.09268, simple_loss=0.1079, pruned_loss=0.02585, audio_tagging_loss=0.01288, over 16276.00 frames. ], tot_loss[loss=0.08888, simple_loss=0.1039, pruned_loss=0.02284, audio_tagging_loss=0.0141, over 2186541.25 frames. 
], batch size: 60, lr: 7.12e-03, grad_scale: 32.0 2023-11-19 11:53:04,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=723133.3333333334, ans=0.125 2023-11-19 11:53:17,776 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=723266.6666666666, ans=0.2 2023-11-19 11:53:17,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=723266.6666666666, ans=0.0 2023-11-19 11:53:26,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=723266.6666666666, ans=0.025 2023-11-19 11:53:40,160 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 300, loss[loss=0.05381, simple_loss=0.0529, pruned_loss=0.01025, audio_tagging_loss=0.01711, over 16226.00 frames. ], tot_loss[loss=0.08935, simple_loss=0.1057, pruned_loss=0.02353, audio_tagging_loss=0.01299, over 2379975.58 frames. ], batch size: 64, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:53:46,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=723400.0, ans=0.125 2023-11-19 11:53:48,085 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.717e+01 8.399e+01 9.137e+01 9.924e+01 1.644e+02, threshold=1.827e+02, percent-clipped=0.0 2023-11-19 11:53:58,235 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=7.75 vs. limit=15.0 2023-11-19 11:54:03,792 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.max_abs, batch_count=723533.3333333334, ans=10.0 2023-11-19 11:54:05,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=723533.3333333334, ans=0.125 2023-11-19 11:54:18,092 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.prob, batch_count=723600.0, ans=0.125 2023-11-19 11:54:32,241 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.74 vs. limit=10.0 2023-11-19 11:54:34,124 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=723666.6666666666, ans=0.1 2023-11-19 11:54:36,133 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 350, loss[loss=0.09825, simple_loss=0.1109, pruned_loss=0.03194, audio_tagging_loss=0.01087, over 14682.00 frames. ], tot_loss[loss=0.08809, simple_loss=0.1048, pruned_loss=0.02328, audio_tagging_loss=0.01243, over 2529442.60 frames. 
], batch size: 58, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:54:36,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=723733.3333333334, ans=0.95 2023-11-19 11:54:39,640 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=723733.3333333334, ans=0.0 2023-11-19 11:54:55,480 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=723800.0, ans=0.07 2023-11-19 11:55:23,456 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=512, metric=8.49 vs. limit=15.0 2023-11-19 11:55:27,935 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724000.0, ans=0.1 2023-11-19 11:55:32,496 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 400, loss[loss=0.07391, simple_loss=0.09263, pruned_loss=0.02006, audio_tagging_loss=0.007538, over 15208.00 frames. ], tot_loss[loss=0.08668, simple_loss=0.1038, pruned_loss=0.02292, audio_tagging_loss=0.01186, over 2649036.26 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:55:33,834 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=724066.6666666666, ans=0.125 2023-11-19 11:55:39,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=724066.6666666666, ans=0.0 2023-11-19 11:55:39,904 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.670e+01 8.520e+01 9.395e+01 1.002e+02 1.359e+02, threshold=1.879e+02, percent-clipped=0.0 2023-11-19 11:55:40,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=724066.6666666666, ans=0.1 2023-11-19 11:55:45,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.attention_skip_rate, batch_count=724133.3333333334, ans=0.0 2023-11-19 11:56:28,207 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 450, loss[loss=0.1021, simple_loss=0.1256, pruned_loss=0.02911, audio_tagging_loss=0.01014, over 16095.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1037, pruned_loss=0.0229, audio_tagging_loss=0.01151, over 2730843.68 frames. ], batch size: 58, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:56:35,605 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=384, metric=5.00 vs. limit=15.0 2023-11-19 11:56:43,936 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=724466.6666666666, ans=0.0 2023-11-19 11:56:50,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=724533.3333333334, ans=0.0 2023-11-19 11:56:51,888 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724533.3333333334, ans=0.125 2023-11-19 11:56:58,428 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.37 vs. 
limit=15.0 2023-11-19 11:57:00,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=724600.0, ans=0.2 2023-11-19 11:57:23,928 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 500, loss[loss=0.08154, simple_loss=0.1046, pruned_loss=0.01999, audio_tagging_loss=0.009259, over 15640.00 frames. ], tot_loss[loss=0.08613, simple_loss=0.1039, pruned_loss=0.02295, audio_tagging_loss=0.01124, over 2811845.76 frames. ], batch size: 59, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:57:25,142 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=724733.3333333334, ans=0.1 2023-11-19 11:57:26,167 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=724733.3333333334, ans=0.125 2023-11-19 11:57:31,352 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.489e+01 8.496e+01 9.010e+01 1.030e+02 1.418e+02, threshold=1.802e+02, percent-clipped=0.0 2023-11-19 11:58:06,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=724933.3333333334, ans=0.125 2023-11-19 11:58:09,799 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.skip_rate, batch_count=725000.0, ans=0.07 2023-11-19 11:58:19,509 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 550, loss[loss=0.103, simple_loss=0.1259, pruned_loss=0.02958, audio_tagging_loss=0.01048, over 15141.00 frames. ], tot_loss[loss=0.08664, simple_loss=0.1049, pruned_loss=0.02316, audio_tagging_loss=0.01106, over 2869411.45 frames. ], batch size: 56, lr: 7.11e-03, grad_scale: 32.0 2023-11-19 11:58:20,894 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=725066.6666666666, ans=0.04949747468305833 2023-11-19 11:59:06,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=725333.3333333334, ans=0.1 2023-11-19 11:59:09,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=725333.3333333334, ans=0.125 2023-11-19 11:59:12,578 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=725333.3333333334, ans=0.125 2023-11-19 11:59:15,992 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 600, loss[loss=0.06484, simple_loss=0.07562, pruned_loss=0.01514, audio_tagging_loss=0.01189, over 14756.00 frames. ], tot_loss[loss=0.08619, simple_loss=0.1045, pruned_loss=0.02296, audio_tagging_loss=0.01095, over 2914915.27 frames. ], batch size: 56, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 11:59:18,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=5.27 vs. limit=15.0 2023-11-19 11:59:21,839 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.95 vs. 
limit=15.0 2023-11-19 11:59:23,901 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.497e+01 8.301e+01 8.690e+01 9.584e+01 1.385e+02, threshold=1.738e+02, percent-clipped=0.0 2023-11-19 11:59:37,983 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=725533.3333333334, ans=0.0 2023-11-19 11:59:53,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=725600.0, ans=0.07 2023-11-19 12:00:11,798 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 650, loss[loss=0.07144, simple_loss=0.08978, pruned_loss=0.01742, audio_tagging_loss=0.009132, over 15004.00 frames. ], tot_loss[loss=0.08661, simple_loss=0.1052, pruned_loss=0.02318, audio_tagging_loss=0.01084, over 2946544.78 frames. ], batch size: 57, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:00:15,203 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=725733.3333333334, ans=0.125 2023-11-19 12:00:27,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=725800.0, ans=0.2 2023-11-19 12:00:32,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=725866.6666666666, ans=0.04949747468305833 2023-11-19 12:00:56,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=726000.0, ans=0.0 2023-11-19 12:00:58,404 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=726000.0, ans=0.2 2023-11-19 12:01:06,896 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 700, loss[loss=0.1135, simple_loss=0.1458, pruned_loss=0.03362, audio_tagging_loss=0.006949, over 15389.00 frames. ], tot_loss[loss=0.08769, simple_loss=0.1068, pruned_loss=0.02356, audio_tagging_loss=0.01074, over 2972090.39 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:01:14,251 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.015e+01 8.278e+01 8.967e+01 9.847e+01 1.279e+02, threshold=1.793e+02, percent-clipped=0.0 2023-11-19 12:01:24,339 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.22 vs. limit=6.0 2023-11-19 12:01:28,308 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=7.57 vs. limit=15.0 2023-11-19 12:01:43,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.min_positive, batch_count=726266.6666666666, ans=0.05 2023-11-19 12:02:02,264 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 750, loss[loss=0.07169, simple_loss=0.08337, pruned_loss=0.01783, audio_tagging_loss=0.01218, over 14694.00 frames. ], tot_loss[loss=0.08775, simple_loss=0.1067, pruned_loss=0.02367, audio_tagging_loss=0.01071, over 2981103.07 frames. 
], batch size: 58, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:02:08,973 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=726400.0, ans=0.04949747468305833 2023-11-19 12:02:17,950 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=726466.6666666666, ans=0.125 2023-11-19 12:02:59,401 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 800, loss[loss=0.08131, simple_loss=0.09614, pruned_loss=0.02081, audio_tagging_loss=0.01243, over 14404.00 frames. ], tot_loss[loss=0.08781, simple_loss=0.1068, pruned_loss=0.02362, audio_tagging_loss=0.01081, over 2997080.60 frames. ], batch size: 55, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:03:00,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=726733.3333333334, ans=0.07 2023-11-19 12:03:06,736 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.310e+01 8.428e+01 9.274e+01 1.007e+02 1.434e+02, threshold=1.855e+02, percent-clipped=0.0 2023-11-19 12:03:10,170 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=726800.0, ans=0.0 2023-11-19 12:03:25,078 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=726866.6666666666, ans=0.125 2023-11-19 12:03:35,931 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=726933.3333333334, ans=0.0 2023-11-19 12:03:35,955 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=726933.3333333334, ans=0.125 2023-11-19 12:03:54,769 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 850, loss[loss=0.08795, simple_loss=0.1033, pruned_loss=0.02513, audio_tagging_loss=0.01116, over 15177.00 frames. ], tot_loss[loss=0.08872, simple_loss=0.1075, pruned_loss=0.02407, audio_tagging_loss=0.01091, over 3007263.05 frames. ], batch size: 58, lr: 7.10e-03, grad_scale: 32.0 2023-11-19 12:03:55,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=727066.6666666666, ans=0.125 2023-11-19 12:04:10,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=727133.3333333334, ans=0.2 2023-11-19 12:04:20,372 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0 2023-11-19 12:04:22,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=727200.0, ans=0.1 2023-11-19 12:04:27,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer_na.min_abs, batch_count=727266.6666666666, ans=0.02 2023-11-19 12:04:29,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=727266.6666666666, ans=0.125 2023-11-19 12:04:50,018 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 900, loss[loss=0.09394, simple_loss=0.1208, pruned_loss=0.02342, audio_tagging_loss=0.01014, over 15488.00 frames. ], tot_loss[loss=0.08811, simple_loss=0.1067, pruned_loss=0.02381, audio_tagging_loss=0.01097, over 3017905.72 frames. 
], batch size: 56, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:04:54,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=727400.0, ans=0.125 2023-11-19 12:04:57,840 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.263e+01 8.793e+01 9.779e+01 1.235e+02, threshold=1.759e+02, percent-clipped=0.0 2023-11-19 12:05:05,426 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=727466.6666666666, ans=0.125 2023-11-19 12:05:05,493 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=727466.6666666666, ans=0.125 2023-11-19 12:05:05,573 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=727466.6666666666, ans=0.0 2023-11-19 12:05:23,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=727600.0, ans=0.2 2023-11-19 12:05:46,701 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 950, loss[loss=0.08928, simple_loss=0.1114, pruned_loss=0.02175, audio_tagging_loss=0.01182, over 15432.00 frames. ], tot_loss[loss=0.08761, simple_loss=0.1062, pruned_loss=0.02357, audio_tagging_loss=0.01095, over 3017800.71 frames. ], batch size: 58, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:05:54,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.prob, batch_count=727733.3333333334, ans=0.125 2023-11-19 12:05:57,619 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=2.60 vs. limit=15.0 2023-11-19 12:06:30,708 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=728000.0, ans=0.0 2023-11-19 12:06:32,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=728000.0, ans=0.1 2023-11-19 12:06:42,016 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1000, loss[loss=0.1215, simple_loss=0.149, pruned_loss=0.03757, audio_tagging_loss=0.009479, over 15483.00 frames. ], tot_loss[loss=0.08723, simple_loss=0.1059, pruned_loss=0.02353, audio_tagging_loss=0.01075, over 3020248.07 frames. ], batch size: 55, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:06:46,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=728066.6666666666, ans=0.0 2023-11-19 12:06:48,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=728066.6666666666, ans=0.1 2023-11-19 12:06:49,834 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.784e+01 8.258e+01 8.941e+01 9.779e+01 1.255e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 12:07:05,171 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5Y6u9AlD9S0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 12:07:05,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=728200.0, ans=0.0 2023-11-19 12:07:07,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff3_skip_rate, batch_count=728200.0, ans=0.0 2023-11-19 12:07:18,194 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.balancer2.prob, batch_count=728266.6666666666, ans=0.125 2023-11-19 12:07:23,942 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.47 vs. limit=10.0 2023-11-19 12:07:32,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=728333.3333333334, ans=0.2 2023-11-19 12:07:35,769 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=728333.3333333334, ans=0.0 2023-11-19 12:07:37,675 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1050, loss[loss=0.1037, simple_loss=0.1333, pruned_loss=0.02914, audio_tagging_loss=0.007897, over 15537.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1051, pruned_loss=0.02333, audio_tagging_loss=0.01071, over 3025208.64 frames. ], batch size: 54, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:07:50,048 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.min_positive, batch_count=728466.6666666666, ans=0.025 2023-11-19 12:07:50,423 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.51 vs. limit=10.0 2023-11-19 12:07:54,933 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=728466.6666666666, ans=0.0 2023-11-19 12:08:34,211 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1100, loss[loss=0.09369, simple_loss=0.1133, pruned_loss=0.02618, audio_tagging_loss=0.01088, over 13321.00 frames. ], tot_loss[loss=0.08691, simple_loss=0.1057, pruned_loss=0.02346, audio_tagging_loss=0.01059, over 3028508.38 frames. ], batch size: 52, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:08:36,369 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/AWHnJAqurec_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:08:39,159 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.33 vs. limit=22.5 2023-11-19 12:08:42,199 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.898e+01 8.136e+01 8.991e+01 9.834e+01 1.618e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:08:46,884 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=728800.0, ans=0.0 2023-11-19 12:09:26,980 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=8.83 vs. 
limit=15.0 2023-11-19 12:09:30,430 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1150, loss[loss=0.06179, simple_loss=0.07976, pruned_loss=0.01242, audio_tagging_loss=0.009493, over 15309.00 frames. ], tot_loss[loss=0.08667, simple_loss=0.1056, pruned_loss=0.02338, audio_tagging_loss=0.0105, over 3031836.78 frames. ], batch size: 59, lr: 7.09e-03, grad_scale: 32.0 2023-11-19 12:09:47,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass.skip_rate, batch_count=729133.3333333334, ans=0.04949747468305833 2023-11-19 12:09:56,134 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729200.0, ans=0.125 2023-11-19 12:10:06,098 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=729266.6666666666, ans=0.09899494936611666 2023-11-19 12:10:12,061 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=15.05 vs. limit=15.0 2023-11-19 12:10:26,220 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1200, loss[loss=0.101, simple_loss=0.1221, pruned_loss=0.03056, audio_tagging_loss=0.009441, over 13733.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.1047, pruned_loss=0.02314, audio_tagging_loss=0.01043, over 3035093.69 frames. ], batch size: 54, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:10:29,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=729400.0, ans=0.125 2023-11-19 12:10:30,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer1.prob, batch_count=729400.0, ans=0.125 2023-11-19 12:10:35,185 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.319e+01 8.399e+01 9.041e+01 1.012e+02 1.294e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:11:08,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=729600.0, ans=0.125 2023-11-19 12:11:11,733 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=729666.6666666666, ans=0.0 2023-11-19 12:11:14,909 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=729666.6666666666, ans=0.0 2023-11-19 12:11:21,631 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1250, loss[loss=0.06363, simple_loss=0.08266, pruned_loss=0.01235, audio_tagging_loss=0.009956, over 14903.00 frames. ], tot_loss[loss=0.08671, simple_loss=0.106, pruned_loss=0.02335, audio_tagging_loss=0.01035, over 3038127.19 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 32.0 2023-11-19 12:11:28,739 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=729733.3333333334, ans=0.0 2023-11-19 12:11:28,822 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=729733.3333333334, ans=0.1 2023-11-19 12:11:30,112 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.55 vs. limit=15.0 2023-11-19 12:11:42,997 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.15 vs. 
limit=10.0 2023-11-19 12:11:44,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=729866.6666666666, ans=0.05 2023-11-19 12:12:11,002 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:12:11,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=730000.0, ans=0.125 2023-11-19 12:12:17,091 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1300, loss[loss=0.082, simple_loss=0.09983, pruned_loss=0.02125, audio_tagging_loss=0.01083, over 15511.00 frames. ], tot_loss[loss=0.08593, simple_loss=0.1052, pruned_loss=0.02302, audio_tagging_loss=0.0103, over 3035527.28 frames. ], batch size: 60, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:12:26,852 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=730066.6666666666, ans=0.2 2023-11-19 12:12:27,632 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.838e+01 8.101e+01 8.789e+01 9.869e+01 1.258e+02, threshold=1.758e+02, percent-clipped=0.0 2023-11-19 12:12:36,893 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.07 vs. limit=6.0 2023-11-19 12:12:38,435 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=730200.0, ans=0.125 2023-11-19 12:12:41,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-19 12:12:45,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=730200.0, ans=0.0 2023-11-19 12:12:50,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730266.6666666666, ans=0.125 2023-11-19 12:12:51,211 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=730266.6666666666, ans=0.125 2023-11-19 12:13:13,113 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1350, loss[loss=0.08267, simple_loss=0.08996, pruned_loss=0.02509, audio_tagging_loss=0.0126, over 14487.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1058, pruned_loss=0.02328, audio_tagging_loss=0.01033, over 3036412.60 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:13:41,923 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=730533.3333333334, ans=0.125 2023-11-19 12:13:52,817 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XdmbboqRBmQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 12:13:56,631 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:14:01,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=730666.6666666666, ans=0.0 2023-11-19 12:14:08,545 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1400, loss[loss=0.1024, simple_loss=0.122, pruned_loss=0.03243, audio_tagging_loss=0.008942, over 14990.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1049, pruned_loss=0.02326, audio_tagging_loss=0.01043, over 3036387.67 frames. ], batch size: 57, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:14:13,635 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=730733.3333333334, ans=0.1 2023-11-19 12:14:18,525 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.609e+01 8.095e+01 8.801e+01 9.622e+01 1.373e+02, threshold=1.760e+02, percent-clipped=0.0 2023-11-19 12:14:28,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=730800.0, ans=0.125 2023-11-19 12:14:29,907 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=730866.6666666666, ans=0.07 2023-11-19 12:14:38,943 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730866.6666666666, ans=0.125 2023-11-19 12:14:42,086 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=730933.3333333334, ans=0.125 2023-11-19 12:15:04,108 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1450, loss[loss=0.1011, simple_loss=0.113, pruned_loss=0.03474, audio_tagging_loss=0.009838, over 14604.00 frames. ], tot_loss[loss=0.08708, simple_loss=0.1059, pruned_loss=0.02369, audio_tagging_loss=0.01043, over 3036467.63 frames. ], batch size: 56, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:15:06,486 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=731066.6666666666, ans=0.05 2023-11-19 12:15:16,791 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=731133.3333333334, ans=0.125 2023-11-19 12:15:23,409 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=731133.3333333334, ans=0.125 2023-11-19 12:15:30,441 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.05 vs. limit=15.0 2023-11-19 12:15:41,672 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.28 vs. limit=6.0 2023-11-19 12:15:54,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=731333.3333333334, ans=0.2 2023-11-19 12:16:00,107 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1500, loss[loss=0.08367, simple_loss=0.1007, pruned_loss=0.02224, audio_tagging_loss=0.01106, over 15309.00 frames. ], tot_loss[loss=0.08748, simple_loss=0.1064, pruned_loss=0.02379, audio_tagging_loss=0.01052, over 3032608.98 frames. 
], batch size: 57, lr: 7.08e-03, grad_scale: 16.0 2023-11-19 12:16:08,939 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:16:09,649 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.302e+01 8.686e+01 9.376e+01 1.030e+02 1.552e+02, threshold=1.875e+02, percent-clipped=0.0 2023-11-19 12:16:17,781 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=731466.6666666666, ans=0.125 2023-11-19 12:16:23,674 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer1.prob, batch_count=731533.3333333334, ans=0.125 2023-11-19 12:16:37,016 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=731600.0, ans=0.09899494936611666 2023-11-19 12:16:55,767 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1550, loss[loss=0.0764, simple_loss=0.08676, pruned_loss=0.02264, audio_tagging_loss=0.01038, over 14566.00 frames. ], tot_loss[loss=0.08645, simple_loss=0.1047, pruned_loss=0.02334, audio_tagging_loss=0.01075, over 3036175.76 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 16.0 2023-11-19 12:16:57,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=731733.3333333334, ans=0.125 2023-11-19 12:16:58,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=731733.3333333334, ans=0.0 2023-11-19 12:17:02,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=731733.3333333334, ans=0.2 2023-11-19 12:17:08,644 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=6.53 vs. limit=15.0 2023-11-19 12:17:09,541 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=512, metric=4.01 vs. limit=15.0 2023-11-19 12:17:19,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer1.prob, batch_count=731866.6666666666, ans=0.125 2023-11-19 12:17:25,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=731866.6666666666, ans=0.0 2023-11-19 12:17:34,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=11.15 vs. limit=12.0 2023-11-19 12:17:45,561 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=732000.0, ans=0.0 2023-11-19 12:17:51,783 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1600, loss[loss=0.08537, simple_loss=0.1116, pruned_loss=0.01968, audio_tagging_loss=0.009908, over 16875.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.1036, pruned_loss=0.02298, audio_tagging_loss=0.01077, over 3033502.35 frames. 
], batch size: 62, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:17:52,099 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=732066.6666666666, ans=0.125 2023-11-19 12:18:01,803 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.642e+01 8.544e+01 9.122e+01 1.002e+02 1.471e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 12:18:03,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=732133.3333333334, ans=0.1 2023-11-19 12:18:16,247 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=732200.0, ans=0.125 2023-11-19 12:18:26,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=732266.6666666666, ans=0.125 2023-11-19 12:18:36,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=732333.3333333334, ans=0.035 2023-11-19 12:18:39,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=732333.3333333334, ans=0.0 2023-11-19 12:18:42,044 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=732333.3333333334, ans=0.1 2023-11-19 12:18:44,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.min_abs, batch_count=732333.3333333334, ans=0.5 2023-11-19 12:18:47,118 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1650, loss[loss=0.08614, simple_loss=0.1074, pruned_loss=0.02269, audio_tagging_loss=0.009772, over 14752.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.1041, pruned_loss=0.02302, audio_tagging_loss=0.01078, over 3039975.71 frames. ], batch size: 53, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:19:08,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=4.68 vs. limit=15.0 2023-11-19 12:19:09,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=732533.3333333334, ans=0.0 2023-11-19 12:19:26,190 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=732600.0, ans=0.125 2023-11-19 12:19:27,450 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=732600.0, ans=0.125 2023-11-19 12:19:28,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=732600.0, ans=0.125 2023-11-19 12:19:32,208 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.conv_module2.whiten, num_groups=1, num_channels=192, metric=12.12 vs. limit=15.0 2023-11-19 12:19:42,567 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1700, loss[loss=0.08202, simple_loss=0.09903, pruned_loss=0.02169, audio_tagging_loss=0.01081, over 15554.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1033, pruned_loss=0.02251, audio_tagging_loss=0.01078, over 3052382.24 frames. 
], batch size: 60, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:19:46,386 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=732733.3333333334, ans=0.0 2023-11-19 12:19:53,003 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.860e+01 8.193e+01 8.787e+01 9.627e+01 1.247e+02, threshold=1.757e+02, percent-clipped=0.0 2023-11-19 12:20:01,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=732800.0, ans=0.125 2023-11-19 12:20:09,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=732866.6666666666, ans=0.125 2023-11-19 12:20:10,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=732866.6666666666, ans=0.1 2023-11-19 12:20:23,990 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer1.prob, batch_count=732933.3333333334, ans=0.125 2023-11-19 12:20:25,119 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=732933.3333333334, ans=0.0 2023-11-19 12:20:27,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass_mid.scale_min, batch_count=733000.0, ans=0.2 2023-11-19 12:20:38,899 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1750, loss[loss=0.08882, simple_loss=0.1138, pruned_loss=0.0241, audio_tagging_loss=0.007803, over 15367.00 frames. ], tot_loss[loss=0.08468, simple_loss=0.1031, pruned_loss=0.02245, audio_tagging_loss=0.01066, over 3053539.99 frames. ], batch size: 55, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:20:53,585 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=8.17 vs. limit=15.0 2023-11-19 12:21:21,511 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.max_abs, batch_count=733266.6666666666, ans=10.0 2023-11-19 12:21:29,825 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.self_attn2.whiten, num_groups=1, num_channels=192, metric=11.83 vs. limit=22.5 2023-11-19 12:21:34,745 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1800, loss[loss=0.07091, simple_loss=0.08894, pruned_loss=0.01811, audio_tagging_loss=0.00833, over 14879.00 frames. ], tot_loss[loss=0.08623, simple_loss=0.1052, pruned_loss=0.0231, audio_tagging_loss=0.01053, over 3049162.13 frames. ], batch size: 56, lr: 7.07e-03, grad_scale: 32.0 2023-11-19 12:21:39,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=733400.0, ans=0.125 2023-11-19 12:21:40,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=733400.0, ans=0.0 2023-11-19 12:21:40,546 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=15.42 vs. limit=15.0 2023-11-19 12:21:44,111 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.335e+01 9.001e+01 1.003e+02 1.279e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:22:29,661 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1850, loss[loss=0.09438, simple_loss=0.12, pruned_loss=0.02418, audio_tagging_loss=0.01019, over 14437.00 frames. 
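The per-batch loss decomposition can be checked directly against the numbers printed for batch 1750 above, assuming the combination total = simple_loss_scale * simple + pruned + audio_tagging_loss_scale * at with this run's simple_loss_scale=0.5 and audio_tagging_loss_scale=1.0:

    # Worked check of the batch 1750 entry above.
    simple, pruned, at = 0.1138, 0.0241, 0.007803
    total = 0.5 * simple + pruned + 1.0 * at
    print(total)  # 0.088803, matching the reported loss=0.08882 up to display rounding

The same identity holds for the other batch entries in this stretch, e.g. 0.5*0.08676 + 0.02264 + 0.01038 = 0.0764 for batch 1550.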
], tot_loss[loss=0.08511, simple_loss=0.1038, pruned_loss=0.02269, audio_tagging_loss=0.01052, over 3044536.74 frames. ], batch size: 57, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:22:29,879 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=733733.3333333334, ans=0.0 2023-11-19 12:22:38,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=733733.3333333334, ans=0.125 2023-11-19 12:22:58,570 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=733866.6666666666, ans=0.125 2023-11-19 12:23:03,156 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=18.83 vs. limit=22.5 2023-11-19 12:23:21,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=734000.0, ans=0.0 2023-11-19 12:23:22,295 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=734000.0, ans=0.125 2023-11-19 12:23:26,248 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1900, loss[loss=0.09198, simple_loss=0.1091, pruned_loss=0.02754, audio_tagging_loss=0.009912, over 15636.00 frames. ], tot_loss[loss=0.08582, simple_loss=0.1049, pruned_loss=0.02296, audio_tagging_loss=0.01038, over 3044897.53 frames. ], batch size: 59, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:23:35,053 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.5.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:23:36,334 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.533e+01 9.158e+01 9.922e+01 1.269e+02, threshold=1.832e+02, percent-clipped=0.0 2023-11-19 12:23:44,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=734133.3333333334, ans=0.0 2023-11-19 12:23:45,101 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=734133.3333333334, ans=0.125 2023-11-19 12:23:47,043 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=734200.0, ans=0.2 2023-11-19 12:23:52,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=734200.0, ans=0.125 2023-11-19 12:24:00,718 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=734266.6666666666, ans=0.125 2023-11-19 12:24:21,914 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 1950, loss[loss=0.08598, simple_loss=0.1047, pruned_loss=0.02094, audio_tagging_loss=0.0127, over 14331.00 frames. ], tot_loss[loss=0.08498, simple_loss=0.1039, pruned_loss=0.0226, audio_tagging_loss=0.01042, over 3042577.61 frames. 
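The "WithLoss: name=...self_attn_weights, loss-sum=0.000e+00" entries attach an auxiliary penalty to the attention weights and report its running sum; 0.000e+00 means nothing is currently being penalized. A guessed sketch of the mechanism as a custom autograd identity whose second input receives gradient 1; WithLossSketch is illustrative, not icefall's actual class.

    import torch

    class WithLossSketch(torch.autograd.Function):
        """Identity on x in the forward pass; in the backward pass the auxiliary
        loss gets gradient 1, so its gradient flows into whatever produced it."""

        @staticmethod
        def forward(ctx, x, aux_loss):
            return x.clone()

        @staticmethod
        def backward(ctx, grad_output):
            return grad_output, grad_output.new_ones(())

    x = torch.randn(4, requires_grad=True)
    penalty = (x.abs() - 1.0).clamp(min=0.0).sum()  # zero unless |x| > 1
    y = WithLossSketch.apply(x, penalty)
    y.sum().backward()  # x.grad now also carries the penalty's gradient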
], batch size: 53, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:24:32,772 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=734466.6666666666, ans=0.2 2023-11-19 12:24:43,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward2.hidden_balancer.prob, batch_count=734533.3333333334, ans=0.125 2023-11-19 12:24:50,729 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.65 vs. limit=15.0 2023-11-19 12:24:58,922 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=734600.0, ans=0.1 2023-11-19 12:25:01,648 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff3_skip_rate, batch_count=734600.0, ans=0.0 2023-11-19 12:25:10,245 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=734666.6666666666, ans=0.0 2023-11-19 12:25:13,479 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=734666.6666666666, ans=0.0 2023-11-19 12:25:17,482 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2000, loss[loss=0.09517, simple_loss=0.1074, pruned_loss=0.02964, audio_tagging_loss=0.01182, over 14546.00 frames. ], tot_loss[loss=0.08495, simple_loss=0.1036, pruned_loss=0.02265, audio_tagging_loss=0.01049, over 3034511.37 frames. ], batch size: 54, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:25:18,865 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=734733.3333333334, ans=0.125 2023-11-19 12:25:20,812 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=734733.3333333334, ans=0.2 2023-11-19 12:25:20,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=734733.3333333334, ans=0.0 2023-11-19 12:25:28,544 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.597e+01 8.084e+01 8.826e+01 9.531e+01 1.443e+02, threshold=1.765e+02, percent-clipped=0.0 2023-11-19 12:25:36,588 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=12.40 vs. limit=15.0 2023-11-19 12:25:45,389 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.61 vs. limit=6.0 2023-11-19 12:25:47,199 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.out_combiner.scale_min, batch_count=734866.6666666666, ans=0.2 2023-11-19 12:25:51,465 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer2.prob, batch_count=734933.3333333334, ans=0.125 2023-11-19 12:25:58,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=734933.3333333334, ans=0.125 2023-11-19 12:26:13,999 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2050, loss[loss=0.1212, simple_loss=0.1458, pruned_loss=0.04129, audio_tagging_loss=0.006988, over 15605.00 frames. ], tot_loss[loss=0.08592, simple_loss=0.105, pruned_loss=0.02301, audio_tagging_loss=0.01042, over 3035289.70 frames. 
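The fractional frame counts in the tot_loss entries (e.g. "over 3035289.70 frames") indicate an exponentially decayed running sum rather than a plain epoch total. A sketch assuming a decay of 1 - 1/reset_interval with this run's reset_interval=200; the class name is illustrative.

    class RunningLoss:
        """Decayed sums of loss and frames; their ratio is the logged tot_loss."""

        def __init__(self, reset_interval: int = 200):
            self.decay = 1.0 - 1.0 / reset_interval
            self.loss_sum = 0.0
            self.frames = 0.0

        def update(self, batch_loss_sum: float, batch_frames: float) -> None:
            self.loss_sum = self.loss_sum * self.decay + batch_loss_sum
            self.frames = self.frames * self.decay + batch_frames

        @property
        def per_frame(self) -> float:
            return self.loss_sum / self.frames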
], batch size: 56, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:26:44,176 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=735200.0, ans=0.125 2023-11-19 12:26:46,998 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=735266.6666666666, ans=0.125 2023-11-19 12:26:49,329 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=13.66 vs. limit=15.0 2023-11-19 12:26:57,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=735333.3333333334, ans=0.125 2023-11-19 12:27:03,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer2.prob, batch_count=735333.3333333334, ans=0.125 2023-11-19 12:27:09,239 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2100, loss[loss=0.1152, simple_loss=0.1578, pruned_loss=0.0314, audio_tagging_loss=0.00492, over 14916.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1051, pruned_loss=0.02307, audio_tagging_loss=0.01035, over 3034997.67 frames. ], batch size: 53, lr: 7.06e-03, grad_scale: 32.0 2023-11-19 12:27:10,575 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:27:16,831 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=735400.0, ans=0.125 2023-11-19 12:27:18,788 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.692e+01 8.614e+01 9.149e+01 1.029e+02 1.234e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 12:27:22,296 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=735466.6666666666, ans=0.125 2023-11-19 12:27:34,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=735533.3333333334, ans=0.125 2023-11-19 12:27:47,108 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=735600.0, ans=0.2 2023-11-19 12:27:53,948 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:27:57,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=735666.6666666666, ans=0.1 2023-11-19 12:27:58,533 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=13.05 vs. limit=15.0 2023-11-19 12:28:04,279 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2150, loss[loss=0.09469, simple_loss=0.1232, pruned_loss=0.02473, audio_tagging_loss=0.008385, over 15661.00 frames. ], tot_loss[loss=0.08631, simple_loss=0.1055, pruned_loss=0.0232, audio_tagging_loss=0.01035, over 3035078.28 frames. 
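The many "Whitening: name=..., metric=M vs. limit=L" lines compare a per-module whiteness statistic against a scheduled limit; a penalty is presumably applied only when the metric exceeds the limit, so lines with metric below limit are pure monitoring. A hedged guess at the statistic: the ratio of the mean squared eigenvalue of the channel covariance to its squared mean eigenvalue, which equals 1.0 for perfectly white features and grows with anisotropy.

    import torch

    def whitening_metric(x: torch.Tensor, num_groups: int = 1) -> torch.Tensor:
        # x: (num_frames, num_channels). Returns mean(eig^2) / mean(eig)^2 of the
        # per-group channel covariance, computed without an eigendecomposition.
        num_frames, num_channels = x.shape
        assert num_channels % num_groups == 0
        x = x.reshape(num_frames, num_groups, num_channels // num_groups).transpose(0, 1)
        cov = torch.matmul(x.transpose(1, 2), x) / num_frames
        d = cov.shape[-1]
        trace = cov.diagonal(dim1=-2, dim2=-1).sum(-1)       # sum of eigenvalues
        trace_sq = (cov * cov).sum(dim=(-2, -1))             # sum of squared eigenvalues
        return (trace_sq * d / trace.pow(2)).mean()

    x = torch.randn(10000, 64)
    print(float(whitening_metric(x)))  # ~1.006 for white noise (1 + d/num_frames bias)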
], batch size: 55, lr: 7.05e-03, grad_scale: 32.0 2023-11-19 12:28:12,505 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.scale_min, batch_count=735733.3333333334, ans=0.2 2023-11-19 12:28:18,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=735800.0, ans=0.125 2023-11-19 12:28:34,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=735866.6666666666, ans=0.09899494936611666 2023-11-19 12:28:38,260 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/XkQ8YVd8u38_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:28:51,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=736000.0, ans=0.125 2023-11-19 12:29:00,634 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2200, loss[loss=0.107, simple_loss=0.1317, pruned_loss=0.03275, audio_tagging_loss=0.008403, over 16212.00 frames. ], tot_loss[loss=0.08619, simple_loss=0.1056, pruned_loss=0.02304, audio_tagging_loss=0.01036, over 3042693.46 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:29:02,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=736066.6666666666, ans=0.0 2023-11-19 12:29:11,898 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.518e+01 8.663e+01 9.454e+01 1.034e+02 1.518e+02, threshold=1.891e+02, percent-clipped=0.0 2023-11-19 12:29:13,227 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=736133.3333333334, ans=0.125 2023-11-19 12:29:21,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=736133.3333333334, ans=0.125 2023-11-19 12:29:47,610 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=736333.3333333334, ans=0.125 2023-11-19 12:29:49,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.41 vs. limit=15.0 2023-11-19 12:29:56,573 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2250, loss[loss=0.07164, simple_loss=0.08945, pruned_loss=0.01647, audio_tagging_loss=0.01044, over 15435.00 frames. ], tot_loss[loss=0.08603, simple_loss=0.1052, pruned_loss=0.02304, audio_tagging_loss=0.01039, over 3038710.58 frames. 
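The WARNING above shows why a 1-second AudioSet placeholder cut is dropped: after the roughly 4x convolutional subsampling, 100 input frames keep only 23, fewer than the cut's 24 BPE tokens, so the transducer loss has no valid alignment. The subsampling formula below reproduces the logged numbers; the function names are illustrative.

    def frames_after_subsampling(t: int) -> int:
        # 100 -> 23, matching "before subsampling: 100 ... after subsampling: 23"
        return ((t - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        return frames_after_subsampling(num_frames) >= num_tokens

    print(frames_after_subsampling(100))  # 23
    print(keep_cut(100, 24))              # False -> the cut is excluded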
], batch size: 59, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:30:04,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=736400.0, ans=0.125 2023-11-19 12:30:12,208 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=736466.6666666666, ans=0.2 2023-11-19 12:30:13,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=736466.6666666666, ans=0.125 2023-11-19 12:30:23,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.min_abs, batch_count=736533.3333333334, ans=0.5 2023-11-19 12:30:24,085 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.24 vs. limit=12.0 2023-11-19 12:30:25,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=736533.3333333334, ans=0.0 2023-11-19 12:30:26,862 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=736533.3333333334, ans=0.0 2023-11-19 12:30:33,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=736600.0, ans=0.0 2023-11-19 12:30:44,621 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:30:51,183 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.00 vs. limit=15.0 2023-11-19 12:30:51,735 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2300, loss[loss=0.09635, simple_loss=0.1149, pruned_loss=0.02724, audio_tagging_loss=0.01163, over 14432.00 frames. ], tot_loss[loss=0.08564, simple_loss=0.1044, pruned_loss=0.02301, audio_tagging_loss=0.01045, over 3031625.68 frames. ], batch size: 56, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:30:59,309 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:31:03,995 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.615e+01 8.238e+01 9.177e+01 1.028e+02 1.469e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 12:31:12,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.27 vs. limit=15.0 2023-11-19 12:31:16,962 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:31:18,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=736866.6666666666, ans=0.125 2023-11-19 12:31:23,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=736866.6666666666, ans=0.1 2023-11-19 12:31:36,455 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=737000.0, ans=0.0 2023-11-19 12:31:40,610 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/mx9RcUz8sr0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. 
Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:31:48,024 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2350, loss[loss=0.08103, simple_loss=0.09605, pruned_loss=0.02057, audio_tagging_loss=0.01244, over 14468.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1033, pruned_loss=0.0229, audio_tagging_loss=0.01059, over 3029497.15 frames. ], batch size: 54, lr: 7.05e-03, grad_scale: 8.0 2023-11-19 12:32:00,346 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=737133.3333333334, ans=0.125 2023-11-19 12:32:03,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=737133.3333333334, ans=0.125 2023-11-19 12:32:12,352 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737200.0, ans=0.1 2023-11-19 12:32:15,549 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:32:33,932 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=737333.3333333334, ans=0.125 2023-11-19 12:32:43,225 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2400, loss[loss=0.09096, simple_loss=0.1186, pruned_loss=0.02385, audio_tagging_loss=0.007798, over 14595.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1042, pruned_loss=0.02293, audio_tagging_loss=0.01056, over 3032341.28 frames. ], batch size: 53, lr: 7.05e-03, grad_scale: 16.0 2023-11-19 12:32:43,378 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=737400.0, ans=0.0 2023-11-19 12:32:44,348 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=21.55 vs. limit=22.5 2023-11-19 12:32:55,918 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.168e+01 8.415e+01 9.190e+01 1.007e+02 1.395e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 12:32:58,643 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=10.43 vs. limit=15.0 2023-11-19 12:33:18,342 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass_mid.scale_min, batch_count=737600.0, ans=0.2 2023-11-19 12:33:29,652 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=737666.6666666666, ans=0.1 2023-11-19 12:33:38,940 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2450, loss[loss=0.07957, simple_loss=0.08825, pruned_loss=0.02379, audio_tagging_loss=0.01165, over 13481.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1048, pruned_loss=0.02306, audio_tagging_loss=0.01054, over 3031187.70 frames. 
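The grad_scale column in the batch headers moves between 8.0, 16.0 and 32.0 across this stretch, the signature of dynamic fp16 loss scaling: the scale is halved when inf/nan gradients appear and grown back after a run of clean steps. A minimal sketch using PyTorch's stock GradScaler; the run may manage the scale with its own logic, and model, optimizer and batch below are placeholders.

    import torch

    scaler = torch.cuda.amp.GradScaler(init_scale=16.0)

    def train_step(model, optimizer, batch):
        optimizer.zero_grad()
        with torch.cuda.amp.autocast():
            loss = model(batch)
        scaler.scale(loss).backward()
        scaler.step(optimizer)     # skipped internally if gradients overflowed
        scaler.update()            # halve on overflow, grow after clean steps
        return scaler.get_scale()  # the value printed as grad_scale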
], batch size: 51, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:34:01,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=737866.6666666666, ans=0.125 2023-11-19 12:34:24,114 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=738000.0, ans=0.09899494936611666 2023-11-19 12:34:25,512 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.72 vs. limit=6.0 2023-11-19 12:34:25,666 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.76 vs. limit=15.0 2023-11-19 12:34:33,102 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=738066.6666666666, ans=0.125 2023-11-19 12:34:33,837 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2500, loss[loss=0.0958, simple_loss=0.1127, pruned_loss=0.02814, audio_tagging_loss=0.01132, over 15039.00 frames. ], tot_loss[loss=0.08658, simple_loss=0.1054, pruned_loss=0.02334, audio_tagging_loss=0.01052, over 3038487.33 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:34:43,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=738133.3333333334, ans=0.125 2023-11-19 12:34:45,801 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.081e+01 8.644e+01 9.382e+01 1.016e+02 1.260e+02, threshold=1.876e+02, percent-clipped=0.0 2023-11-19 12:35:04,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=738200.0, ans=0.1 2023-11-19 12:35:05,505 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=23.70 vs. limit=22.5 2023-11-19 12:35:11,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=738266.6666666666, ans=0.0 2023-11-19 12:35:26,609 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=512, metric=9.72 vs. limit=15.0 2023-11-19 12:35:29,256 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2550, loss[loss=0.08184, simple_loss=0.09961, pruned_loss=0.02276, audio_tagging_loss=0.009272, over 15555.00 frames. ], tot_loss[loss=0.08643, simple_loss=0.1055, pruned_loss=0.02323, audio_tagging_loss=0.01045, over 3038408.33 frames. ], batch size: 57, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:35:38,314 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.56 vs. limit=15.0 2023-11-19 12:36:08,961 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=738600.0, ans=0.125 2023-11-19 12:36:20,530 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff3_skip_rate, batch_count=738666.6666666666, ans=0.0 2023-11-19 12:36:26,088 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2600, loss[loss=0.07308, simple_loss=0.08812, pruned_loss=0.01983, audio_tagging_loss=0.009189, over 14480.00 frames. 
], tot_loss[loss=0.08508, simple_loss=0.104, pruned_loss=0.02272, audio_tagging_loss=0.01036, over 3033483.34 frames. ], batch size: 56, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:36:33,613 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=738733.3333333334, ans=0.0 2023-11-19 12:36:37,711 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.339e+01 8.428e+01 9.293e+01 1.021e+02 1.415e+02, threshold=1.859e+02, percent-clipped=0.0 2023-11-19 12:36:49,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=738866.6666666666, ans=0.0 2023-11-19 12:36:54,957 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.scale_min, batch_count=738866.6666666666, ans=0.2 2023-11-19 12:37:09,655 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=739000.0, ans=0.1 2023-11-19 12:37:21,571 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2650, loss[loss=0.1086, simple_loss=0.133, pruned_loss=0.0327, audio_tagging_loss=0.009391, over 15971.00 frames. ], tot_loss[loss=0.08455, simple_loss=0.1036, pruned_loss=0.02243, audio_tagging_loss=0.01034, over 3040095.69 frames. ], batch size: 61, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:37:27,599 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=739066.6666666666, ans=0.0 2023-11-19 12:37:35,364 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.93 vs. limit=6.0 2023-11-19 12:37:42,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer2.prob, batch_count=739200.0, ans=0.125 2023-11-19 12:37:48,966 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.72 vs. limit=22.5 2023-11-19 12:38:16,962 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2700, loss[loss=0.09887, simple_loss=0.1271, pruned_loss=0.0269, audio_tagging_loss=0.008412, over 15945.00 frames. ], tot_loss[loss=0.08426, simple_loss=0.1031, pruned_loss=0.02233, audio_tagging_loss=0.01035, over 3046144.70 frames. ], batch size: 59, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:38:18,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=739400.0, ans=0.0 2023-11-19 12:38:28,323 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=739466.6666666666, ans=0.07 2023-11-19 12:38:28,496 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=8.41 vs. 
limit=15.0 2023-11-19 12:38:29,020 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.043e+01 8.415e+01 9.342e+01 1.060e+02 2.991e+02, threshold=1.868e+02, percent-clipped=1.0 2023-11-19 12:38:38,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer2.prob, batch_count=739533.3333333334, ans=0.125 2023-11-19 12:38:43,584 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=739533.3333333334, ans=0.05 2023-11-19 12:39:12,494 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2750, loss[loss=0.08734, simple_loss=0.1138, pruned_loss=0.02306, audio_tagging_loss=0.007362, over 13819.00 frames. ], tot_loss[loss=0.08487, simple_loss=0.104, pruned_loss=0.02256, audio_tagging_loss=0.0103, over 3046677.26 frames. ], batch size: 54, lr: 7.04e-03, grad_scale: 16.0 2023-11-19 12:39:19,518 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=739733.3333333334, ans=0.125 2023-11-19 12:39:23,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=739800.0, ans=0.125 2023-11-19 12:39:46,475 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.whiten, num_groups=1, num_channels=384, metric=7.71 vs. limit=12.0 2023-11-19 12:39:53,218 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=739933.3333333334, ans=0.0 2023-11-19 12:39:55,632 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=7.47 vs. limit=15.0 2023-11-19 12:39:58,592 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=740000.0, ans=0.0 2023-11-19 12:39:59,485 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/IMdT8_tuNp0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:40:04,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740000.0, ans=0.1 2023-11-19 12:40:08,440 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2800, loss[loss=0.1087, simple_loss=0.1394, pruned_loss=0.0293, audio_tagging_loss=0.009676, over 15627.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1033, pruned_loss=0.02257, audio_tagging_loss=0.01029, over 3045938.33 frames. 
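The lr column decays smoothly across this stretch (7.08e-03 down to 7.03e-03). A hedged reconstruction with the Eden-style schedule suggested by the run's base_lr=0.045, lr_batches=7500 and lr_epochs=3.5; treating epoch as the number of completed epochs (9 here, with batch_idx_train around 111000 given the checkpoint-112000 save a few hundred batches later) reproduces the printed values.

    def eden_lr(base_lr, batch, epoch, lr_batches=7500.0, lr_epochs=3.5):
        return (base_lr
                * ((batch ** 2 + lr_batches ** 2) / lr_batches ** 2) ** -0.25
                * ((epoch ** 2 + lr_epochs ** 2) / lr_epochs ** 2) ** -0.25)

    print(round(eden_lr(0.045, batch=111000, epoch=9), 5))  # 0.00703, i.e. 7.03e-03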
], batch size: 56, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:40:19,353 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=740133.3333333334, ans=0.1 2023-11-19 12:40:20,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=740133.3333333334, ans=0.0 2023-11-19 12:40:21,269 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.368e+01 8.840e+01 9.465e+01 1.289e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 12:40:29,361 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=740133.3333333334, ans=0.0 2023-11-19 12:40:58,620 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 12:41:00,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=740333.3333333334, ans=0.125 2023-11-19 12:41:04,777 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2850, loss[loss=0.1211, simple_loss=0.1507, pruned_loss=0.03754, audio_tagging_loss=0.008148, over 15128.00 frames. ], tot_loss[loss=0.08572, simple_loss=0.105, pruned_loss=0.02306, audio_tagging_loss=0.01016, over 3044320.28 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:41:05,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=740400.0, ans=0.0 2023-11-19 12:41:11,841 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=740400.0, ans=0.0 2023-11-19 12:41:37,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=740600.0, ans=0.125 2023-11-19 12:41:42,217 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=740600.0, ans=0.1 2023-11-19 12:42:00,349 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2900, loss[loss=0.07534, simple_loss=0.08635, pruned_loss=0.01753, audio_tagging_loss=0.01463, over 16233.00 frames. ], tot_loss[loss=0.08591, simple_loss=0.105, pruned_loss=0.02314, audio_tagging_loss=0.01026, over 3044155.81 frames. ], batch size: 64, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:42:00,577 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=740733.3333333334, ans=0.0 2023-11-19 12:42:12,379 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.633e+01 9.349e+01 9.901e+01 1.327e+02, threshold=1.870e+02, percent-clipped=0.0 2023-11-19 12:42:14,905 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.conv_module2.whiten, num_groups=1, num_channels=256, metric=7.51 vs. limit=15.0 2023-11-19 12:42:25,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=740866.6666666666, ans=0.1 2023-11-19 12:42:36,661 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.42 vs. 
limit=15.0 2023-11-19 12:42:55,860 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 2950, loss[loss=0.08107, simple_loss=0.09905, pruned_loss=0.02316, audio_tagging_loss=0.008382, over 14702.00 frames. ], tot_loss[loss=0.08684, simple_loss=0.1063, pruned_loss=0.0234, audio_tagging_loss=0.0103, over 3040435.40 frames. ], batch size: 55, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:05,926 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.22 vs. limit=22.5 2023-11-19 12:43:22,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=741200.0, ans=0.0 2023-11-19 12:43:27,642 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.79 vs. limit=15.0 2023-11-19 12:43:32,590 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=741266.6666666666, ans=0.2 2023-11-19 12:43:34,798 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=741266.6666666666, ans=0.125 2023-11-19 12:43:40,022 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.05 vs. limit=15.0 2023-11-19 12:43:52,260 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3000, loss[loss=0.1064, simple_loss=0.1354, pruned_loss=0.03062, audio_tagging_loss=0.008035, over 15821.00 frames. ], tot_loss[loss=0.08652, simple_loss=0.1057, pruned_loss=0.02327, audio_tagging_loss=0.01038, over 3042515.26 frames. ], batch size: 57, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:43:52,268 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 12:44:11,204 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([0.9106, 3.2308, 2.5652, 2.8548, 3.4997, 3.5666, 2.9810, 3.5967], device='cuda:0') 2023-11-19 12:44:24,102 INFO [train_asr.py:1147] (0/4) Epoch 10, validation: loss=0.06403, simple_loss=0.05543, pruned_loss=0.006395, audio_tagging_loss=0.02992, over 4681554.00 frames. 2023-11-19 12:44:24,102 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 12:44:25,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=741400.0, ans=0.2 2023-11-19 12:44:26,423 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=741400.0, ans=0.125 2023-11-19 12:44:35,740 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.397e+01 8.400e+01 9.242e+01 1.017e+02 1.416e+02, threshold=1.848e+02, percent-clipped=0.0 2023-11-19 12:45:09,539 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.4.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:45:19,201 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3050, loss[loss=0.09124, simple_loss=0.1125, pruned_loss=0.02631, audio_tagging_loss=0.008656, over 15413.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1056, pruned_loss=0.02315, audio_tagging_loss=0.01045, over 3038427.65 frames. 
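During the validation pass above, zipformer.py logs attn_weights_entropy tensors, one value per attention head. A sketch of that diagnostic as it is commonly computed: the mean entropy (in nats) of each head's attention distribution, where low values indicate sharply peaked attention; whether icefall computes it exactly this way is an assumption.

    import torch

    def attn_weights_entropy(attn: torch.Tensor, eps: float = 1e-20) -> torch.Tensor:
        # attn: (num_heads, query_len, key_len), rows summing to 1
        ent = -(attn * (attn + eps).log()).sum(dim=-1)  # entropy per query position
        return ent.mean(dim=-1)                         # mean entropy per head

    attn = torch.softmax(torch.randn(8, 10, 10), dim=-1)
    print(attn_weights_entropy(attn))  # one entropy per head, as in the log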
], batch size: 58, lr: 7.03e-03, grad_scale: 32.0 2023-11-19 12:45:26,850 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff3_skip_rate, batch_count=741733.3333333334, ans=0.0 2023-11-19 12:45:33,252 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=741800.0, ans=0.0 2023-11-19 12:45:38,495 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.attention_skip_rate, batch_count=741800.0, ans=0.0 2023-11-19 12:45:51,384 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/h0neUGB6j_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:45:55,937 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=741933.3333333334, ans=0.125 2023-11-19 12:46:02,753 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.ff3_skip_rate, batch_count=742000.0, ans=0.0 2023-11-19 12:46:06,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff3_skip_rate, batch_count=742000.0, ans=0.0 2023-11-19 12:46:08,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=742000.0, ans=0.2 2023-11-19 12:46:14,660 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3100, loss[loss=0.08607, simple_loss=0.09422, pruned_loss=0.0263, audio_tagging_loss=0.01266, over 13950.00 frames. ], tot_loss[loss=0.08688, simple_loss=0.1059, pruned_loss=0.02335, audio_tagging_loss=0.01059, over 3036351.75 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:46:19,329 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_positive, batch_count=742066.6666666666, ans=0.05 2023-11-19 12:46:26,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=742133.3333333334, ans=0.125 2023-11-19 12:46:26,934 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.581e+01 8.703e+01 9.646e+01 1.063e+02 1.410e+02, threshold=1.929e+02, percent-clipped=0.0 2023-11-19 12:46:43,553 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=742200.0, ans=0.09899494936611666 2023-11-19 12:47:07,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=742333.3333333334, ans=0.125 2023-11-19 12:47:10,463 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3150, loss[loss=0.07694, simple_loss=0.08686, pruned_loss=0.02091, audio_tagging_loss=0.01261, over 15051.00 frames. ], tot_loss[loss=0.08676, simple_loss=0.1059, pruned_loss=0.02322, audio_tagging_loss=0.01061, over 3029588.98 frames. 
], batch size: 57, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:47:13,860 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=742400.0, ans=0.0 2023-11-19 12:47:14,981 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.out_proj.dropout_p, batch_count=742400.0, ans=0.1 2023-11-19 12:47:18,024 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=742400.0, ans=0.1 2023-11-19 12:47:48,910 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=742600.0, ans=0.125 2023-11-19 12:47:54,756 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.39 vs. limit=22.5 2023-11-19 12:48:06,058 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3200, loss[loss=0.08445, simple_loss=0.1037, pruned_loss=0.02213, audio_tagging_loss=0.01048, over 14390.00 frames. ], tot_loss[loss=0.08699, simple_loss=0.1062, pruned_loss=0.02322, audio_tagging_loss=0.01066, over 3034335.16 frames. ], batch size: 56, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:48:09,270 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module2.whiten, num_groups=1, num_channels=512, metric=6.85 vs. limit=15.0 2023-11-19 12:48:12,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=742733.3333333334, ans=0.0 2023-11-19 12:48:14,875 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=742733.3333333334, ans=0.2 2023-11-19 12:48:20,449 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.020e+01 8.457e+01 8.995e+01 9.869e+01 1.157e+02, threshold=1.799e+02, percent-clipped=0.0 2023-11-19 12:48:49,665 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=743000.0, ans=0.2 2023-11-19 12:49:02,760 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3250, loss[loss=0.06295, simple_loss=0.0745, pruned_loss=0.01501, audio_tagging_loss=0.0107, over 15863.00 frames. ], tot_loss[loss=0.08696, simple_loss=0.1062, pruned_loss=0.0231, audio_tagging_loss=0.01078, over 3046784.44 frames. ], batch size: 61, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:49:05,229 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer2.prob, batch_count=743066.6666666666, ans=0.125 2023-11-19 12:49:10,471 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=743066.6666666666, ans=0.125 2023-11-19 12:49:28,624 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 12:49:46,342 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.20 vs. limit=22.5 2023-11-19 12:49:50,890 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.self_attn2.whiten, num_groups=1, num_channels=192, metric=10.73 vs. 
limit=22.5 2023-11-19 12:49:54,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.nonlin_attention.balancer.max_positive, batch_count=743333.3333333334, ans=0.95 2023-11-19 12:49:57,202 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=743400.0, ans=0.1 2023-11-19 12:49:57,962 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3300, loss[loss=0.07246, simple_loss=0.09243, pruned_loss=0.01468, audio_tagging_loss=0.01157, over 14802.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1046, pruned_loss=0.02258, audio_tagging_loss=0.01086, over 3050895.84 frames. ], batch size: 55, lr: 7.02e-03, grad_scale: 32.0 2023-11-19 12:50:01,656 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys, num_groups=8, num_channels=256, metric=5.57 vs. limit=6.0 2023-11-19 12:50:10,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.824e+01 8.235e+01 8.992e+01 9.663e+01 1.572e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 12:50:10,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.min_positive, batch_count=743466.6666666666, ans=0.025 2023-11-19 12:50:12,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=743466.6666666666, ans=0.0 2023-11-19 12:50:18,403 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=3.07 vs. limit=6.0 2023-11-19 12:50:32,496 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=743600.0, ans=0.0 2023-11-19 12:50:34,628 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=743600.0, ans=0.125 2023-11-19 12:50:41,515 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.65 vs. limit=22.5 2023-11-19 12:50:52,010 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=743733.3333333334, ans=0.0 2023-11-19 12:50:52,827 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3350, loss[loss=0.08355, simple_loss=0.09994, pruned_loss=0.02198, audio_tagging_loss=0.0116, over 13961.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.1044, pruned_loss=0.02266, audio_tagging_loss=0.01085, over 3048961.09 frames. 
], batch size: 53, lr: 7.02e-03, grad_scale: 16.0 2023-11-19 12:51:06,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=743800.0, ans=0.1 2023-11-19 12:51:10,478 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=743800.0, ans=0.125 2023-11-19 12:51:28,995 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 12:51:31,023 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=743933.3333333334, ans=0.125 2023-11-19 12:51:37,593 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=744000.0, ans=0.125 2023-11-19 12:51:40,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=744000.0, ans=0.0 2023-11-19 12:51:43,836 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=744000.0, ans=0.0 2023-11-19 12:51:47,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=744000.0, ans=0.125 2023-11-19 12:51:49,086 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3400, loss[loss=0.08675, simple_loss=0.09819, pruned_loss=0.02733, audio_tagging_loss=0.01033, over 13188.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1054, pruned_loss=0.02303, audio_tagging_loss=0.01061, over 3051162.85 frames. ], batch size: 50, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:51:51,545 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=744066.6666666666, ans=0.0 2023-11-19 12:52:04,019 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.252e+01 8.485e+01 9.102e+01 1.010e+02 1.792e+02, threshold=1.820e+02, percent-clipped=0.0 2023-11-19 12:52:10,691 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=744200.0, ans=0.125 2023-11-19 12:52:32,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.attention_skip_rate, batch_count=744333.3333333334, ans=0.0 2023-11-19 12:52:45,413 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3450, loss[loss=0.06244, simple_loss=0.07273, pruned_loss=0.01418, audio_tagging_loss=0.0119, over 15367.00 frames. ], tot_loss[loss=0.08629, simple_loss=0.1055, pruned_loss=0.02305, audio_tagging_loss=0.01048, over 3052462.31 frames. ], batch size: 58, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:52:50,137 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.whiten, num_groups=1, num_channels=512, metric=10.73 vs. limit=12.0 2023-11-19 12:52:50,846 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=744400.0, ans=0.0 2023-11-19 12:53:02,830 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.72 vs. 
limit=22.5 2023-11-19 12:53:07,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.min_positive, batch_count=744533.3333333334, ans=0.025 2023-11-19 12:53:38,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=744666.6666666666, ans=0.1 2023-11-19 12:53:40,480 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3500, loss[loss=0.1169, simple_loss=0.1471, pruned_loss=0.03805, audio_tagging_loss=0.005247, over 16389.00 frames. ], tot_loss[loss=0.0861, simple_loss=0.1053, pruned_loss=0.02308, audio_tagging_loss=0.01038, over 3048591.58 frames. ], batch size: 60, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:53:44,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.ff2_skip_rate, batch_count=744733.3333333334, ans=0.0 2023-11-19 12:53:52,388 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=744800.0, ans=0.0 2023-11-19 12:53:55,412 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.974e+01 8.325e+01 9.039e+01 9.791e+01 1.416e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 12:54:04,921 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.51 vs. limit=15.0 2023-11-19 12:54:08,665 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/DdDpuDqOyrA_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:54:10,704 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.54 vs. limit=12.0 2023-11-19 12:54:20,267 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.30 vs. limit=15.0 2023-11-19 12:54:23,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=744933.3333333334, ans=0.0 2023-11-19 12:54:36,802 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3550, loss[loss=0.05516, simple_loss=0.05669, pruned_loss=0.01387, audio_tagging_loss=0.01294, over 14303.00 frames. ], tot_loss[loss=0.08532, simple_loss=0.1042, pruned_loss=0.02284, audio_tagging_loss=0.01037, over 3042214.27 frames. 
], batch size: 55, lr: 7.01e-03, grad_scale: 16.0 2023-11-19 12:54:43,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=745066.6666666666, ans=0.125 2023-11-19 12:54:44,473 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff3_skip_rate, batch_count=745066.6666666666, ans=0.0 2023-11-19 12:54:45,408 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass_mid.scale_min, batch_count=745066.6666666666, ans=0.2 2023-11-19 12:54:45,523 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=745066.6666666666, ans=0.125 2023-11-19 12:54:51,160 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=745133.3333333334, ans=0.2 2023-11-19 12:54:54,785 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_na.min_abs, batch_count=745133.3333333334, ans=0.02 2023-11-19 12:55:09,688 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=745266.6666666666, ans=0.0 2023-11-19 12:55:12,491 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.whiten.whitening_limit, batch_count=745266.6666666666, ans=12.0 2023-11-19 12:55:13,869 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=745266.6666666666, ans=0.0 2023-11-19 12:55:21,643 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745333.3333333334, ans=0.0 2023-11-19 12:55:24,976 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.conv_module1.whiten, num_groups=1, num_channels=384, metric=2.40 vs. limit=15.0 2023-11-19 12:55:25,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=745333.3333333334, ans=0.07 2023-11-19 12:55:27,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=745333.3333333334, ans=0.125 2023-11-19 12:55:31,988 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3600, loss[loss=0.06748, simple_loss=0.07391, pruned_loss=0.01921, audio_tagging_loss=0.01131, over 15367.00 frames. ], tot_loss[loss=0.08478, simple_loss=0.1036, pruned_loss=0.02256, audio_tagging_loss=0.01039, over 3037934.68 frames. ], batch size: 61, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:55:36,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=745400.0, ans=0.0 2023-11-19 12:55:41,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=745400.0, ans=0.125 2023-11-19 12:55:46,772 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.231e+01 8.451e+01 8.878e+01 9.732e+01 1.759e+02, threshold=1.776e+02, percent-clipped=0.0 2023-11-19 12:55:47,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=745466.6666666666, ans=0.125 2023-11-19 12:55:56,121 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=11.34 vs. 
limit=15.0 2023-11-19 12:56:00,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745533.3333333334, ans=0.0 2023-11-19 12:56:03,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=745533.3333333334, ans=0.2 2023-11-19 12:56:03,239 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=7.16 vs. limit=15.0 2023-11-19 12:56:04,118 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=745533.3333333334, ans=0.2 2023-11-19 12:56:07,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=745600.0, ans=0.95 2023-11-19 12:56:10,374 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=745600.0, ans=0.125 2023-11-19 12:56:12,464 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.bypass.skip_rate, batch_count=745600.0, ans=0.09899494936611666 2023-11-19 12:56:26,624 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=745733.3333333334, ans=0.0 2023-11-19 12:56:26,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=745733.3333333334, ans=0.125 2023-11-19 12:56:27,476 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3650, loss[loss=0.09147, simple_loss=0.115, pruned_loss=0.02219, audio_tagging_loss=0.01179, over 15443.00 frames. ], tot_loss[loss=0.08552, simple_loss=0.1047, pruned_loss=0.02279, audio_tagging_loss=0.01036, over 3036529.55 frames. ], batch size: 59, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:56:27,636 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=745733.3333333334, ans=0.0 2023-11-19 12:56:34,394 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=9.63 vs. limit=12.0 2023-11-19 12:56:43,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module1.balancer1.prob, batch_count=745800.0, ans=0.125 2023-11-19 12:56:54,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer1.prob, batch_count=745866.6666666666, ans=0.125 2023-11-19 12:57:23,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.99 vs. limit=15.0 2023-11-19 12:57:23,625 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3700, loss[loss=0.1171, simple_loss=0.1516, pruned_loss=0.03386, audio_tagging_loss=0.007437, over 15719.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1051, pruned_loss=0.023, audio_tagging_loss=0.01034, over 3045316.33 frames. ], batch size: 56, lr: 7.01e-03, grad_scale: 32.0 2023-11-19 12:57:29,548 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.50 vs. 
limit=10.0 2023-11-19 12:57:36,766 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=746133.3333333334, ans=0.125 2023-11-19 12:57:37,612 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.811e+01 8.774e+01 9.543e+01 1.094e+02 1.492e+02, threshold=1.909e+02, percent-clipped=0.0 2023-11-19 12:57:39,271 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.05 vs. limit=15.0 2023-11-19 12:57:59,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=746266.6666666666, ans=0.125 2023-11-19 12:58:02,220 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer1.prob, batch_count=746266.6666666666, ans=0.125 2023-11-19 12:58:02,282 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=746266.6666666666, ans=0.125 2023-11-19 12:58:09,554 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746333.3333333334, ans=0.1 2023-11-19 12:58:19,029 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3750, loss[loss=0.09816, simple_loss=0.1182, pruned_loss=0.0293, audio_tagging_loss=0.009745, over 15589.00 frames. ], tot_loss[loss=0.08624, simple_loss=0.1054, pruned_loss=0.02313, audio_tagging_loss=0.0104, over 3046982.55 frames. ], batch size: 58, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:58:47,796 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=746533.3333333334, ans=0.125 2023-11-19 12:58:55,002 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=746600.0, ans=0.1 2023-11-19 12:58:56,981 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ZY_Bsi-RNuk_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 12:59:02,577 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-112000.pt
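
The checkpoint.py record above lands exactly on global batch 112000, consistent with a fixed save-every-N-batches cadence (112000 is a multiple of the run's save_every_n of 4000). Both the cadence and the filename pattern in this sketch are assumptions based on icefall's usual convention, and the helper name is hypothetical:

    from pathlib import Path
    import torch

    def maybe_save_checkpoint(model, exp_dir: Path, batch_idx_train: int,
                              save_every_n: int = 4000) -> None:
        # Save model state as exp_dir/checkpoint-<batch_idx_train>.pt whenever
        # the global batch counter hits the cadence, e.g. checkpoint-112000.pt
        # in the record above.
        if batch_idx_train > 0 and batch_idx_train % save_every_n == 0:
            path = exp_dir / f"checkpoint-{batch_idx_train}.pt"
            torch.save({"model": model.state_dict(),
                        "batch_idx_train": batch_idx_train}, path)

2023-11-19 12:59:17,460 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3800, loss[loss=0.06084, simple_loss=0.07389, pruned_loss=0.01137, audio_tagging_loss=0.01252, over 15529.00 frames. ], tot_loss[loss=0.08679, simple_loss=0.1061, pruned_loss=0.0233, audio_tagging_loss=0.01046, over 3050063.16 frames.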
], batch size: 60, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 12:59:22,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=746733.3333333334, ans=0.125 2023-11-19 12:59:32,189 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.736e+01 8.389e+01 8.999e+01 1.009e+02 1.684e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 12:59:35,540 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer1.prob, batch_count=746800.0, ans=0.125 2023-11-19 12:59:41,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=746866.6666666666, ans=0.1 2023-11-19 12:59:46,614 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=6.63 vs. limit=10.0 2023-11-19 12:59:47,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer2.prob, batch_count=746866.6666666666, ans=0.125 2023-11-19 12:59:55,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=746933.3333333334, ans=0.0 2023-11-19 13:00:06,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=747000.0, ans=0.95 2023-11-19 13:00:11,551 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass_mid.scale_min, batch_count=747000.0, ans=0.2 2023-11-19 13:00:13,367 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3850, loss[loss=0.08977, simple_loss=0.1098, pruned_loss=0.02287, audio_tagging_loss=0.01198, over 16068.00 frames. ], tot_loss[loss=0.08662, simple_loss=0.106, pruned_loss=0.02322, audio_tagging_loss=0.01039, over 3049141.16 frames. ], batch size: 61, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:00:19,349 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.86 vs. limit=15.0 2023-11-19 13:00:24,503 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=512, metric=20.66 vs. limit=22.5 2023-11-19 13:00:27,286 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=747133.3333333334, ans=0.2 2023-11-19 13:00:37,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=13.36 vs. limit=15.0 2023-11-19 13:00:38,892 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff2_skip_rate, batch_count=747200.0, ans=0.0 2023-11-19 13:00:56,978 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=747333.3333333334, ans=0.0 2023-11-19 13:00:59,446 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.24 vs. 
limit=15.0 2023-11-19 13:01:06,441 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=747333.3333333334, ans=0.09899494936611666 2023-11-19 13:01:08,394 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3900, loss[loss=0.1048, simple_loss=0.126, pruned_loss=0.03121, audio_tagging_loss=0.01059, over 15420.00 frames. ], tot_loss[loss=0.08694, simple_loss=0.1063, pruned_loss=0.02324, audio_tagging_loss=0.01053, over 3045096.91 frames. ], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:01:23,249 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.643e+01 8.736e+01 9.302e+01 1.013e+02 3.038e+02, threshold=1.860e+02, percent-clipped=1.0 2023-11-19 13:01:33,018 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=747533.3333333334, ans=0.0 2023-11-19 13:01:37,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=747533.3333333334, ans=0.0 2023-11-19 13:01:37,272 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=747533.3333333334, ans=0.1 2023-11-19 13:01:40,451 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=747533.3333333334, ans=0.0 2023-11-19 13:01:58,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=747666.6666666666, ans=0.0 2023-11-19 13:02:04,251 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 3950, loss[loss=0.07214, simple_loss=0.07545, pruned_loss=0.01626, audio_tagging_loss=0.01815, over 14970.00 frames. ], tot_loss[loss=0.08638, simple_loss=0.1054, pruned_loss=0.02297, audio_tagging_loss=0.01072, over 3033655.62 frames. ], batch size: 57, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:02:11,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.balancer_ff2.min_abs, batch_count=747733.3333333334, ans=0.1 2023-11-19 13:02:20,059 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=15.43 vs. limit=22.5 2023-11-19 13:02:28,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=747866.6666666666, ans=0.2 2023-11-19 13:02:28,175 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=747866.6666666666, ans=0.125 2023-11-19 13:02:33,539 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=747866.6666666666, ans=0.125 2023-11-19 13:02:40,425 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=747933.3333333334, ans=0.125 2023-11-19 13:02:43,713 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=747933.3333333334, ans=0.0
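
In every optim.py:476 record in this stretch the reported threshold is Clipping_scale times the median of the grad-norm quartiles (above, 2.0 x 9.302e+01 ~= 1.860e+02), so the clipping level tracks the recent distribution of gradient norms rather than a fixed constant; the batch 3900 record is also the only one here whose max (3.038e+02) exceeds its threshold, consistent with its nonzero percent-clipped=1.0. A self-contained sketch of such median-based clipping follows; the window size and bookkeeping are assumptions, and icefall's ScaledAdam implements this differently in detail:

    from collections import deque
    import torch

    class AdaptiveGradClipper:
        def __init__(self, clipping_scale: float = 2.0, window: int = 100):
            self.clipping_scale = clipping_scale
            self.recent_norms = deque(maxlen=window)   # recent global grad norms

        def clip_(self, parameters) -> float:
            # Global grad norm of this step, appended to the running window.
            grads = [p.grad.flatten() for p in parameters if p.grad is not None]
            norm = torch.cat(grads).norm().item()
            self.recent_norms.append(norm)
            # Quartiles of recent norms; threshold = clipping_scale * median,
            # matching e.g. 2.0 * 9.302e+01 ~= 1.860e+02 in the record above.
            q = torch.quantile(torch.tensor(list(self.recent_norms)),
                               torch.tensor([0.0, 0.25, 0.5, 0.75, 1.0]))
            threshold = self.clipping_scale * q[2].item()
            if norm > threshold:   # such steps feed the percent-clipped figure
                for p in parameters:
                    if p.grad is not None:
                        p.grad.mul_(threshold / norm)
            return threshold

2023-11-19 13:03:01,251 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4000, loss[loss=0.09321, simple_loss=0.1142, pruned_loss=0.02776, audio_tagging_loss=0.008357, over 15353.00 frames. ], tot_loss[loss=0.0869, simple_loss=0.1057, pruned_loss=0.02326, audio_tagging_loss=0.01081, over 3037460.19 frames.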
], batch size: 56, lr: 7.00e-03, grad_scale: 32.0 2023-11-19 13:03:03,589 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=748066.6666666666, ans=0.1 2023-11-19 13:03:12,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer1.prob, batch_count=748133.3333333334, ans=0.125 2023-11-19 13:03:14,911 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.731e+01 8.515e+01 9.235e+01 1.023e+02 1.834e+02, threshold=1.847e+02, percent-clipped=0.0 2023-11-19 13:03:21,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=748200.0, ans=0.0 2023-11-19 13:03:21,714 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.77 vs. limit=12.0 2023-11-19 13:03:56,348 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4050, loss[loss=0.1125, simple_loss=0.1477, pruned_loss=0.03304, audio_tagging_loss=0.00558, over 16042.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1064, pruned_loss=0.0236, audio_tagging_loss=0.01065, over 3034601.29 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:03:57,913 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.30 vs. limit=10.0 2023-11-19 13:03:58,493 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/-7b0f9TyPFU_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:04:05,485 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward3.hidden_balancer.prob, batch_count=748400.0, ans=0.125 2023-11-19 13:04:07,743 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.prob, batch_count=748466.6666666666, ans=0.125 2023-11-19 13:04:12,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer2.prob, batch_count=748466.6666666666, ans=0.125 2023-11-19 13:04:31,902 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=748600.0, ans=0.0 2023-11-19 13:04:50,623 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=748733.3333333334, ans=0.1 2023-11-19 13:04:51,376 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4100, loss[loss=0.06137, simple_loss=0.06827, pruned_loss=0.01376, audio_tagging_loss=0.01347, over 15484.00 frames. ], tot_loss[loss=0.08746, simple_loss=0.1069, pruned_loss=0.02343, audio_tagging_loss=0.0106, over 3036350.04 frames. 
], batch size: 60, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:04:52,690 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=748733.3333333334, ans=0.125 2023-11-19 13:04:54,804 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=748733.3333333334, ans=0.125 2023-11-19 13:05:02,998 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.20 vs. limit=12.0 2023-11-19 13:05:04,056 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=11.01 vs. limit=15.0 2023-11-19 13:05:06,276 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.287e+01 8.382e+01 9.039e+01 9.605e+01 1.210e+02, threshold=1.808e+02, percent-clipped=0.0 2023-11-19 13:05:10,762 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=748800.0, ans=0.0 2023-11-19 13:05:24,915 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.33 vs. limit=15.0 2023-11-19 13:05:34,839 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=749000.0, ans=0.05 2023-11-19 13:05:46,730 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4150, loss[loss=0.0817, simple_loss=0.1031, pruned_loss=0.01982, audio_tagging_loss=0.01033, over 15539.00 frames. ], tot_loss[loss=0.08747, simple_loss=0.1069, pruned_loss=0.02355, audio_tagging_loss=0.01045, over 3040110.22 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:05:51,639 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=192, metric=7.67 vs. limit=15.0 2023-11-19 13:06:07,404 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=8.98 vs. limit=10.0 2023-11-19 13:06:12,324 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.95 vs. limit=22.5 2023-11-19 13:06:18,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_skip_rate, batch_count=749266.6666666666, ans=0.0 2023-11-19 13:06:25,586 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/5BkClLNthIQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:06:31,561 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.95 vs. 
limit=12.0 2023-11-19 13:06:32,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=749333.3333333334, ans=0.125 2023-11-19 13:06:32,758 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.43 vs. limit=10.0 2023-11-19 13:06:41,688 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4200, loss[loss=0.08398, simple_loss=0.1123, pruned_loss=0.01784, audio_tagging_loss=0.009962, over 15499.00 frames. ], tot_loss[loss=0.08709, simple_loss=0.1069, pruned_loss=0.02338, audio_tagging_loss=0.01025, over 3038966.97 frames. ], batch size: 55, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:06:55,967 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.337e+01 9.071e+01 1.010e+02 1.242e+02, threshold=1.814e+02, percent-clipped=0.0 2023-11-19 13:07:30,082 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.44 vs. limit=15.0 2023-11-19 13:07:32,771 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=749666.6666666666, ans=0.125 2023-11-19 13:07:37,283 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4250, loss[loss=0.08637, simple_loss=0.1066, pruned_loss=0.02331, audio_tagging_loss=0.009771, over 15157.00 frames. ], tot_loss[loss=0.08784, simple_loss=0.1085, pruned_loss=0.02345, audio_tagging_loss=0.01016, over 3051873.66 frames. ], batch size: 56, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:07:46,003 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=749733.3333333334, ans=0.0 2023-11-19 13:07:54,368 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=749800.0, ans=0.0 2023-11-19 13:07:54,460 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=749800.0, ans=0.09899494936611666 2023-11-19 13:08:03,752 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.60 vs. limit=15.0 2023-11-19 13:08:16,264 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=749933.3333333334, ans=0.125 2023-11-19 13:08:19,840 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module1.balancer2.prob, batch_count=749933.3333333334, ans=0.125 2023-11-19 13:08:23,391 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.14 vs. limit=15.0 2023-11-19 13:08:32,536 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4300, loss[loss=0.08548, simple_loss=0.1062, pruned_loss=0.02184, audio_tagging_loss=0.01053, over 15087.00 frames. ], tot_loss[loss=0.08762, simple_loss=0.1082, pruned_loss=0.02337, audio_tagging_loss=0.01013, over 3053504.07 frames. 
], batch size: 58, lr: 6.99e-03, grad_scale: 32.0 2023-11-19 13:08:45,915 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750133.3333333334, ans=0.1 2023-11-19 13:08:47,738 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.890e+01 8.411e+01 9.252e+01 1.004e+02 1.296e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 13:08:56,298 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=750200.0, ans=0.125 2023-11-19 13:09:02,156 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=750200.0, ans=0.125 2023-11-19 13:09:07,416 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=750266.6666666666, ans=0.1 2023-11-19 13:09:08,527 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=750266.6666666666, ans=0.125 2023-11-19 13:09:28,921 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4350, loss[loss=0.1199, simple_loss=0.1543, pruned_loss=0.03347, audio_tagging_loss=0.009242, over 16517.00 frames. ], tot_loss[loss=0.08719, simple_loss=0.1076, pruned_loss=0.0232, audio_tagging_loss=0.01018, over 3052710.13 frames. ], batch size: 57, lr: 6.99e-03, grad_scale: 16.0 2023-11-19 13:09:39,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=750466.6666666666, ans=0.04949747468305833 2023-11-19 13:09:49,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=750533.3333333334, ans=0.2 2023-11-19 13:09:50,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750533.3333333334, ans=0.1 2023-11-19 13:09:55,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=750533.3333333334, ans=0.125 2023-11-19 13:10:01,095 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=750600.0, ans=0.1 2023-11-19 13:10:12,799 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=10.69 vs. limit=15.0 2023-11-19 13:10:14,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=750666.6666666666, ans=0.125
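
The four numbers inside each loss[...] and tot_loss[...] bracket are not independent: throughout this log the total satisfies loss = 0.5 * simple_loss + pruned_loss + audio_tagging_loss (batch 4350 above: 0.5 * 0.1076 + 0.0232 + 0.01018 = 0.08718 ~= 0.08719). The 0.5 and 1.0 weights match the run's simple_loss_scale and audio_tagging_loss_scale; treating those weights as assumptions, the combination can be checked directly against the records:

    # Sketch of the loss combination implied by the loss[...] records above;
    # the weights are assumed from the run configuration (simple_loss_scale=0.5,
    # audio_tagging_loss_scale=1.0), not read from this code path.
    SIMPLE_LOSS_SCALE = 0.5
    AUDIO_TAGGING_LOSS_SCALE = 1.0

    def total_loss(simple_loss: float, pruned_loss: float,
                   audio_tagging_loss: float) -> float:
        return (SIMPLE_LOSS_SCALE * simple_loss + pruned_loss
                + AUDIO_TAGGING_LOSS_SCALE * audio_tagging_loss)

    # tot_loss and per-batch loss of the batch 4350 record above:
    assert abs(total_loss(0.1076, 0.0232, 0.01018) - 0.08719) < 1e-4
    assert abs(total_loss(0.1543, 0.03347, 0.009242) - 0.1199) < 1e-4

2023-11-19 13:10:24,897 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4400, loss[loss=0.08629, simple_loss=0.1057, pruned_loss=0.02678, audio_tagging_loss=0.006643, over 14951.00 frames. ], tot_loss[loss=0.08659, simple_loss=0.1069, pruned_loss=0.02301, audio_tagging_loss=0.01015, over 3052154.45 frames.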
], batch size: 54, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:10:30,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=750733.3333333334, ans=0.1 2023-11-19 13:10:31,682 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=750733.3333333334, ans=0.0 2023-11-19 13:10:39,851 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.801e+01 8.194e+01 9.026e+01 9.942e+01 1.275e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:11:14,330 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=751000.0, ans=0.125 2023-11-19 13:11:20,613 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4450, loss[loss=0.0968, simple_loss=0.1228, pruned_loss=0.02425, audio_tagging_loss=0.01117, over 16301.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1068, pruned_loss=0.02324, audio_tagging_loss=0.01017, over 3048600.96 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:11:42,832 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=11.18 vs. limit=15.0 2023-11-19 13:11:48,347 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer2.prob, batch_count=751200.0, ans=0.125 2023-11-19 13:11:53,711 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=751266.6666666666, ans=0.0 2023-11-19 13:12:09,774 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.27 vs. limit=12.0 2023-11-19 13:12:16,442 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4500, loss[loss=0.06687, simple_loss=0.08062, pruned_loss=0.01687, audio_tagging_loss=0.009689, over 15215.00 frames. ], tot_loss[loss=0.08655, simple_loss=0.1061, pruned_loss=0.02322, audio_tagging_loss=0.01026, over 3054131.60 frames. ], batch size: 58, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:12:32,371 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.486e+01 8.202e+01 9.163e+01 9.982e+01 1.315e+02, threshold=1.833e+02, percent-clipped=0.0 2023-11-19 13:12:41,174 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=751533.3333333334, ans=0.125 2023-11-19 13:12:49,089 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=751600.0, ans=0.0 2023-11-19 13:12:58,036 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.ff2_skip_rate, batch_count=751600.0, ans=0.0 2023-11-19 13:13:12,552 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4550, loss[loss=0.06977, simple_loss=0.0838, pruned_loss=0.01756, audio_tagging_loss=0.01031, over 14421.00 frames. ], tot_loss[loss=0.08579, simple_loss=0.105, pruned_loss=0.02295, audio_tagging_loss=0.01035, over 3049314.57 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:13:22,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=15.13 vs. 
limit=15.0 2023-11-19 13:13:32,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=751800.0, ans=0.125 2023-11-19 13:13:48,081 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=751933.3333333334, ans=0.07 2023-11-19 13:13:54,724 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/_II2Klfnn4Y_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:13:59,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass_mid.scale_min, batch_count=752000.0, ans=0.2 2023-11-19 13:14:07,800 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4600, loss[loss=0.07835, simple_loss=0.09372, pruned_loss=0.02031, audio_tagging_loss=0.01119, over 15017.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1052, pruned_loss=0.02282, audio_tagging_loss=0.01033, over 3059436.28 frames. ], batch size: 56, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:14:09,164 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=752066.6666666666, ans=0.0 2023-11-19 13:14:20,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.attention_skip_rate, batch_count=752133.3333333334, ans=0.0 2023-11-19 13:14:23,753 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.499e+01 8.092e+01 8.852e+01 9.685e+01 1.325e+02, threshold=1.770e+02, percent-clipped=0.0 2023-11-19 13:14:28,122 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.hidden_balancer.prob, batch_count=752133.3333333334, ans=0.125 2023-11-19 13:14:40,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff3_skip_rate, batch_count=752200.0, ans=0.0 2023-11-19 13:14:43,022 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=752266.6666666666, ans=0.1 2023-11-19 13:14:48,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=752266.6666666666, ans=0.015 2023-11-19 13:14:56,236 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.balancer2.prob, batch_count=752333.3333333334, ans=0.125 2023-11-19 13:14:56,506 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=22.5
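
The scaling.py:1022 lines (like the one just above, metric=11.09 vs. limit=22.5) come from Zipformer's Whiten modules, which monitor how anisotropic a module's output covariance has become and only intervene, and log, once the metric crosses a limit (the limit itself is often a ScheduledFloat). The metric used in this sketch, num_channels * tr(C^2) / tr(C)^2, is an illustrative choice that equals 1.0 for an isotropic covariance and grows as variance concentrates in a few directions; icefall's exact formula and per-group handling may differ:

    import torch

    def whitening_metric(x: torch.Tensor) -> float:
        # x: (num_frames, num_channels) activations of one module.
        # num_channels * tr(C^2) / tr(C)^2 >= 1, with equality iff C ~ identity.
        x = x - x.mean(dim=0)
        cov = (x.T @ x) / x.shape[0]
        k = cov.shape[0]
        return (k * (cov * cov).sum() / cov.trace() ** 2).item()

    scale = torch.full((256,), 0.01)
    scale[:4] = 10.0                     # variance concentrated in 4 channels
    x = torch.randn(2000, 256) * scale   # strongly anisotropic features
    metric, limit = whitening_metric(x), 22.5
    if metric > limit:                   # only then would the module whiten and log
        print(f"Whitening: num_channels=256, metric={metric:.2f} vs. limit={limit}")

2023-11-19 13:15:04,123 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4650, loss[loss=0.08931, simple_loss=0.1119, pruned_loss=0.02587, audio_tagging_loss=0.007474, over 14801.00 frames. ], tot_loss[loss=0.08616, simple_loss=0.1056, pruned_loss=0.02298, audio_tagging_loss=0.01041, over 3061581.91 frames.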
], batch size: 55, lr: 6.98e-03, grad_scale: 32.0 2023-11-19 13:15:30,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=752533.3333333334, ans=0.0 2023-11-19 13:15:32,924 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.3.self_attn_weights, loss-sum=4.510e-01 2023-11-19 13:15:51,296 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=12.13 vs. limit=15.0 2023-11-19 13:15:53,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=752666.6666666666, ans=0.1 2023-11-19 13:15:59,525 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4700, loss[loss=0.08959, simple_loss=0.115, pruned_loss=0.02375, audio_tagging_loss=0.008318, over 16020.00 frames. ], tot_loss[loss=0.08633, simple_loss=0.1056, pruned_loss=0.02296, audio_tagging_loss=0.01055, over 3063311.97 frames. ], batch size: 62, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:12,905 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=752800.0, ans=0.1 2023-11-19 13:16:14,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.996e+01 8.706e+01 9.585e+01 1.066e+02 1.440e+02, threshold=1.917e+02, percent-clipped=0.0 2023-11-19 13:16:42,992 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=753000.0, ans=0.0 2023-11-19 13:16:44,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=753000.0, ans=0.125 2023-11-19 13:16:49,141 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=20.06 vs. limit=22.5 2023-11-19 13:16:54,055 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.balancer2.prob, batch_count=753066.6666666666, ans=0.125 2023-11-19 13:16:54,898 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4750, loss[loss=0.04918, simple_loss=0.04911, pruned_loss=0.009254, audio_tagging_loss=0.01537, over 13534.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1056, pruned_loss=0.02311, audio_tagging_loss=0.01049, over 3053543.14 frames. ], batch size: 53, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:16:57,314 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=753066.6666666666, ans=0.125 2023-11-19 13:17:03,293 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=4.57 vs. 
limit=12.0 2023-11-19 13:17:16,333 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer1.prob, batch_count=753200.0, ans=0.125 2023-11-19 13:17:33,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=753266.6666666666, ans=0.1 2023-11-19 13:17:37,548 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer1.prob, batch_count=753266.6666666666, ans=0.125 2023-11-19 13:17:50,481 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=753400.0, ans=0.125 2023-11-19 13:17:51,342 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4800, loss[loss=0.1048, simple_loss=0.1268, pruned_loss=0.031, audio_tagging_loss=0.01039, over 15324.00 frames. ], tot_loss[loss=0.08598, simple_loss=0.1048, pruned_loss=0.02288, audio_tagging_loss=0.01068, over 3052702.88 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:18:00,260 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.27 vs. limit=15.0 2023-11-19 13:18:04,265 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass.skip_rate, batch_count=753466.6666666666, ans=0.07 2023-11-19 13:18:06,144 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.276e+01 8.271e+01 9.115e+01 1.017e+02 1.442e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 13:18:13,529 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=8.34 vs. limit=15.0 2023-11-19 13:18:22,806 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=753600.0, ans=0.09899494936611666 2023-11-19 13:18:33,278 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=753600.0, ans=0.0 2023-11-19 13:18:45,783 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4850, loss[loss=0.07157, simple_loss=0.08731, pruned_loss=0.0181, audio_tagging_loss=0.009821, over 14959.00 frames. ], tot_loss[loss=0.08575, simple_loss=0.1046, pruned_loss=0.0227, audio_tagging_loss=0.01076, over 3059309.72 frames. ], batch size: 60, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:07,162 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=753866.6666666666, ans=0.2 2023-11-19 13:19:14,664 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer1.prob, batch_count=753866.6666666666, ans=0.125 2023-11-19 13:19:29,394 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=754000.0, ans=0.125 2023-11-19 13:19:34,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=754000.0, ans=0.0 2023-11-19 13:19:34,750 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=754000.0, ans=0.0 2023-11-19 13:19:41,818 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4900, loss[loss=0.08825, simple_loss=0.1076, pruned_loss=0.02302, audio_tagging_loss=0.01145, over 14576.00 frames. 
], tot_loss[loss=0.08601, simple_loss=0.1049, pruned_loss=0.02293, audio_tagging_loss=0.01065, over 3054264.37 frames. ], batch size: 55, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:19:57,514 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.439e+01 8.307e+01 9.002e+01 9.755e+01 1.261e+02, threshold=1.800e+02, percent-clipped=0.0 2023-11-19 13:19:59,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=754133.3333333334, ans=0.1 2023-11-19 13:20:00,878 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.scale_min, batch_count=754133.3333333334, ans=0.2 2023-11-19 13:20:02,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=754133.3333333334, ans=15.0 2023-11-19 13:20:07,240 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer2.min_positive, batch_count=754200.0, ans=0.05 2023-11-19 13:20:21,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass_mid.scale_min, batch_count=754266.6666666666, ans=0.2 2023-11-19 13:20:22,240 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=8.05 vs. limit=10.0 2023-11-19 13:20:25,141 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=754333.3333333334, ans=0.125 2023-11-19 13:20:27,266 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer2.min_abs, batch_count=754333.3333333334, ans=0.5 2023-11-19 13:20:29,391 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=754333.3333333334, ans=0.125 2023-11-19 13:20:37,466 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 4950, loss[loss=0.09993, simple_loss=0.1258, pruned_loss=0.02824, audio_tagging_loss=0.008786, over 15052.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.105, pruned_loss=0.02284, audio_tagging_loss=0.01048, over 3053087.54 frames. ], batch size: 56, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:20:38,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=754400.0, ans=0.125 2023-11-19 13:20:47,415 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=256, metric=14.35 vs. limit=22.5 2023-11-19 13:20:50,389 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.ff2_skip_rate, batch_count=754466.6666666666, ans=0.0 2023-11-19 13:21:03,304 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=754533.3333333334, ans=0.0 2023-11-19 13:21:25,977 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=7.55 vs. limit=15.0
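
Each scaling.py:213 line reports the current value (the ans= field) of a ScheduledFloat, a scalar hyperparameter interpolated piecewise-linearly in batch_count; that is how dropout probabilities, skip rates, and even the whitening limits above (e.g. whitening_limit ... ans=15.0) anneal over training. A minimal sketch of such a schedule follows; the breakpoints are illustrative, and icefall's class additionally supports defaults and emits exactly this kind of log line when the value is read:

    import bisect

    class ScheduledFloat:
        """Piecewise-linear schedule over the global batch counter."""
        def __init__(self, *points: tuple):
            # points: (batch_count, value) pairs in increasing batch_count order.
            self.xs = [float(x) for x, _ in points]
            self.ys = [float(y) for _, y in points]

        def value(self, batch_count: float) -> float:
            # Clamp outside the breakpoints, interpolate linearly between them.
            if batch_count <= self.xs[0]:
                return self.ys[0]
            if batch_count >= self.xs[-1]:
                return self.ys[-1]
            i = bisect.bisect_right(self.xs, batch_count)
            x0, x1 = self.xs[i - 1], self.xs[i]
            y0, y1 = self.ys[i - 1], self.ys[i]
            return y0 + (y1 - y0) * (batch_count - x0) / (x1 - x0)

    # e.g. a dropout_p decaying from 0.3 to a 0.1 floor early in training:
    dropout_p = ScheduledFloat((0.0, 0.3), (20000.0, 0.1))
    assert dropout_p.value(754133.33) == 0.1   # matches 'ans=0.1' this late in the run

2023-11-19 13:21:32,750 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5000, loss[loss=0.06544, simple_loss=0.0817, pruned_loss=0.01329, audio_tagging_loss=0.0113, over 14913.00 frames. ], tot_loss[loss=0.08614, simple_loss=0.1058, pruned_loss=0.02299, audio_tagging_loss=0.01024, over 3051133.78 frames.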
], batch size: 58, lr: 6.97e-03, grad_scale: 32.0 2023-11-19 13:21:36,524 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=14.62 vs. limit=15.0 2023-11-19 13:21:43,286 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=10.07 vs. limit=15.0 2023-11-19 13:21:45,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=754800.0, ans=0.125 2023-11-19 13:21:48,521 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.750e+01 8.260e+01 8.986e+01 1.009e+02 1.320e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:21:54,121 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:21:57,724 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module1.balancer2.prob, batch_count=754866.6666666666, ans=0.125 2023-11-19 13:22:07,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten2.whitening_limit, batch_count=754933.3333333334, ans=15.0 2023-11-19 13:22:12,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=754933.3333333334, ans=0.1 2023-11-19 13:22:16,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755000.0, ans=0.1 2023-11-19 13:22:27,380 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=755066.6666666666, ans=0.0 2023-11-19 13:22:28,177 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5050, loss[loss=0.08231, simple_loss=0.1086, pruned_loss=0.02082, audio_tagging_loss=0.007201, over 14910.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.1049, pruned_loss=0.02273, audio_tagging_loss=0.01021, over 3047348.36 frames. ], batch size: 55, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:22:44,213 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.77 vs. limit=6.0 2023-11-19 13:23:06,728 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755266.6666666666, ans=0.1 2023-11-19 13:23:07,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer1.prob, batch_count=755266.6666666666, ans=0.125 2023-11-19 13:23:13,875 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.70 vs. limit=15.0 2023-11-19 13:23:16,842 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=755333.3333333334, ans=0.125 2023-11-19 13:23:24,485 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5100, loss[loss=0.05897, simple_loss=0.07412, pruned_loss=0.01375, audio_tagging_loss=0.008164, over 15301.00 frames. ], tot_loss[loss=0.08519, simple_loss=0.1046, pruned_loss=0.0227, audio_tagging_loss=0.01017, over 3049002.25 frames. 
], batch size: 59, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:23:31,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=755400.0, ans=0.125 2023-11-19 13:23:39,699 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.354e+01 8.218e+01 8.903e+01 1.035e+02 1.339e+02, threshold=1.781e+02, percent-clipped=0.0 2023-11-19 13:23:46,838 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:23:50,367 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.24 vs. limit=15.0 2023-11-19 13:23:52,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer1.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:23:54,358 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer2.prob, batch_count=755533.3333333334, ans=0.125 2023-11-19 13:23:58,152 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=755600.0, ans=0.0 2023-11-19 13:24:17,161 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.max_positive, batch_count=755666.6666666666, ans=0.95 2023-11-19 13:24:20,066 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5150, loss[loss=0.06699, simple_loss=0.07991, pruned_loss=0.01721, audio_tagging_loss=0.009829, over 15331.00 frames. ], tot_loss[loss=0.08517, simple_loss=0.1047, pruned_loss=0.02265, audio_tagging_loss=0.01019, over 3054755.74 frames. ], batch size: 58, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:24:28,197 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.hidden_balancer.prob, batch_count=755733.3333333334, ans=0.125 2023-11-19 13:24:50,967 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=755866.6666666666, ans=0.1 2023-11-19 13:24:50,989 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=755866.6666666666, ans=0.0 2023-11-19 13:24:57,326 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=755933.3333333334, ans=0.1 2023-11-19 13:24:58,452 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=755933.3333333334, ans=0.125 2023-11-19 13:25:15,941 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5200, loss[loss=0.06785, simple_loss=0.09142, pruned_loss=0.01253, audio_tagging_loss=0.009614, over 14535.00 frames. ], tot_loss[loss=0.08522, simple_loss=0.1047, pruned_loss=0.02266, audio_tagging_loss=0.0102, over 3054945.78 frames. 
], batch size: 53, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:25:23,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.bypass.scale_min, batch_count=756066.6666666666, ans=0.2 2023-11-19 13:25:31,601 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.033e+01 8.473e+01 9.085e+01 1.039e+02 1.273e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:25:38,365 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=144, metric=8.86 vs. limit=10.0 2023-11-19 13:25:40,292 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.98 vs. limit=15.0 2023-11-19 13:25:44,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=756200.0, ans=0.1 2023-11-19 13:25:54,159 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=756266.6666666666, ans=0.05 2023-11-19 13:26:11,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=3.13 vs. limit=15.0 2023-11-19 13:26:11,821 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5250, loss[loss=0.119, simple_loss=0.1307, pruned_loss=0.0414, audio_tagging_loss=0.01221, over 15381.00 frames. ], tot_loss[loss=0.08665, simple_loss=0.1062, pruned_loss=0.02333, audio_tagging_loss=0.01023, over 3046075.85 frames. ], batch size: 56, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:26:18,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer1.prob, batch_count=756400.0, ans=0.125 2023-11-19 13:26:22,422 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=19.81 vs. limit=22.5 2023-11-19 13:26:32,656 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module2.balancer1.prob, batch_count=756533.3333333334, ans=0.125 2023-11-19 13:26:36,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer1.prob, batch_count=756533.3333333334, ans=0.125 2023-11-19 13:26:38,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.min_positive, batch_count=756533.3333333334, ans=0.025 2023-11-19 13:26:39,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=756533.3333333334, ans=0.0 2023-11-19 13:27:01,980 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.balancer1.prob, batch_count=756666.6666666666, ans=0.125 2023-11-19 13:27:07,153 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5300, loss[loss=0.103, simple_loss=0.1279, pruned_loss=0.0295, audio_tagging_loss=0.009548, over 15000.00 frames. ], tot_loss[loss=0.08597, simple_loss=0.1058, pruned_loss=0.02296, audio_tagging_loss=0.01012, over 3044524.01 frames. 
], batch size: 54, lr: 6.96e-03, grad_scale: 32.0 2023-11-19 13:27:22,573 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.871e+01 8.251e+01 9.151e+01 1.020e+02 1.250e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:27:36,736 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.whiten, num_groups=1, num_channels=256, metric=4.71 vs. limit=12.0 2023-11-19 13:27:46,685 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module1.balancer1.prob, batch_count=756933.3333333334, ans=0.125 2023-11-19 13:28:02,884 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5350, loss[loss=0.07086, simple_loss=0.08341, pruned_loss=0.01509, audio_tagging_loss=0.01407, over 15769.00 frames. ], tot_loss[loss=0.08583, simple_loss=0.1056, pruned_loss=0.02285, audio_tagging_loss=0.01016, over 3043144.30 frames. ], batch size: 61, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:28:10,516 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=757066.6666666666, ans=0.0 2023-11-19 13:28:17,246 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=757133.3333333334, ans=0.125 2023-11-19 13:28:23,237 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=757133.3333333334, ans=0.1 2023-11-19 13:28:28,606 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=757200.0, ans=0.0 2023-11-19 13:28:30,673 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=757200.0, ans=0.0 2023-11-19 13:28:35,212 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2.whitening_limit, batch_count=757266.6666666666, ans=15.0 2023-11-19 13:28:57,753 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5400, loss[loss=0.1078, simple_loss=0.1384, pruned_loss=0.02943, audio_tagging_loss=0.009125, over 14435.00 frames. ], tot_loss[loss=0.0854, simple_loss=0.105, pruned_loss=0.02257, audio_tagging_loss=0.01032, over 3046697.16 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:14,080 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.833e+01 8.379e+01 8.876e+01 9.570e+01 1.112e+02, threshold=1.775e+02, percent-clipped=0.0 2023-11-19 13:29:19,735 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.out_combiner.scale_min, batch_count=757533.3333333334, ans=0.2 2023-11-19 13:29:20,803 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757533.3333333334, ans=0.1 2023-11-19 13:29:20,818 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=757533.3333333334, ans=0.125 2023-11-19 13:29:29,729 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff2_skip_rate, batch_count=757533.3333333334, ans=0.0 2023-11-19 13:29:36,536 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=8.31 vs. limit=12.0 2023-11-19 13:29:38,444 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.11 vs. 
limit=6.0 2023-11-19 13:29:40,419 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=757600.0, ans=0.0 2023-11-19 13:29:54,363 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5450, loss[loss=0.067, simple_loss=0.08233, pruned_loss=0.01483, audio_tagging_loss=0.01101, over 15434.00 frames. ], tot_loss[loss=0.08557, simple_loss=0.1051, pruned_loss=0.02264, audio_tagging_loss=0.01036, over 3044459.65 frames. ], batch size: 59, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:29:54,558 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=757733.3333333334, ans=0.0 2023-11-19 13:30:00,289 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=512, metric=3.50 vs. limit=15.0 2023-11-19 13:30:01,057 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=757733.3333333334, ans=0.2 2023-11-19 13:30:09,906 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.prob, batch_count=757800.0, ans=0.125 2023-11-19 13:30:11,038 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff2_skip_rate, batch_count=757800.0, ans=0.0 2023-11-19 13:30:13,154 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.prob, batch_count=757800.0, ans=0.125 2023-11-19 13:30:13,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=757800.0, ans=0.1 2023-11-19 13:30:17,263 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module1.balancer1.max_abs, batch_count=757866.6666666666, ans=10.0 2023-11-19 13:30:23,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=757866.6666666666, ans=0.125 2023-11-19 13:30:46,520 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=5.08 vs. limit=6.0 2023-11-19 13:30:49,735 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5500, loss[loss=0.07223, simple_loss=0.08755, pruned_loss=0.01668, audio_tagging_loss=0.01177, over 16136.00 frames. ], tot_loss[loss=0.08587, simple_loss=0.1053, pruned_loss=0.02276, audio_tagging_loss=0.01048, over 3052867.38 frames. ], batch size: 59, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:31:02,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=758133.3333333334, ans=0.125 2023-11-19 13:31:02,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=758133.3333333334, ans=0.2 2023-11-19 13:31:04,522 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.775e+01 8.323e+01 9.025e+01 9.961e+01 1.664e+02, threshold=1.805e+02, percent-clipped=0.0 2023-11-19 13:31:13,659 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=9.71 vs. 
limit=15.0 2023-11-19 13:31:18,075 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=2.581e-03 2023-11-19 13:31:27,418 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer2.prob, batch_count=758266.6666666666, ans=0.125 2023-11-19 13:31:38,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.whiten_keys.whitening_limit, batch_count=758333.3333333334, ans=6.0 2023-11-19 13:31:44,673 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5550, loss[loss=0.1047, simple_loss=0.1264, pruned_loss=0.03265, audio_tagging_loss=0.008872, over 15005.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1048, pruned_loss=0.02265, audio_tagging_loss=0.01061, over 3053990.80 frames. ], batch size: 55, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:00,049 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=758466.6666666666, ans=0.125 2023-11-19 13:32:18,072 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.58 vs. limit=15.0 2023-11-19 13:32:19,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff2_skip_rate, batch_count=758600.0, ans=0.0 2023-11-19 13:32:29,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=758666.6666666666, ans=0.0 2023-11-19 13:32:38,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=758666.6666666666, ans=0.0 2023-11-19 13:32:40,826 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5600, loss[loss=0.1011, simple_loss=0.1275, pruned_loss=0.02831, audio_tagging_loss=0.009086, over 14057.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1042, pruned_loss=0.02246, audio_tagging_loss=0.01072, over 3052911.68 frames. ], batch size: 53, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:32:43,754 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=758733.3333333334, ans=0.5 2023-11-19 13:32:56,429 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.309e+01 8.349e+01 9.140e+01 1.023e+02 1.369e+02, threshold=1.828e+02, percent-clipped=0.0 2023-11-19 13:32:59,092 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.00 vs. limit=15.0 2023-11-19 13:33:10,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.ff2_skip_rate, batch_count=758866.6666666666, ans=0.0 2023-11-19 13:33:13,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.01 vs. limit=15.0 2023-11-19 13:33:18,573 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/ze0LsBtoDm0_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 13:33:32,052 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=759000.0, ans=0.125 2023-11-19 13:33:35,870 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=759066.6666666666, ans=0.0 2023-11-19 13:33:36,687 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5650, loss[loss=0.0892, simple_loss=0.1114, pruned_loss=0.02341, audio_tagging_loss=0.01008, over 15750.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1049, pruned_loss=0.02252, audio_tagging_loss=0.01066, over 3057454.79 frames. ], batch size: 56, lr: 6.95e-03, grad_scale: 32.0 2023-11-19 13:33:38,070 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=759066.6666666666, ans=0.0 2023-11-19 13:33:49,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer2.prob, batch_count=759133.3333333334, ans=0.125 2023-11-19 13:33:56,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=759133.3333333334, ans=0.0 2023-11-19 13:33:57,930 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=11.18 vs. limit=15.0 2023-11-19 13:34:04,316 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=759200.0, ans=0.09899494936611666 2023-11-19 13:34:12,280 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=759266.6666666666, ans=0.07 2023-11-19 13:34:18,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.skip_rate, batch_count=759266.6666666666, ans=0.035 2023-11-19 13:34:23,583 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.attention_skip_rate, batch_count=759333.3333333334, ans=0.0 2023-11-19 13:34:31,774 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5700, loss[loss=0.08615, simple_loss=0.1005, pruned_loss=0.02263, audio_tagging_loss=0.01325, over 15790.00 frames. ], tot_loss[loss=0.08486, simple_loss=0.1038, pruned_loss=0.02228, audio_tagging_loss=0.0107, over 3055667.42 frames. ], batch size: 58, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:34:47,530 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.640e+01 8.102e+01 8.841e+01 9.614e+01 1.155e+02, threshold=1.768e+02, percent-clipped=0.0 2023-11-19 13:34:53,867 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.39 vs. 
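The WARNING above drops a one-second AudioSet cut because its 100 input frames shrink to 23 after the encoder frontend's subsampling, fewer than its 24 BPE tokens. A sketch of that check follows; the length formula ((T - 7) // 2 + 1) // 2 is an assumption about this recipe's convolutional frontend that happens to reproduce the logged 100 -> 23 mapping, and the keep_cut helper name is illustrative.

    # Hypothetical sketch of the exclusion logic behind the WARNING above.
    # The subsampled-length formula is an assumption (it maps 100 -> 23 as logged).
    def frames_after_subsampling(num_frames: int) -> int:
        return ((num_frames - 7) // 2 + 1) // 2

    def keep_cut(num_frames: int, num_tokens: int) -> bool:
        t = frames_after_subsampling(num_frames)
        # A transducer needs at least one encoder frame per output token.
        return t >= num_tokens

    # frames_after_subsampling(100) == 23, and the cut above has 24 tokens,
    # so keep_cut(100, 24) is False and the cut is excluded from training.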
limit=15.0 2023-11-19 13:35:03,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass.skip_rate, batch_count=759533.3333333334, ans=0.07 2023-11-19 13:35:11,916 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=759600.0, ans=0.125 2023-11-19 13:35:17,083 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=759666.6666666666, ans=0.125 2023-11-19 13:35:17,299 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=759666.6666666666, ans=0.1 2023-11-19 13:35:27,571 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5750, loss[loss=0.08149, simple_loss=0.107, pruned_loss=0.02038, audio_tagging_loss=0.007628, over 15871.00 frames. ], tot_loss[loss=0.08466, simple_loss=0.1037, pruned_loss=0.02233, audio_tagging_loss=0.01048, over 3060073.30 frames. ], batch size: 58, lr: 6.94e-03, grad_scale: 32.0 2023-11-19 13:35:33,087 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff3_skip_rate, batch_count=759733.3333333334, ans=0.0 2023-11-19 13:35:38,897 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff3_skip_rate, batch_count=759800.0, ans=0.0 2023-11-19 13:35:47,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=759800.0, ans=0.1 2023-11-19 13:36:03,694 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.31 vs. limit=6.0 2023-11-19 13:36:08,284 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.14 vs. limit=10.0 2023-11-19 13:36:13,552 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.prob, batch_count=760000.0, ans=0.125 2023-11-19 13:36:16,986 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.whiten, num_groups=1, num_channels=512, metric=10.32 vs. limit=12.0 2023-11-19 13:36:18,516 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=256, metric=11.90 vs. limit=15.0 2023-11-19 13:36:23,404 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5800, loss[loss=0.09897, simple_loss=0.1265, pruned_loss=0.02762, audio_tagging_loss=0.008085, over 14216.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1047, pruned_loss=0.02256, audio_tagging_loss=0.01035, over 3054838.82 frames. ], batch size: 52, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:36:39,823 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.949e+01 8.575e+01 9.143e+01 9.990e+01 1.422e+02, threshold=1.829e+02, percent-clipped=0.0 2023-11-19 13:37:19,220 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5850, loss[loss=0.08958, simple_loss=0.1071, pruned_loss=0.02645, audio_tagging_loss=0.009587, over 15356.00 frames. ], tot_loss[loss=0.08518, simple_loss=0.1044, pruned_loss=0.02269, audio_tagging_loss=0.01029, over 3051454.54 frames. 
], batch size: 59, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:37:28,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=760400.0, ans=0.1 2023-11-19 13:37:28,442 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=760400.0, ans=0.125 2023-11-19 13:37:32,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=760466.6666666666, ans=0.1 2023-11-19 13:37:44,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=760533.3333333334, ans=0.2 2023-11-19 13:37:58,630 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=760600.0, ans=0.125 2023-11-19 13:38:01,763 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=760600.0, ans=0.2 2023-11-19 13:38:05,168 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=11.09 vs. limit=15.0 2023-11-19 13:38:09,744 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.attention_skip_rate, batch_count=760666.6666666666, ans=0.0 2023-11-19 13:38:15,423 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5900, loss[loss=0.07644, simple_loss=0.08646, pruned_loss=0.02056, audio_tagging_loss=0.01265, over 14174.00 frames. ], tot_loss[loss=0.08555, simple_loss=0.105, pruned_loss=0.02284, audio_tagging_loss=0.01021, over 3051548.02 frames. ], batch size: 53, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:38:26,365 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=760800.0, ans=0.125 2023-11-19 13:38:32,428 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.639e+01 8.376e+01 9.188e+01 1.002e+02 1.553e+02, threshold=1.838e+02, percent-clipped=0.0 2023-11-19 13:38:35,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=760800.0, ans=0.125 2023-11-19 13:38:37,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.ff3_skip_rate, batch_count=760866.6666666666, ans=0.0 2023-11-19 13:38:39,917 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=760866.6666666666, ans=0.125 2023-11-19 13:38:47,637 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=13.57 vs. limit=15.0 2023-11-19 13:38:53,136 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=760933.3333333334, ans=0.125 2023-11-19 13:39:07,438 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_module1.balancer2.prob, batch_count=761000.0, ans=0.125 2023-11-19 13:39:10,376 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 5950, loss[loss=0.08686, simple_loss=0.1082, pruned_loss=0.02015, audio_tagging_loss=0.01259, over 14741.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1048, pruned_loss=0.02273, audio_tagging_loss=0.01027, over 3060416.21 frames. 
], batch size: 55, lr: 6.94e-03, grad_scale: 16.0 2023-11-19 13:39:20,641 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass_mid.scale_min, batch_count=761133.3333333334, ans=0.2 2023-11-19 13:39:24,991 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=761133.3333333334, ans=0.0 2023-11-19 13:39:27,034 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=761133.3333333334, ans=0.0 2023-11-19 13:39:31,289 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=761200.0, ans=0.125 2023-11-19 13:39:40,738 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=761200.0, ans=0.2 2023-11-19 13:40:03,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer1.prob, batch_count=761333.3333333334, ans=0.125 2023-11-19 13:40:06,251 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6000, loss[loss=0.1059, simple_loss=0.1273, pruned_loss=0.03294, audio_tagging_loss=0.009292, over 15176.00 frames. ], tot_loss[loss=0.08651, simple_loss=0.1062, pruned_loss=0.0232, audio_tagging_loss=0.01023, over 3055264.98 frames. ], batch size: 57, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:40:06,253 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 13:40:34,887 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6380, 4.1341, 3.6487, 3.1074], device='cuda:0') 2023-11-19 13:40:38,521 INFO [train_asr.py:1147] (0/4) Epoch 10, validation: loss=0.06367, simple_loss=0.05534, pruned_loss=0.00639, audio_tagging_loss=0.02961, over 4681554.00 frames. 2023-11-19 13:40:38,522 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB 2023-11-19 13:40:38,723 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module1.balancer1.prob, batch_count=761400.0, ans=0.125 2023-11-19 13:40:55,647 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.484e+01 8.242e+01 8.869e+01 9.811e+01 1.293e+02, threshold=1.774e+02, percent-clipped=0.0 2023-11-19 13:41:03,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=761533.3333333334, ans=0.0 2023-11-19 13:41:06,801 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn1.whiten, num_groups=1, num_channels=384, metric=20.18 vs. limit=22.5 2023-11-19 13:41:13,255 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module1.balancer1.prob, batch_count=761600.0, ans=0.125 2023-11-19 13:41:17,856 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/NoNxFjwXuuc_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 13:41:34,260 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6050, loss[loss=0.09002, simple_loss=0.1115, pruned_loss=0.02488, audio_tagging_loss=0.009413, over 15712.00 frames. 
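Both the running tot_loss[... over N frames ...] entries and the validation summary above read as frame-weighted averages: each component is accumulated as value * num_frames per batch and normalized by the total frame count (4681554 frames for the dev set here). A small sketch of that bookkeeping, with illustrative names:

    # Sketch of frame-weighted loss averaging as suggested by the
    # "over N frames" bookkeeping in the log; class and method names are
    # illustrative, not taken from train_asr.py.
    class LossTracker:
        def __init__(self):
            self.frames = 0.0
            self.sums: dict[str, float] = {}

        def update(self, num_frames: float, **components: float) -> None:
            self.frames += num_frames
            for name, value in components.items():
                self.sums[name] = self.sums.get(name, 0.0) + value * num_frames

        def averages(self) -> dict[str, float]:
            return {k: v / self.frames for k, v in self.sums.items()}

    # Per batch: tracker.update(15712, loss=0.09002, pruned_loss=0.02488, ...)
    # tracker.averages() then yields the tot_loss[...] style numbers.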
], tot_loss[loss=0.0869, simple_loss=0.1068, pruned_loss=0.02332, audio_tagging_loss=0.01016, over 3056932.23 frames. ], batch size: 58, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:41:36,597 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_module2.balancer2.prob, batch_count=761733.3333333334, ans=0.125 2023-11-19 13:41:43,560 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer2.prob, batch_count=761733.3333333334, ans=0.125 2023-11-19 13:41:49,103 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.62 vs. limit=6.0 2023-11-19 13:41:55,644 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=761866.6666666666, ans=0.125 2023-11-19 13:41:55,686 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer2.prob, batch_count=761866.6666666666, ans=0.125 2023-11-19 13:42:29,476 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.01 vs. limit=22.5 2023-11-19 13:42:30,042 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6100, loss[loss=0.08323, simple_loss=0.09847, pruned_loss=0.01985, audio_tagging_loss=0.01414, over 14094.00 frames. ], tot_loss[loss=0.08619, simple_loss=0.1058, pruned_loss=0.02303, audio_tagging_loss=0.01026, over 3052404.16 frames. ], batch size: 55, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:42:38,166 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=762066.6666666666, ans=0.0 2023-11-19 13:42:46,880 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.906e+01 8.561e+01 9.384e+01 1.050e+02 1.586e+02, threshold=1.877e+02, percent-clipped=0.0 2023-11-19 13:43:04,556 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_skip_rate, batch_count=762266.6666666666, ans=0.0 2023-11-19 13:43:25,978 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6150, loss[loss=0.06594, simple_loss=0.08002, pruned_loss=0.01518, audio_tagging_loss=0.01075, over 15044.00 frames. ], tot_loss[loss=0.08588, simple_loss=0.1051, pruned_loss=0.02301, audio_tagging_loss=0.01031, over 3045800.84 frames. ], batch size: 57, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:43:41,287 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.5.prob, batch_count=762466.6666666666, ans=0.125 2023-11-19 13:43:54,379 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=762533.3333333334, ans=0.0 2023-11-19 13:44:03,371 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer2.prob, batch_count=762600.0, ans=0.125 2023-11-19 13:44:08,037 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:44:16,829 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=762666.6666666666, ans=0.0 2023-11-19 13:44:21,445 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6200, loss[loss=0.09536, simple_loss=0.1137, pruned_loss=0.02552, audio_tagging_loss=0.01301, over 15310.00 frames. 
], tot_loss[loss=0.08547, simple_loss=0.1045, pruned_loss=0.0228, audio_tagging_loss=0.01043, over 3052177.39 frames. ], batch size: 56, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:44:25,890 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.conv_skip_rate, batch_count=762733.3333333334, ans=0.0 2023-11-19 13:44:33,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=762800.0, ans=0.0 2023-11-19 13:44:37,807 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.950e+01 8.474e+01 9.020e+01 9.808e+01 1.345e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:44:52,953 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.out_combiner.scale_min, batch_count=762866.6666666666, ans=0.2 2023-11-19 13:45:03,945 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module1.balancer2.prob, batch_count=762933.3333333334, ans=0.125 2023-11-19 13:45:07,105 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.nonlin_attention.balancer.prob, batch_count=763000.0, ans=0.125 2023-11-19 13:45:17,048 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6250, loss[loss=0.07713, simple_loss=0.09553, pruned_loss=0.02086, audio_tagging_loss=0.008505, over 14706.00 frames. ], tot_loss[loss=0.08568, simple_loss=0.1045, pruned_loss=0.02292, audio_tagging_loss=0.0105, over 3051209.16 frames. ], batch size: 54, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:45:38,188 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:45:40,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=763200.0, ans=0.09899494936611666 2023-11-19 13:45:52,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=763266.6666666666, ans=0.0 2023-11-19 13:46:03,989 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=9.58 vs. limit=15.0 2023-11-19 13:46:09,427 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_module2.balancer1.min_positive, batch_count=763333.3333333334, ans=0.025 2023-11-19 13:46:11,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=763400.0, ans=0.125 2023-11-19 13:46:12,448 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6300, loss[loss=0.06494, simple_loss=0.06951, pruned_loss=0.01614, audio_tagging_loss=0.01404, over 14498.00 frames. ], tot_loss[loss=0.08581, simple_loss=0.1047, pruned_loss=0.02288, audio_tagging_loss=0.01058, over 3047661.78 frames. 
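The scaling.py:213 entries that dominate this log are ScheduledFloat reads: hyperparameters such as dropout_p, skip rates and balancer probabilities are resolved from the current batch_count and logged as ans=<value>. A piecewise-linear schedule clamped at its endpoints would produce this behaviour; the sketch below assumes that shape, and the breakpoints in the example are invented for illustration.

    # Illustrative ScheduledFloat-style lookup: piecewise-linear in
    # batch_count, clamped at both ends. Only the (batch_count -> ans)
    # behaviour mirrors the log; the schedule shape is an assumption.
    def scheduled_float(batch_count: float,
                        points: list[tuple[float, float]]) -> float:
        if batch_count <= points[0][0]:
            return points[0][1]
        for (x0, y0), (x1, y1) in zip(points, points[1:]):
            if batch_count <= x1:
                frac = (batch_count - x0) / (x1 - x0)
                return y0 + frac * (y1 - y0)
        return points[-1][1]

    # e.g. a skip rate annealed to zero early in training (invented breakpoints):
    # scheduled_float(762733.3, [(0.0, 0.5), (50000.0, 0.0)]) -> 0.0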
], batch size: 54, lr: 6.93e-03, grad_scale: 32.0 2023-11-19 13:46:29,498 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.022e+01 8.302e+01 8.988e+01 1.019e+02 1.261e+02, threshold=1.798e+02, percent-clipped=0.0 2023-11-19 13:46:36,053 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.out_balancer.prob, batch_count=763533.3333333334, ans=0.125 2023-11-19 13:46:45,318 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=763600.0, ans=0.125 2023-11-19 13:46:46,288 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:46:53,189 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=763600.0, ans=0.125 2023-11-19 13:46:58,306 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=6.03 vs. limit=15.0 2023-11-19 13:46:59,045 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=763666.6666666666, ans=0.1 2023-11-19 13:47:01,436 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=12.06 vs. limit=15.0 2023-11-19 13:47:08,312 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6350, loss[loss=0.09876, simple_loss=0.1192, pruned_loss=0.02371, audio_tagging_loss=0.01546, over 15032.00 frames. ], tot_loss[loss=0.08537, simple_loss=0.1041, pruned_loss=0.02265, audio_tagging_loss=0.01067, over 3053495.31 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:47:17,717 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=8.55 vs. limit=12.0 2023-11-19 13:47:22,775 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=763800.0, ans=0.2 2023-11-19 13:47:25,789 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass_mid.scale_min, batch_count=763800.0, ans=0.2 2023-11-19 13:47:27,524 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=763800.0, ans=0.125 2023-11-19 13:47:35,406 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.conv_skip_rate, batch_count=763866.6666666666, ans=0.0 2023-11-19 13:47:44,682 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=192, metric=3.92 vs. limit=10.0 2023-11-19 13:48:03,971 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6400, loss[loss=0.08585, simple_loss=0.1042, pruned_loss=0.02175, audio_tagging_loss=0.012, over 16570.00 frames. ], tot_loss[loss=0.08595, simple_loss=0.1048, pruned_loss=0.02283, audio_tagging_loss=0.01072, over 3047653.52 frames. 
], batch size: 62, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:48:14,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764133.3333333334, ans=0.1 2023-11-19 13:48:22,448 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.848e+01 7.945e+01 8.669e+01 9.496e+01 1.172e+02, threshold=1.734e+02, percent-clipped=0.0 2023-11-19 13:48:31,104 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=764200.0, ans=0.0 2023-11-19 13:48:40,925 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.8.prob, batch_count=764266.6666666666, ans=0.125 2023-11-19 13:48:44,559 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=11.80 vs. limit=15.0 2023-11-19 13:48:55,445 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.skip_rate, batch_count=764333.3333333334, ans=0.04949747468305833 2023-11-19 13:48:55,538 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass_mid.scale_min, batch_count=764333.3333333334, ans=0.2 2023-11-19 13:48:59,542 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6450, loss[loss=0.07309, simple_loss=0.09406, pruned_loss=0.01936, audio_tagging_loss=0.006703, over 15309.00 frames. ], tot_loss[loss=0.08599, simple_loss=0.1048, pruned_loss=0.02277, audio_tagging_loss=0.01081, over 3048432.26 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 32.0 2023-11-19 13:49:20,827 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=764533.3333333334, ans=0.0 2023-11-19 13:49:35,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.balancer1.prob, batch_count=764600.0, ans=0.125 2023-11-19 13:49:38,851 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module1.balancer2.min_positive, batch_count=764600.0, ans=0.05 2023-11-19 13:49:46,616 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=764666.6666666666, ans=0.1 2023-11-19 13:49:48,869 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.whiten, num_groups=1, num_channels=384, metric=7.49 vs. limit=12.0 2023-11-19 13:49:54,755 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6500, loss[loss=0.09015, simple_loss=0.1059, pruned_loss=0.0255, audio_tagging_loss=0.01171, over 15175.00 frames. ], tot_loss[loss=0.08541, simple_loss=0.1041, pruned_loss=0.02259, audio_tagging_loss=0.01079, over 3049335.23 frames. 
], batch size: 56, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:49:55,971 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=764733.3333333334, ans=0.1 2023-11-19 13:50:13,156 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.123e+01 8.395e+01 9.151e+01 9.992e+01 1.336e+02, threshold=1.830e+02, percent-clipped=0.0 2023-11-19 13:50:22,920 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.bypass_mid.scale_min, batch_count=764866.6666666666, ans=0.2 2023-11-19 13:50:29,667 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=764933.3333333334, ans=0.1 2023-11-19 13:50:46,573 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.self_attn1.whiten, num_groups=1, num_channels=256, metric=14.86 vs. limit=22.5 2023-11-19 13:50:50,145 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6550, loss[loss=0.06922, simple_loss=0.09407, pruned_loss=0.01354, audio_tagging_loss=0.008646, over 14811.00 frames. ], tot_loss[loss=0.08559, simple_loss=0.1046, pruned_loss=0.02272, audio_tagging_loss=0.01056, over 3048603.37 frames. ], batch size: 56, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:17,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.out_proj.dropout_p, batch_count=765200.0, ans=0.1 2023-11-19 13:51:25,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=765266.6666666666, ans=0.05 2023-11-19 13:51:45,100 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6600, loss[loss=0.07156, simple_loss=0.09314, pruned_loss=0.01498, audio_tagging_loss=0.01001, over 15903.00 frames. ], tot_loss[loss=0.08573, simple_loss=0.1054, pruned_loss=0.02269, audio_tagging_loss=0.01035, over 3053149.92 frames. ], batch size: 62, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:51:46,276 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=765400.0, ans=0.0 2023-11-19 13:51:52,067 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=765400.0, ans=0.0 2023-11-19 13:51:58,876 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.conv_module1.whiten, num_groups=1, num_channels=256, metric=14.51 vs. limit=15.0 2023-11-19 13:52:03,384 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.780e+01 7.948e+01 8.762e+01 9.685e+01 1.504e+02, threshold=1.752e+02, percent-clipped=0.0 2023-11-19 13:52:05,042 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.layerdrop_rate, batch_count=765466.6666666666, ans=0.015 2023-11-19 13:52:17,959 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=765600.0, ans=0.0 2023-11-19 13:52:34,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer_na.min_abs, batch_count=765666.6666666666, ans=0.02 2023-11-19 13:52:40,475 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6650, loss[loss=0.06833, simple_loss=0.08357, pruned_loss=0.01539, audio_tagging_loss=0.01116, over 16547.00 frames. ], tot_loss[loss=0.08545, simple_loss=0.1051, pruned_loss=0.02256, audio_tagging_loss=0.01035, over 3054554.83 frames. 
], batch size: 63, lr: 6.92e-03, grad_scale: 16.0 2023-11-19 13:52:49,687 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_skip_rate, batch_count=765733.3333333334, ans=0.0 2023-11-19 13:52:52,782 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward2.hidden_balancer.prob, batch_count=765800.0, ans=0.125 2023-11-19 13:53:01,268 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.scale_min, batch_count=765866.6666666666, ans=0.2 2023-11-19 13:53:26,180 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.ff2_skip_rate, batch_count=766000.0, ans=0.0 2023-11-19 13:53:35,928 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6700, loss[loss=0.06156, simple_loss=0.07983, pruned_loss=0.01259, audio_tagging_loss=0.009056, over 14889.00 frames. ], tot_loss[loss=0.08546, simple_loss=0.1052, pruned_loss=0.02256, audio_tagging_loss=0.01029, over 3041202.61 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:53:48,055 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=9.84 vs. limit=15.0 2023-11-19 13:53:55,023 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.852e+01 8.241e+01 9.083e+01 1.005e+02 1.420e+02, threshold=1.817e+02, percent-clipped=0.0 2023-11-19 13:53:57,248 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=766200.0, ans=0.0 2023-11-19 13:54:18,794 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.07 vs. limit=10.0 2023-11-19 13:54:20,661 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_skip_rate, batch_count=766333.3333333334, ans=0.0 2023-11-19 13:54:25,928 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module2.balancer1.prob, batch_count=766333.3333333334, ans=0.125 2023-11-19 13:54:26,076 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer1.prob, batch_count=766333.3333333334, ans=0.125 2023-11-19 13:54:31,103 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6750, loss[loss=0.09123, simple_loss=0.1061, pruned_loss=0.02634, audio_tagging_loss=0.01184, over 14667.00 frames. ], tot_loss[loss=0.08502, simple_loss=0.1044, pruned_loss=0.02253, audio_tagging_loss=0.01028, over 3039142.25 frames. ], batch size: 55, lr: 6.91e-03, grad_scale: 16.0 2023-11-19 13:54:57,886 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.bypass_mid.scale_min, batch_count=766533.3333333334, ans=0.2 2023-11-19 13:55:16,489 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=766666.6666666666, ans=0.0 2023-11-19 13:55:27,977 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6800, loss[loss=0.09846, simple_loss=0.1212, pruned_loss=0.02848, audio_tagging_loss=0.009369, over 16059.00 frames. ], tot_loss[loss=0.08538, simple_loss=0.1047, pruned_loss=0.02271, audio_tagging_loss=0.01031, over 3044228.20 frames. 
], batch size: 61, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:55:32,519 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=766733.3333333334, ans=0.0 2023-11-19 13:55:46,472 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.136e+01 8.286e+01 8.985e+01 9.839e+01 1.456e+02, threshold=1.797e+02, percent-clipped=0.0 2023-11-19 13:56:01,143 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=766933.3333333334, ans=0.0 2023-11-19 13:56:15,921 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=767000.0, ans=0.125 2023-11-19 13:56:23,561 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6850, loss[loss=0.09636, simple_loss=0.1221, pruned_loss=0.02628, audio_tagging_loss=0.009045, over 15560.00 frames. ], tot_loss[loss=0.0847, simple_loss=0.1038, pruned_loss=0.02246, audio_tagging_loss=0.01033, over 3040232.50 frames. ], batch size: 59, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:56:23,810 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_skip_rate, batch_count=767066.6666666666, ans=0.0 2023-11-19 13:56:38,580 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.attention_skip_rate, batch_count=767133.3333333334, ans=0.0 2023-11-19 13:56:56,228 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=767266.6666666666, ans=0.125 2023-11-19 13:57:07,354 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=767333.3333333334, ans=0.0 2023-11-19 13:57:18,761 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6900, loss[loss=0.1054, simple_loss=0.1189, pruned_loss=0.03339, audio_tagging_loss=0.01256, over 15391.00 frames. ], tot_loss[loss=0.08494, simple_loss=0.1045, pruned_loss=0.02241, audio_tagging_loss=0.01026, over 3044420.11 frames. ], batch size: 56, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:57:24,765 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.skip_rate, batch_count=767400.0, ans=0.07 2023-11-19 13:57:32,752 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=767466.6666666666, ans=0.125 2023-11-19 13:57:38,225 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.298e+01 8.115e+01 8.697e+01 9.342e+01 1.240e+02, threshold=1.739e+02, percent-clipped=0.0 2023-11-19 13:57:51,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_module2.balancer1.prob, batch_count=767600.0, ans=0.125 2023-11-19 13:57:58,994 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.self_attn_weights.pos_emb_skip_rate, batch_count=767600.0, ans=0.0 2023-11-19 13:58:00,780 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/Xez1ffAcb0w_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. 
Number of tokens: 24 2023-11-19 13:58:08,009 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.skip_rate, batch_count=767666.6666666666, ans=0.09899494936611666 2023-11-19 13:58:13,198 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=10.53 vs. limit=15.0 2023-11-19 13:58:14,578 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 6950, loss[loss=0.07489, simple_loss=0.09009, pruned_loss=0.0184, audio_tagging_loss=0.01145, over 15369.00 frames. ], tot_loss[loss=0.08501, simple_loss=0.1048, pruned_loss=0.02234, audio_tagging_loss=0.01024, over 3042012.94 frames. ], batch size: 57, lr: 6.91e-03, grad_scale: 32.0 2023-11-19 13:58:15,924 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.min_positive, batch_count=767733.3333333334, ans=0.05 2023-11-19 13:58:28,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.balancer2.prob, batch_count=767800.0, ans=0.125 2023-11-19 13:58:30,439 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=767800.0, ans=0.0 2023-11-19 13:58:34,801 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.bypass.skip_rate, batch_count=767800.0, ans=0.09899494936611666 2023-11-19 13:58:39,076 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 13:59:10,769 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7000, loss[loss=0.08803, simple_loss=0.1016, pruned_loss=0.02427, audio_tagging_loss=0.01297, over 14445.00 frames. ], tot_loss[loss=0.08473, simple_loss=0.1041, pruned_loss=0.02229, audio_tagging_loss=0.01038, over 3036931.78 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 13:59:18,320 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=768066.6666666666, ans=0.1 2023-11-19 13:59:21,798 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=512, metric=12.35 vs. limit=15.0 2023-11-19 13:59:29,028 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.706e+01 8.254e+01 9.022e+01 1.015e+02 1.308e+02, threshold=1.804e+02, percent-clipped=0.0 2023-11-19 13:59:54,586 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.hidden_balancer.prob, batch_count=768333.3333333334, ans=0.125 2023-11-19 14:00:04,857 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7050, loss[loss=0.06206, simple_loss=0.07913, pruned_loss=0.01156, audio_tagging_loss=0.01093, over 14621.00 frames. ], tot_loss[loss=0.08451, simple_loss=0.1039, pruned_loss=0.02215, audio_tagging_loss=0.01042, over 3033814.50 frames. ], batch size: 55, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:00:59,859 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.self_attn2.whiten.whitening_limit, batch_count=768733.3333333334, ans=22.5 2023-11-19 14:01:00,283 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7100, loss[loss=0.09312, simple_loss=0.1141, pruned_loss=0.02356, audio_tagging_loss=0.01251, over 16498.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1044, pruned_loss=0.02234, audio_tagging_loss=0.01048, over 3049454.55 frames. 
], batch size: 63, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:01:01,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.conv_skip_rate, batch_count=768733.3333333334, ans=0.0 2023-11-19 14:01:05,044 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=512, metric=12.15 vs. limit=15.0 2023-11-19 14:01:17,324 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.ff3_skip_rate, batch_count=768800.0, ans=0.0 2023-11-19 14:01:19,248 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.232e+01 8.436e+01 9.096e+01 9.960e+01 1.200e+02, threshold=1.819e+02, percent-clipped=0.0 2023-11-19 14:01:28,158 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=2.20 vs. limit=6.0 2023-11-19 14:01:30,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer2.prob, batch_count=768866.6666666666, ans=0.125 2023-11-19 14:01:31,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.self_attn1.whiten, num_groups=1, num_channels=384, metric=14.61 vs. limit=22.5 2023-11-19 14:01:56,308 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7150, loss[loss=0.06847, simple_loss=0.08425, pruned_loss=0.01554, audio_tagging_loss=0.01081, over 15106.00 frames. ], tot_loss[loss=0.08534, simple_loss=0.1048, pruned_loss=0.02253, audio_tagging_loss=0.01041, over 3050460.09 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:02:21,664 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=9.05 vs. limit=10.0 2023-11-19 14:02:27,023 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=16.75 vs. limit=22.5 2023-11-19 14:02:38,706 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass.scale_min, batch_count=769266.6666666666, ans=0.2 2023-11-19 14:02:45,577 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.conv_module1.whiten, num_groups=1, num_channels=192, metric=12.89 vs. limit=15.0 2023-11-19 14:02:47,091 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward2.hidden_balancer.prob, batch_count=769333.3333333334, ans=0.125 2023-11-19 14:02:52,147 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7200, loss[loss=0.1022, simple_loss=0.1233, pruned_loss=0.03283, audio_tagging_loss=0.007766, over 15208.00 frames. ], tot_loss[loss=0.0857, simple_loss=0.105, pruned_loss=0.02279, audio_tagging_loss=0.01042, over 3046054.14 frames. 
], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:03:01,503 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=769400.0, ans=0.125 2023-11-19 14:03:05,116 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=769466.6666666666, ans=0.125 2023-11-19 14:03:11,335 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.281e+01 8.379e+01 9.176e+01 1.015e+02 1.604e+02, threshold=1.835e+02, percent-clipped=0.0 2023-11-19 14:03:48,477 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7250, loss[loss=0.08782, simple_loss=0.1131, pruned_loss=0.02059, audio_tagging_loss=0.01067, over 14803.00 frames. ], tot_loss[loss=0.0866, simple_loss=0.1061, pruned_loss=0.02305, audio_tagging_loss=0.01051, over 3046832.83 frames. ], batch size: 56, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:03:57,004 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=769733.3333333334, ans=0.1 2023-11-19 14:04:04,330 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=11.00 vs. limit=15.0 2023-11-19 14:04:07,467 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.3.conv_module1.whiten, num_groups=1, num_channels=512, metric=5.08 vs. limit=15.0 2023-11-19 14:04:08,579 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten1, num_groups=1, num_channels=384, metric=7.32 vs. limit=10.0 2023-11-19 14:04:15,171 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=769866.6666666666, ans=0.0 2023-11-19 14:04:29,976 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.conv_module2.balancer1.prob, batch_count=769933.3333333334, ans=0.125 2023-11-19 14:04:43,487 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7300, loss[loss=0.06296, simple_loss=0.07722, pruned_loss=0.01315, audio_tagging_loss=0.0112, over 15285.00 frames. ], tot_loss[loss=0.0864, simple_loss=0.106, pruned_loss=0.02301, audio_tagging_loss=0.01037, over 3051027.66 frames. ], batch size: 58, lr: 6.90e-03, grad_scale: 32.0 2023-11-19 14:04:58,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.self_attn_weights.pos_emb_skip_rate, batch_count=770133.3333333334, ans=0.0 2023-11-19 14:05:02,641 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.433e+01 8.339e+01 9.048e+01 1.014e+02 1.411e+02, threshold=1.810e+02, percent-clipped=0.0 2023-11-19 14:05:08,854 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.balancer.max_positive, batch_count=770200.0, ans=0.95 2023-11-19 14:05:26,413 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=770266.6666666666, ans=0.125 2023-11-19 14:05:35,294 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=770333.3333333334, ans=0.125 2023-11-19 14:05:39,784 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7350, loss[loss=0.09513, simple_loss=0.1191, pruned_loss=0.02793, audio_tagging_loss=0.007634, over 14439.00 frames. 
], tot_loss[loss=0.08612, simple_loss=0.1059, pruned_loss=0.02291, audio_tagging_loss=0.01027, over 3046423.71 frames. ], batch size: 53, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:05:42,346 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=256, metric=11.44 vs. limit=22.5 2023-11-19 14:05:43,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer1.max_abs, batch_count=770400.0, ans=10.0 2023-11-19 14:05:57,173 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=770466.6666666666, ans=0.125 2023-11-19 14:06:13,436 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:06:22,390 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:06:23,749 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass.skip_rate, batch_count=770666.6666666666, ans=0.07 2023-11-19 14:06:35,356 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.ff2_skip_rate, batch_count=770733.3333333334, ans=0.0 2023-11-19 14:06:36,273 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7400, loss[loss=0.1, simple_loss=0.1166, pruned_loss=0.03226, audio_tagging_loss=0.009468, over 13562.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1054, pruned_loss=0.02272, audio_tagging_loss=0.01018, over 3044796.65 frames. ], batch size: 52, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:06:41,719 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.scale_min, batch_count=770733.3333333334, ans=0.2 2023-11-19 14:06:41,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=770733.3333333334, ans=0.0 2023-11-19 14:06:46,954 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=770800.0, ans=0.0 2023-11-19 14:06:54,605 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.256e+01 8.661e+01 9.537e+01 1.042e+02 1.641e+02, threshold=1.907e+02, percent-clipped=0.0 2023-11-19 14:07:05,944 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.bypass_mid.scale_min, batch_count=770866.6666666666, ans=0.2 2023-11-19 14:07:13,268 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=384, metric=9.39 vs. limit=15.0 2023-11-19 14:07:22,322 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=256, metric=9.67 vs. limit=15.0 2023-11-19 14:07:31,158 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7450, loss[loss=0.06926, simple_loss=0.09195, pruned_loss=0.01669, audio_tagging_loss=0.006587, over 15603.00 frames. ], tot_loss[loss=0.08576, simple_loss=0.1059, pruned_loss=0.02273, audio_tagging_loss=0.01007, over 3048722.50 frames. 
], batch size: 58, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:07:49,431 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module1.balancer2.prob, batch_count=771133.3333333334, ans=0.125 2023-11-19 14:07:54,192 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.bypass.scale_min, batch_count=771200.0, ans=0.2 2023-11-19 14:08:07,709 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=512, metric=20.06 vs. limit=22.5 2023-11-19 14:08:15,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=9.70 vs. limit=10.0 2023-11-19 14:08:24,866 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=771333.3333333334, ans=0.125 2023-11-19 14:08:25,882 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:08:26,735 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7500, loss[loss=0.08401, simple_loss=0.09565, pruned_loss=0.02496, audio_tagging_loss=0.01122, over 16033.00 frames. ], tot_loss[loss=0.08566, simple_loss=0.1056, pruned_loss=0.02289, audio_tagging_loss=0.009949, over 3052553.13 frames. ], batch size: 60, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:08:36,544 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.balancer1.prob, batch_count=771400.0, ans=0.125 2023-11-19 14:08:38,672 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=771466.6666666666, ans=0.1 2023-11-19 14:08:46,252 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.991e+01 8.223e+01 8.899e+01 9.673e+01 1.181e+02, threshold=1.780e+02, percent-clipped=0.0 2023-11-19 14:09:02,794 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.nonlin_attention.balancer.max_positive, batch_count=771600.0, ans=0.95 2023-11-19 14:09:11,112 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=771666.6666666666, ans=0.1 2023-11-19 14:09:20,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer1.prob, batch_count=771666.6666666666, ans=0.125 2023-11-19 14:09:23,262 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7550, loss[loss=0.1042, simple_loss=0.125, pruned_loss=0.03077, audio_tagging_loss=0.01091, over 15197.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1047, pruned_loss=0.02272, audio_tagging_loss=0.01006, over 3050985.49 frames. ], batch size: 57, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:09:27,848 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.skip_rate, batch_count=771733.3333333334, ans=0.04949747468305833 2023-11-19 14:09:36,392 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=256, metric=9.10 vs. 
limit=15.0 2023-11-19 14:09:38,242 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.attention_skip_rate, batch_count=771800.0, ans=0.0 2023-11-19 14:09:39,449 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module2.balancer2.min_positive, batch_count=771800.0, ans=0.05 2023-11-19 14:09:44,051 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=771866.6666666666, ans=0.0 2023-11-19 14:09:52,633 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=771866.6666666666, ans=0.125 2023-11-19 14:09:57,767 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.balancer1.prob, batch_count=771933.3333333334, ans=0.125 2023-11-19 14:10:17,844 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7600, loss[loss=0.07686, simple_loss=0.09515, pruned_loss=0.02012, audio_tagging_loss=0.009165, over 14500.00 frames. ], tot_loss[loss=0.08465, simple_loss=0.1038, pruned_loss=0.02264, audio_tagging_loss=0.01012, over 3046738.42 frames. ], batch size: 54, lr: 6.89e-03, grad_scale: 32.0 2023-11-19 14:10:21,029 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.conv.2.prob, batch_count=772066.6666666666, ans=0.125 2023-11-19 14:10:21,249 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.feed_forward1.out_proj.dropout_p, batch_count=772066.6666666666, ans=0.1 2023-11-19 14:10:22,795 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer1.prob, batch_count=772066.6666666666, ans=0.125 2023-11-19 14:10:23,942 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.out_combiner.scale_min, batch_count=772066.6666666666, ans=0.2 2023-11-19 14:10:36,183 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.972e+01 8.387e+01 9.154e+01 1.016e+02 1.447e+02, threshold=1.831e+02, percent-clipped=0.0 2023-11-19 14:10:36,410 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=772133.3333333334, ans=0.125 2023-11-19 14:10:49,153 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.1.encoder.layers.0.whiten, num_groups=1, num_channels=256, metric=5.38 vs. limit=12.0 2023-11-19 14:10:54,378 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=21.46 vs. limit=22.5 2023-11-19 14:10:56,153 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=772266.6666666666, ans=0.125 2023-11-19 14:11:13,460 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7650, loss[loss=0.07929, simple_loss=0.1062, pruned_loss=0.01605, audio_tagging_loss=0.01015, over 16187.00 frames. ], tot_loss[loss=0.08485, simple_loss=0.1038, pruned_loss=0.02267, audio_tagging_loss=0.01027, over 3046906.06 frames. ], batch size: 58, lr: 6.89e-03, grad_scale: 16.0 2023-11-19 14:11:17,180 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten1, num_groups=1, num_channels=288, metric=7.61 vs. limit=10.0 2023-11-19 14:11:38,793 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.nonlin_attention.whiten2, num_groups=1, num_channels=512, metric=9.15 vs. 
limit=15.0 2023-11-19 14:12:08,892 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7700, loss[loss=0.1001, simple_loss=0.1223, pruned_loss=0.02988, audio_tagging_loss=0.009013, over 14852.00 frames. ], tot_loss[loss=0.08476, simple_loss=0.1038, pruned_loss=0.02251, audio_tagging_loss=0.01033, over 3041845.52 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:12:29,297 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.117e+01 8.040e+01 8.605e+01 9.398e+01 1.279e+02, threshold=1.721e+02, percent-clipped=0.0 2023-11-19 14:12:44,492 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_whiten, num_groups=1, num_channels=256, metric=8.77 vs. limit=15.0 2023-11-19 14:13:02,290 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.balancer2.prob, batch_count=773000.0, ans=0.125 2023-11-19 14:13:04,317 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7750, loss[loss=0.0679, simple_loss=0.07545, pruned_loss=0.0187, audio_tagging_loss=0.01147, over 14328.00 frames. ], tot_loss[loss=0.08433, simple_loss=0.1032, pruned_loss=0.02226, audio_tagging_loss=0.01046, over 3041801.75 frames. ], batch size: 54, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:13:09,817 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.ff2_skip_rate, batch_count=773066.6666666666, ans=0.0 2023-11-19 14:13:25,254 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.bypass.scale_min, batch_count=773200.0, ans=0.2 2023-11-19 14:13:47,862 INFO [checkpoint.py:75] (0/4) Saving checkpoint to multi_KD/exp_train_asr_full_libri1_do_audio_tagging1_as_unbalanced_scale1.0/checkpoint-116000.pt 2023-11-19 14:14:01,670 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=7.24 vs. limit=15.0 2023-11-19 14:14:01,957 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7800, loss[loss=0.06247, simple_loss=0.08225, pruned_loss=0.0114, audio_tagging_loss=0.00994, over 15638.00 frames. ], tot_loss[loss=0.08491, simple_loss=0.1042, pruned_loss=0.02241, audio_tagging_loss=0.01038, over 3045536.77 frames. 
], batch size: 58, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:14:12,733 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.2.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:14:13,715 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.conv_skip_rate, batch_count=773466.6666666666, ans=0.0 2023-11-19 14:14:18,448 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.balancer2.prob, batch_count=773466.6666666666, ans=0.125 2023-11-19 14:14:23,103 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.774e+01 8.739e+01 9.425e+01 1.048e+02 2.167e+02, threshold=1.885e+02, percent-clipped=1.0 2023-11-19 14:14:28,770 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module2.balancer2.prob, batch_count=773533.3333333334, ans=0.125 2023-11-19 14:14:29,759 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_module1.balancer2.min_positive, batch_count=773533.3333333334, ans=0.05 2023-11-19 14:14:32,868 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.conv_module1.balancer2.prob, batch_count=773533.3333333334, ans=0.125 2023-11-19 14:14:57,156 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7850, loss[loss=0.08438, simple_loss=0.1036, pruned_loss=0.02287, audio_tagging_loss=0.009688, over 14295.00 frames. ], tot_loss[loss=0.08521, simple_loss=0.1047, pruned_loss=0.02246, audio_tagging_loss=0.0104, over 3048146.27 frames. ], batch size: 56, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:15:06,760 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=773733.3333333334, ans=0.125 2023-11-19 14:15:07,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=773800.0, ans=0.0 2023-11-19 14:15:12,926 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.conv_module1.balancer1.prob, batch_count=773800.0, ans=0.125 2023-11-19 14:15:26,539 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=5.89 vs. limit=15.0 2023-11-19 14:15:31,569 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward3.hidden_balancer.prob, batch_count=773933.3333333334, ans=0.125 2023-11-19 14:15:36,783 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=773933.3333333334, ans=0.1 2023-11-19 14:15:39,440 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.min_abs, batch_count=773933.3333333334, ans=0.5 2023-11-19 14:15:50,128 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.bypass.skip_rate, batch_count=774000.0, ans=0.07 2023-11-19 14:15:53,099 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7900, loss[loss=0.1015, simple_loss=0.1282, pruned_loss=0.02762, audio_tagging_loss=0.009781, over 15013.00 frames. ], tot_loss[loss=0.0848, simple_loss=0.1041, pruned_loss=0.02228, audio_tagging_loss=0.01048, over 3039659.74 frames. 
], batch size: 57, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:15:54,454 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774066.6666666666, ans=0.1 2023-11-19 14:16:08,521 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer1.prob, batch_count=774133.3333333334, ans=0.125 2023-11-19 14:16:13,450 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.041e+01 8.211e+01 8.938e+01 9.745e+01 1.285e+02, threshold=1.788e+02, percent-clipped=0.0 2023-11-19 14:16:21,572 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.attention_skip_rate, batch_count=774200.0, ans=0.0 2023-11-19 14:16:48,184 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 7950, loss[loss=0.1003, simple_loss=0.1262, pruned_loss=0.02668, audio_tagging_loss=0.01053, over 14928.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1043, pruned_loss=0.02268, audio_tagging_loss=0.01065, over 3036342.53 frames. ], batch size: 57, lr: 6.88e-03, grad_scale: 8.0 2023-11-19 14:16:57,985 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.ff2_skip_rate, batch_count=774466.6666666666, ans=0.0 2023-11-19 14:16:59,890 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/uQjH4tNUZ_g_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 14:17:08,658 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.balancer2.prob, batch_count=774466.6666666666, ans=0.125 2023-11-19 14:17:11,813 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.nonlin_attention.balancer.min_positive, batch_count=774533.3333333334, ans=0.05 2023-11-19 14:17:14,001 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=11.38 vs. limit=15.0 2023-11-19 14:17:14,512 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward3.hidden_balancer.prob, batch_count=774533.3333333334, ans=0.125 2023-11-19 14:17:22,965 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.ff2_skip_rate, batch_count=774600.0, ans=0.0 2023-11-19 14:17:37,535 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=774666.6666666666, ans=0.125 2023-11-19 14:17:40,669 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=774666.6666666666, ans=0.1 2023-11-19 14:17:43,714 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8000, loss[loss=0.08789, simple_loss=0.1015, pruned_loss=0.02424, audio_tagging_loss=0.0129, over 15620.00 frames. ], tot_loss[loss=0.0855, simple_loss=0.1039, pruned_loss=0.02278, audio_tagging_loss=0.0108, over 3038994.61 frames. 
], batch size: 59, lr: 6.88e-03, grad_scale: 16.0 2023-11-19 14:17:55,571 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=774800.0, ans=0.1 2023-11-19 14:18:00,308 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.balancer2.prob, batch_count=774800.0, ans=0.125 2023-11-19 14:18:02,420 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.balancer.prob, batch_count=774800.0, ans=0.125 2023-11-19 14:18:05,287 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.487e+01 8.348e+01 9.114e+01 9.901e+01 1.524e+02, threshold=1.823e+02, percent-clipped=0.0 2023-11-19 14:18:21,891 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=774933.3333333334, ans=0.0 2023-11-19 14:18:23,075 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward2.hidden_balancer.prob, batch_count=774933.3333333334, ans=0.125 2023-11-19 14:18:23,120 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=774933.3333333334, ans=0.2 2023-11-19 14:18:31,065 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775000.0, ans=0.1 2023-11-19 14:18:38,635 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.62 vs. limit=5.0 2023-11-19 14:18:39,889 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8050, loss[loss=0.08024, simple_loss=0.09444, pruned_loss=0.01599, audio_tagging_loss=0.01702, over 15555.00 frames. ], tot_loss[loss=0.08548, simple_loss=0.1035, pruned_loss=0.02284, audio_tagging_loss=0.0109, over 3038161.74 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 16.0 2023-11-19 14:19:05,492 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=775200.0, ans=0.125 2023-11-19 14:19:05,619 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=775200.0, ans=0.2 2023-11-19 14:19:12,344 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder_embed.convnext.hidden_balancer.prob, batch_count=775266.6666666666, ans=0.125 2023-11-19 14:19:23,453 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:19:28,241 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:19:36,027 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8100, loss[loss=0.1054, simple_loss=0.1344, pruned_loss=0.02933, audio_tagging_loss=0.008905, over 15313.00 frames. ], tot_loss[loss=0.08567, simple_loss=0.1039, pruned_loss=0.023, audio_tagging_loss=0.01075, over 3034699.68 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 16.0 2023-11-19 14:19:45,106 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.whiten, num_groups=1, num_channels=192, metric=3.62 vs. 
limit=12.0 2023-11-19 14:19:47,805 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_module2.balancer2.prob, batch_count=775466.6666666666, ans=0.125 2023-11-19 14:19:49,433 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=192, metric=12.62 vs. limit=15.0 2023-11-19 14:19:55,734 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.attention_skip_rate, batch_count=775466.6666666666, ans=0.0 2023-11-19 14:19:56,606 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.035e+01 8.298e+01 8.945e+01 9.680e+01 1.266e+02, threshold=1.789e+02, percent-clipped=0.0 2023-11-19 14:20:31,257 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8150, loss[loss=0.0948, simple_loss=0.1178, pruned_loss=0.02525, audio_tagging_loss=0.01066, over 14915.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1049, pruned_loss=0.02312, audio_tagging_loss=0.01048, over 3039982.18 frames. ], batch size: 56, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:20:33,637 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=775733.3333333334, ans=0.1 2023-11-19 14:20:48,446 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.bypass.scale_min, batch_count=775800.0, ans=0.2 2023-11-19 14:21:07,533 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.feed_forward1.hidden_balancer.prob, batch_count=775933.3333333334, ans=0.125 2023-11-19 14:21:24,898 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.5.encoder.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=256, metric=14.22 vs. limit=15.0 2023-11-19 14:21:26,667 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/8C7biyx9TQ4_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 14:21:27,679 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8200, loss[loss=0.08955, simple_loss=0.1107, pruned_loss=0.02599, audio_tagging_loss=0.008202, over 15570.00 frames. ], tot_loss[loss=0.08641, simple_loss=0.1056, pruned_loss=0.02331, audio_tagging_loss=0.01031, over 3048677.55 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:21:41,645 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass.skip_rate, batch_count=776133.3333333334, ans=0.09899494936611666 2023-11-19 14:21:49,752 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.188e+01 8.524e+01 9.249e+01 1.061e+02 1.477e+02, threshold=1.850e+02, percent-clipped=0.0 2023-11-19 14:22:23,482 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8250, loss[loss=0.09455, simple_loss=0.1149, pruned_loss=0.02491, audio_tagging_loss=0.01222, over 13789.00 frames. ], tot_loss[loss=0.0868, simple_loss=0.1061, pruned_loss=0.02352, audio_tagging_loss=0.01023, over 3042588.24 frames. 
], batch size: 52, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:22:50,230 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=776533.3333333334, ans=0.0 2023-11-19 14:22:55,399 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=776600.0, ans=0.1 2023-11-19 14:23:00,663 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=776600.0, ans=0.0 2023-11-19 14:23:13,326 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.3.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:23:16,459 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.ff3_skip_rate, batch_count=776666.6666666666, ans=0.0 2023-11-19 14:23:18,348 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8300, loss[loss=0.07037, simple_loss=0.08604, pruned_loss=0.01702, audio_tagging_loss=0.01032, over 15000.00 frames. ], tot_loss[loss=0.08626, simple_loss=0.1055, pruned_loss=0.02321, audio_tagging_loss=0.01028, over 3041494.81 frames. ], batch size: 57, lr: 6.87e-03, grad_scale: 8.0 2023-11-19 14:23:18,546 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.attention_skip_rate, batch_count=776733.3333333334, ans=0.0 2023-11-19 14:23:27,823 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.49 vs. limit=6.0 2023-11-19 14:23:30,062 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=776800.0, ans=0.125 2023-11-19 14:23:39,122 INFO [scaling.py:1022] (0/4) Whitening: name=encoder_embed.convnext.out_whiten, num_groups=1, num_channels=128, metric=4.52 vs. limit=5.0 2023-11-19 14:23:40,362 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.122e+01 8.080e+01 8.886e+01 9.811e+01 1.458e+02, threshold=1.777e+02, percent-clipped=0.0 2023-11-19 14:23:53,634 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.62 vs. limit=15.0 2023-11-19 14:24:02,737 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.1.bypass.scale_min, batch_count=777000.0, ans=0.2 2023-11-19 14:24:06,532 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.feed_forward1.out_proj.dropout_p, batch_count=777000.0, ans=0.1 2023-11-19 14:24:14,328 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8350, loss[loss=0.09655, simple_loss=0.1094, pruned_loss=0.02801, audio_tagging_loss=0.01385, over 14839.00 frames. ], tot_loss[loss=0.08675, simple_loss=0.1065, pruned_loss=0.02333, audio_tagging_loss=0.01017, over 3040496.88 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:24:18,947 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.attention_skip_rate, batch_count=777066.6666666666, ans=0.0 2023-11-19 14:24:20,039 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=777066.6666666666, ans=0.125 2023-11-19 14:24:20,471 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.1.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=9.49 vs. 
limit=15.0 2023-11-19 14:24:32,622 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.hidden_balancer.prob, batch_count=777133.3333333334, ans=0.125 2023-11-19 14:24:33,670 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=777133.3333333334, ans=0.0 2023-11-19 14:24:39,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.skip_rate, batch_count=777200.0, ans=0.04949747468305833 2023-11-19 14:24:48,751 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.bypass.scale_min, batch_count=777266.6666666666, ans=0.2 2023-11-19 14:25:09,775 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8400, loss[loss=0.07938, simple_loss=0.09231, pruned_loss=0.01978, audio_tagging_loss=0.01344, over 14990.00 frames. ], tot_loss[loss=0.08685, simple_loss=0.1068, pruned_loss=0.02325, audio_tagging_loss=0.0102, over 3047007.61 frames. ], batch size: 56, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:25:14,823 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.self_attn_weights.pos_emb_skip_rate, batch_count=777400.0, ans=0.0 2023-11-19 14:25:31,395 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.feed_forward3.hidden_balancer.prob, batch_count=777533.3333333334, ans=0.125 2023-11-19 14:25:32,116 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.696e+01 8.445e+01 9.359e+01 1.034e+02 1.708e+02, threshold=1.872e+02, percent-clipped=0.0 2023-11-19 14:26:05,401 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8450, loss[loss=0.06967, simple_loss=0.08116, pruned_loss=0.01788, audio_tagging_loss=0.01121, over 15167.00 frames. ], tot_loss[loss=0.08687, simple_loss=0.1068, pruned_loss=0.02324, audio_tagging_loss=0.01025, over 3050272.99 frames. ], batch size: 58, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:26:15,576 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=777800.0, ans=0.125 2023-11-19 14:26:26,800 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff2_skip_rate, batch_count=777866.6666666666, ans=0.0 2023-11-19 14:26:27,940 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_skip_rate, batch_count=777866.6666666666, ans=0.0 2023-11-19 14:26:30,096 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=777866.6666666666, ans=0.125 2023-11-19 14:26:48,966 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.balancer2.prob, batch_count=778000.0, ans=0.125 2023-11-19 14:26:54,653 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.conv_module1.whiten, num_groups=1, num_channels=384, metric=9.09 vs. limit=15.0 2023-11-19 14:26:56,948 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.balancer2.prob, batch_count=778000.0, ans=0.125 2023-11-19 14:27:01,319 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8500, loss[loss=0.1196, simple_loss=0.1379, pruned_loss=0.03942, audio_tagging_loss=0.01122, over 14604.00 frames. ], tot_loss[loss=0.08612, simple_loss=0.1058, pruned_loss=0.02293, audio_tagging_loss=0.0103, over 3045619.52 frames. 
], batch size: 56, lr: 6.86e-03, grad_scale: 16.0 2023-11-19 14:27:23,626 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer1.prob, batch_count=778200.0, ans=0.125 2023-11-19 14:27:24,285 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.947e+01 8.482e+01 9.526e+01 1.059e+02 1.313e+02, threshold=1.905e+02, percent-clipped=0.0 2023-11-19 14:27:27,604 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.self_attn_weights.pos_emb_skip_rate, batch_count=778200.0, ans=0.0 2023-11-19 14:27:27,667 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.1.encoder.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:27:44,720 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.bypass_mid.scale_min, batch_count=778333.3333333334, ans=0.2 2023-11-19 14:27:56,058 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8550, loss[loss=0.09388, simple_loss=0.1183, pruned_loss=0.02618, audio_tagging_loss=0.008557, over 14755.00 frames. ], tot_loss[loss=0.08558, simple_loss=0.1052, pruned_loss=0.02263, audio_tagging_loss=0.01036, over 3049134.60 frames. ], batch size: 53, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:27:57,887 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.conv_skip_rate, batch_count=778400.0, ans=0.0 2023-11-19 14:28:00,031 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.3.feed_forward1.hidden_balancer.prob, batch_count=778400.0, ans=0.125 2023-11-19 14:28:21,631 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.self_attn_weights.whiten_keys.whitening_limit, batch_count=778533.3333333334, ans=6.0 2023-11-19 14:28:35,662 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_skip_rate, batch_count=778600.0, ans=0.0 2023-11-19 14:28:52,529 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8600, loss[loss=0.08918, simple_loss=0.1031, pruned_loss=0.02309, audio_tagging_loss=0.01455, over 14760.00 frames. ], tot_loss[loss=0.08512, simple_loss=0.1043, pruned_loss=0.02241, audio_tagging_loss=0.01055, over 3050771.62 frames. ], batch size: 55, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:28:55,234 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.feed_forward1.out_whiten, num_groups=1, num_channels=384, metric=12.41 vs. limit=15.0 2023-11-19 14:28:57,192 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.1.whiten, num_groups=1, num_channels=512, metric=9.15 vs. limit=12.0 2023-11-19 14:29:13,699 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.conv_skip_rate, batch_count=778866.6666666666, ans=0.0 2023-11-19 14:29:15,651 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 5.831e+01 8.133e+01 8.812e+01 9.832e+01 1.513e+02, threshold=1.762e+02, percent-clipped=0.0 2023-11-19 14:29:34,777 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.2.feed_forward3.hidden_balancer.prob, batch_count=778933.3333333334, ans=0.125 2023-11-19 14:29:47,749 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8650, loss[loss=0.1031, simple_loss=0.1275, pruned_loss=0.0304, audio_tagging_loss=0.008987, over 15253.00 frames. ], tot_loss[loss=0.08561, simple_loss=0.1049, pruned_loss=0.02265, audio_tagging_loss=0.01053, over 3056060.97 frames. 
], batch size: 57, lr: 6.86e-03, grad_scale: 8.0 2023-11-19 14:29:55,874 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.bypass_mid.scale_min, batch_count=779066.6666666666, ans=0.2 2023-11-19 14:29:56,863 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.0.layers.1.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:30:14,422 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.bypass_mid.scale_min, batch_count=779200.0, ans=0.2 2023-11-19 14:30:21,447 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.feed_forward1.hidden_balancer.prob, batch_count=779266.6666666666, ans=0.125 2023-11-19 14:30:43,882 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8700, loss[loss=0.1025, simple_loss=0.1262, pruned_loss=0.03209, audio_tagging_loss=0.007275, over 16229.00 frames. ], tot_loss[loss=0.08529, simple_loss=0.1046, pruned_loss=0.02248, audio_tagging_loss=0.01051, over 3060896.73 frames. ], batch size: 59, lr: 6.85e-03, grad_scale: 8.0 2023-11-19 14:30:46,581 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.feed_forward2.out_whiten, num_groups=1, num_channels=512, metric=12.81 vs. limit=15.0 2023-11-19 14:30:59,054 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=192, metric=10.25 vs. limit=15.0 2023-11-19 14:31:03,909 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.3.encoder.layers.0.self_attn2.whiten, num_groups=1, num_channels=512, metric=22.34 vs. limit=22.5 2023-11-19 14:31:07,616 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.516e+01 8.174e+01 9.057e+01 1.021e+02 2.200e+02, threshold=1.811e+02, percent-clipped=1.0 2023-11-19 14:31:21,687 INFO [scaling.py:1118] (0/4) WithLoss: name=encoder.encoders.2.encoder.layers.0.self_attn_weights, loss-sum=0.000e+00 2023-11-19 14:31:32,736 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_skip_rate, batch_count=779666.6666666666, ans=0.0 2023-11-19 14:31:39,902 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8750, loss[loss=0.0555, simple_loss=0.06205, pruned_loss=0.009923, audio_tagging_loss=0.01456, over 13886.00 frames. ], tot_loss[loss=0.0859, simple_loss=0.1054, pruned_loss=0.02269, audio_tagging_loss=0.01054, over 3054894.52 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 8.0 2023-11-19 14:31:51,941 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.1.nonlin_attention.balancer.prob, batch_count=779800.0, ans=0.125 2023-11-19 14:32:20,569 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.whiten, num_groups=1, num_channels=384, metric=7.97 vs. limit=12.0 2023-11-19 14:32:25,786 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.0.layers.0.ff3_skip_rate, batch_count=780000.0, ans=0.0 2023-11-19 14:32:28,011 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.ff2_skip_rate, batch_count=780000.0, ans=0.0 2023-11-19 14:32:35,357 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.conv_module2.balancer1.prob, batch_count=780066.6666666666, ans=0.125 2023-11-19 14:32:36,129 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8800, loss[loss=0.108, simple_loss=0.1319, pruned_loss=0.02973, audio_tagging_loss=0.01234, over 15487.00 frames. 
], tot_loss[loss=0.08692, simple_loss=0.1064, pruned_loss=0.02316, audio_tagging_loss=0.01056, over 3057063.75 frames. ], batch size: 56, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:32:41,223 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.prob, batch_count=780066.6666666666, ans=0.125 2023-11-19 14:32:42,269 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.nonlin_attention.balancer.prob, batch_count=780066.6666666666, ans=0.125 2023-11-19 14:32:51,746 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.feed_forward1.out_proj.dropout_p, batch_count=780133.3333333334, ans=0.1 2023-11-19 14:32:59,411 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.938e+01 8.658e+01 9.293e+01 1.017e+02 2.957e+02, threshold=1.859e+02, percent-clipped=2.0 2023-11-19 14:33:17,064 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.0.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=192, metric=8.57 vs. limit=15.0 2023-11-19 14:33:30,849 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.1.conv_module2.balancer2.prob, batch_count=780400.0, ans=0.125 2023-11-19 14:33:31,586 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8850, loss[loss=0.0791, simple_loss=0.1035, pruned_loss=0.01695, audio_tagging_loss=0.0104, over 15031.00 frames. ], tot_loss[loss=0.08636, simple_loss=0.1058, pruned_loss=0.02287, audio_tagging_loss=0.0106, over 3048222.35 frames. ], batch size: 55, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:33:38,666 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.conv_module1.balancer2.min_abs, batch_count=780400.0, ans=0.5 2023-11-19 14:33:40,505 WARNING [train_asr.py:1319] (0/4) Exclude cut with ID unbalanced/1Dq7QH61iXQ_0.000_1.000.wav from training. Number of frames (before subsampling): 100. Number of frames (after subsampling): 23. Text: Dummy text added as a place holder. Please ignore this if possible. Tokens: ['▁D', 'ummy', '▁', 'text', '▁', 'added', '▁', 'as', '▁', 'a', '▁', 'place', '▁', 'holder.', '▁P', 'lease', '▁', 'ignore', '▁', 'this', '▁', 'if', '▁', 'possible']. Number of tokens: 24 2023-11-19 14:33:42,138 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.0.feed_forward3.out_whiten, num_groups=1, num_channels=384, metric=10.03 vs. limit=15.0 2023-11-19 14:33:49,216 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.1.feed_forward2.out_whiten.whitening_limit, batch_count=780466.6666666666, ans=15.0 2023-11-19 14:34:14,021 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.self_attn2.whiten, num_groups=1, num_channels=384, metric=17.69 vs. limit=22.5 2023-11-19 14:34:18,549 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.2.attention_skip_rate, batch_count=780666.6666666666, ans=0.0 2023-11-19 14:34:19,683 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.5.encoder.layers.0.feed_forward1.out_proj.dropout_p, batch_count=780666.6666666666, ans=0.1 2023-11-19 14:34:26,787 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8900, loss[loss=0.09275, simple_loss=0.1146, pruned_loss=0.02893, audio_tagging_loss=0.006523, over 14668.00 frames. ], tot_loss[loss=0.08606, simple_loss=0.1055, pruned_loss=0.0228, audio_tagging_loss=0.0105, over 3046266.18 frames. 
], batch size: 54, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:34:43,779 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.balancer_ff3.min_abs, batch_count=780800.0, ans=0.2 2023-11-19 14:34:50,416 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 6.845e+01 8.412e+01 9.119e+01 1.005e+02 1.451e+02, threshold=1.824e+02, percent-clipped=0.0 2023-11-19 14:34:54,184 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.0.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=11.69 vs. limit=15.0 2023-11-19 14:35:01,045 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.2.nonlin_attention.whiten2, num_groups=1, num_channels=384, metric=4.92 vs. limit=15.0 2023-11-19 14:35:14,555 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.1.encoder.layers.0.bypass.scale_min, batch_count=781000.0, ans=0.2 2023-11-19 14:35:22,295 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 8950, loss[loss=0.09099, simple_loss=0.1108, pruned_loss=0.02619, audio_tagging_loss=0.009399, over 14470.00 frames. ], tot_loss[loss=0.08527, simple_loss=0.1047, pruned_loss=0.02254, audio_tagging_loss=0.0104, over 3053315.22 frames. ], batch size: 53, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:35:24,950 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.4.encoder.layers.1.conv_module2.whiten, num_groups=1, num_channels=384, metric=2.99 vs. limit=15.0 2023-11-19 14:35:43,611 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.1.conv_module2.balancer2.prob, batch_count=781200.0, ans=0.125 2023-11-19 14:36:00,526 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.0.conv_module1.balancer1.max_abs, batch_count=781266.6666666666, ans=10.0 2023-11-19 14:36:04,913 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.3.encoder.layers.0.ff3_skip_rate, batch_count=781266.6666666666, ans=0.0 2023-11-19 14:36:18,255 INFO [train_asr.py:1115] (0/4) Epoch 10, batch 9000, loss[loss=0.09029, simple_loss=0.108, pruned_loss=0.0256, audio_tagging_loss=0.01067, over 15417.00 frames. ], tot_loss[loss=0.08503, simple_loss=0.1044, pruned_loss=0.02254, audio_tagging_loss=0.01028, over 3057704.79 frames. ], batch size: 58, lr: 6.85e-03, grad_scale: 16.0 2023-11-19 14:36:18,257 INFO [train_asr.py:1138] (0/4) Computing validation loss 2023-11-19 14:36:38,117 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.3.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([3.2499, 3.7333, 4.1625, 3.4362, 3.9855, 3.7296, 3.6230, 3.8642], device='cuda:0') 2023-11-19 14:36:52,851 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.4.encoder.layers.2.self_attn_weights, attn_weights_entropy = tensor([2.6668, 4.1424, 3.6328, 2.9775], device='cuda:0') 2023-11-19 14:36:56,608 INFO [zipformer.py:1873] (0/4) name=encoder.encoders.1.encoder.layers.1.self_attn_weights, attn_weights_entropy = tensor([4.8072, 4.8651, 4.7664, 4.8364], device='cuda:0') 2023-11-19 14:36:58,297 INFO [train_asr.py:1147] (0/4) Epoch 10, validation: loss=0.06535, simple_loss=0.05527, pruned_loss=0.006386, audio_tagging_loss=0.03133, over 4681554.00 frames. 
2023-11-19 14:36:58,298 INFO [train_asr.py:1148] (0/4) Maximum memory allocated so far is 26250MB
2023-11-19 14:37:09,625 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.4.encoder.layers.2.conv_module2.balancer1.prob, batch_count=781466.6666666666, ans=0.125
2023-11-19 14:37:21,140 INFO [scaling.py:1022] (0/4) Whitening: name=encoder.encoders.2.encoder.layers.1.self_attn_weights.whiten_keys, num_groups=4, num_channels=128, metric=4.75 vs. limit=6.0
2023-11-19 14:37:24,833 INFO [scaling.py:213] (0/4) ScheduledFloat: name=encoder.encoders.2.encoder.layers.0.conv_skip_rate, batch_count=781533.3333333334, ans=0.0
2023-11-19 14:37:29,704 INFO [optim.py:476] (0/4) Clipping_scale=2.0, grad-norm quartiles 7.012e+01 8.251e+01 8.814e+01 9.719e+01 1.451e+02, threshold=1.763e+02, percent-clipped=0.0